[R] adding rows
Dear useRs,

Here is my data with two columns and 20 rows.

dput(TT)
structure(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 24, 48, 72, 96, 120, 144, 168, 192, 216, 240, 264, 288, 312, 336, 360, 384, 408, 432, 456, 480), .Dim = c(20L, 2L), .Dimnames = list(NULL, c("", "SS")))

First, I want to sum each consecutive pair of rows (1 & 2, 3 & 4, 5 & 6, and so on) within each column. Then I want to sum rows in sets of three (1-2-3, 4-5-6, ..., 16-17-18); since the 19th and 20th rows do not make up a full set of three, they should be ignored. Similarly for sets of 4 rows, 5 rows, and even 6.

I hope I was clear.

Thank you so very much in advance,
Eliza

[[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] adding rows
Hello,

Try the following.

fun <- function(x, r){
    if(r > 0){
        m <- length(x) %/% r    # number of complete sets of r rows
        y <- numeric(m)
        for(i in seq_len(m)){
            y[i] <- sum(x[((i - 1)*r + 1):(i*r)])
        }
        y
    }else{
        NULL
    }
}

apply(TT, 2, fun, r = 2)
apply(TT, 2, fun, r = 3)
etc

Hope this helps,

Rui Barradas

On 25-09-2014 20:50, eliza botto wrote:
> Dear useRs, Here is my data with two columns and 20 rows.
> [...]
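The block sums in Rui's function can also be computed without an explicit loop. The following is only a sketch, not part of the original reply: `block_sums` is a hypothetical helper name, and `TT` is rebuilt with `cbind()` so that the example is self-contained. It reshapes the first `m * r` values into a matrix with `r` rows, so that each column holds one block, and then calls `colSums()`.

```r
# Rebuild the example data (same values as the dput() output in the question).
TT <- cbind(1:20, SS = seq(24, 480, by = 24))

# Hypothetical helper, not from the thread: sum consecutive, non-overlapping
# blocks of r values and ignore an incomplete trailing block.
block_sums <- function(x, r) {
  stopifnot(r > 0)
  m <- length(x) %/% r                            # number of complete blocks
  blocks <- matrix(x[seq_len(m * r)], nrow = r)   # one block per column
  colSums(blocks)
}

res2 <- apply(TT, 2, block_sums, r = 2)   # 10 sums per column
res3 <- apply(TT, 2, block_sums, r = 3)   # 6 sums per column; rows 19-20 are dropped
```

The result has one row per block and one column per column of `TT`, the same shape that `apply(TT, 2, fun, r = 3)` returns.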
Re: [R] adding rows
Another approach

fun <- function(i, dat) {
    # block id for each row; integer division keeps complete blocks only
    grp <- rep(1:(nrow(dat) %/% i), each=i)
    aggregate(dat[1:length(grp),]~grp, FUN=sum)
}
lapply(2:6, fun, dat=TT)

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
Sent: Thursday, September 25, 2014 3:34 PM
To: eliza botto; r-help@r-project.org
Subject: Re: [R] adding rows

Hello, Try the following.
[...]
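The same grouping idea can be expressed with base R's `rowsum()`, which sums the rows of a matrix by a grouping vector and returns a matrix rather than a data frame. This is a sketch under the assumption that `TT` is a plain numeric matrix; `block_rowsum` is a hypothetical name, not something from the thread.

```r
TT <- cbind(1:20, SS = seq(24, 480, by = 24))

# Hypothetical helper, not from the thread: group rows into consecutive
# blocks of size i and sum each block per column with rowsum().
block_rowsum <- function(dat, i) {
  m <- nrow(dat) %/% i                  # complete blocks only
  grp <- rep(seq_len(m), each = i)      # block id for the first m * i rows
  rowsum(dat[seq_along(grp), , drop = FALSE], grp)
}

res <- lapply(2:6, block_rowsum, dat = TT)
names(res) <- paste0("r", 2:6)          # r2 = pairs, r3 = triples, ...
```

Each list element has one row per block, with the block id as the row name.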
Re: [R] adding rows
see inline for another vectorized example.

On 25 September 2014 23:05, David L Carlson dcarl...@tamu.edu wrote:
> Another approach
> [...]
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Rui Barradas
> Sent: Thursday, September 25, 2014 3:34 PM
>
> Hello, Try the following.
>
> fun <- function(x, r){
> [...]
> }

fun <- function(x,r) {
    i <- length(x)%/%r                      # number of complete sets
    tapply(x[seq_len(i*r)], gl(i,r), sum)   # gl(i,r): i groups of r values each
}

> apply(TT, 2, fun, r = 2)
> apply(TT, 2, fun, r = 3)
> etc
Re: [R] adding rows
Inline.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
Clifford Stoll

On Thu, Sep 25, 2014 at 3:28 PM, Sven E. Templer sven.temp...@gmail.com wrote:
> see inline for another vectorized example.

Nope. The *apply family of functions are loops in disguise (at the interpreter level, not the C level). They are rarely more efficient than for() loops, but often clearer and more convenient.

Cheers,
Bert

> On 25 September 2014 23:05, David L Carlson dcarl...@tamu.edu wrote:
> [...]
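For comparison with the remark above, the block sums can be written with no R-level loop or *apply call at all, using only vectorised primitives. This is a sketch, not code from the thread; `block_sums_cumsum` is a hypothetical name. It takes the running total at the end of each block and differences those totals. Note the assumption that the running total is accurate enough: for non-integer data, `cumsum()` can accumulate rounding error.

```r
# Hypothetical helper, not from the thread: block sums via running totals.
block_sums_cumsum <- function(x, r) {
  m <- length(x) %/% r
  ends <- seq_len(m) * r                   # index of the last element of each block
  totals <- cumsum(as.numeric(x))[ends]    # as.numeric() avoids integer overflow
  diff(c(0, totals))                       # difference of running totals at block ends
}

s3 <- block_sums_cumsum(1:20, 3)   # elements 19 and 20 are ignored
```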
Re: [R] adding rows
HI,

If you want to try other ways:

fun1 <- function(mat, rowN) {
    dm <- dim(mat)[1]
    rowN1 <- rowN - 1
    indx <- rep(1:rowN, dm - rowN1) + rep(seq(0, dm - rowN), each = rowN)
    indx1 <- (seq_along(indx)-1)%/%rowN+1
    as.vector(tapply(indx, list(indx1), FUN = function(i) sum(mat[i, ])))
}

#or
fun2 <- function(mat, rowN) {
    dm <- dim(mat)[1]
    rowN1 <- rowN - 1
    indx <- rep(1:rowN, dm - rowN1) + rep(seq(0, dm - rowN), each = rowN)
    mat1 <- mat[indx, ]
    indx1 <- rep((seq_along(indx) - 1)%/%rowN + 1, ncol(mat))
    colSums(matrix(mat1[sort.int(indx1, method = "quick", index.return = TRUE)$ix], ncol = dm - rowN1))
}

## But the above methods are slower on large datasets.

## Rui's function
funOld <- function(mat, rowN) {
    dm <- dim(mat)[1]
    rowN1 <- rowN - 1
    sapply(1:(dm - rowN1), function(i) sum(mat[i:(i + rowN1), ]))
}

### Modified Rui's function using a for() loop instead of sapply()
funNew <- function(mat, rowN) {
    dm <- dim(mat)[1]
    rowN1 <- rowN - 1
    vec <- vector(mode = "numeric", length = dm - rowN1)
    for (i in 1:(dm - rowN1)) {
        vec[i] <- sum(mat[i:(i + rowN1), ])
    }
    vec
}

## Speed comparison
set.seed(348)
el1 <- matrix(sample(1:300, 1e6*50,replace=TRUE),ncol=50)

system.time(r1 <- fun1(el1,3))
#   user  system elapsed
# 15.888   0.120  16.042

system.time(r2 <- funOld(el1,3))
#   user  system elapsed
#  5.061   0.004   5.076

system.time(r3 <- fun2(el1,3))
#   user  system elapsed
# 12.371   1.329  13.735

system.time(r4 <- funNew(el1,3))
#   user  system elapsed
#  4.716   0.000   4.727

library(compiler)
funOldc <- cmpfun(funOld)
funNewc <- cmpfun(funNew)
fun2c <- cmpfun(fun2)

system.time(r5 <- funOldc(el1,3))
#   user  system elapsed
#  7.458   0.000   7.476

system.time(r6 <- funNewc(el1,3))
#   user  system elapsed
#  3.529   0.000   3.536

sapply(paste0("r",2:6),function(x) all.equal(r1,get(x) ))
#  r2   r3   r4   r5   r6
#TRUE TRUE TRUE TRUE TRUE

A.K.

On Friday, May 9, 2014 5:56 PM, Rui Barradas ruipbarra...@sapo.pt wrote:
> Hello, Try the following.
> sapply(1:(30 - 2), function(i) sum(el[i:(i+2), ]))
> but with number of rows instead of 30.
> [...]
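Because every window sums whole rows, the rolling sums depend only on the row totals, so they can also be obtained from `rowSums()` and one `cumsum()`. This is a sketch, not one of the benchmarked functions above, and no timing is claimed for it; `roll_sum_rows` is a hypothetical name. For non-integer data the running total can accumulate rounding error, so results may differ from the loop versions in the last digits.

```r
# Hypothetical helper, not from the thread: sum of every window of k
# consecutive rows (all columns together), via running totals.
roll_sum_rows <- function(mat, k) {
  rs <- rowSums(mat)                   # one total per row, across all columns
  n <- length(rs)
  if (k < 1 || k > n) return(numeric(0))
  cs <- c(0, cumsum(rs))               # cs[j + 1] is the total of rows 1..j
  cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]
}

set.seed(348)
el <- matrix(sample(1:300), ncol = 10)
ref <- sapply(1:(nrow(el) - 2), function(i) sum(el[i:(i + 2), ]))  # Rui's version
new <- roll_sum_rows(el, 3)
```

Here `new` and `ref` should agree for the small integer matrix from the question.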
Re: [R] adding rows
Thank you very much, arun. It's always nice to hear from you.

Eliza

> Date: Sat, 10 May 2014 03:55:29 -0700
> From: smartpink...@yahoo.com
> Subject: Re: [R] adding rows
> To: r-help@r-project.org
> CC: ruipbarra...@sapo.pt; eliza_bo...@hotmail.com
>
> HI, If you want to try other ways:
> [...]
[R] adding rows
Dear useRs,

I have a matrix, say el, of 30 rows and 10 columns:

el <- matrix(sample(1:300), ncol=10)

I want to sum up various sets of three rows of each column in the following manner:

sum(el[c(1,2,3),])   ## adding rows 1, 2 and 3 of each column
sum(el[c(2,3,4),])   ## adding rows 2, 3 and 4 of each column
sum(el[c(3,4,5),])   ## adding rows 3, 4 and 5 of each column
sum(el[c(4,5,6),])
sum(el[c(5,6,7),])
sum(el[c(6,7,8),])
sum(el[c(7,8,9),])
sum(el[c(8,9,10),])
sum(el[c(9,10,11),])
.. and so on ..

I know how to do it manually, but since my original matrix has 2000 rows, I want to figure out a more convenient way.

Thank you so very much in advance,
Eliza
Re: [R] adding rows
Hello,

Try the following.

sapply(1:(30 - 2), function(i) sum(el[i:(i+2), ]))

but with the number of rows of your matrix instead of 30.

Hope this helps,

Rui Barradas

On 09-05-2014 22:35, eliza botto wrote:
> Dear useRs, I have a matrix, say el of 30 rows and 10 columns,
> [...]
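The same moving sums can also be obtained from base R's `stats::filter()` applied to the row totals. This is a sketch rather than part of Rui's reply; the window size `k` and the variable names are illustrative assumptions.

```r
set.seed(1)
el <- matrix(sample(1:300), ncol = 10)
k <- 3                                   # window size (rows per sum)

rs <- rowSums(el)                        # total of each row across all columns
# sides = 1 makes each output value the sum of the current row total and the
# k - 1 preceding ones, so the first k - 1 values are NA and are dropped.
roll <- stats::filter(rs, rep(1, k), sides = 1)
roll <- as.numeric(roll)[k:length(rs)]

# Reference: the sapply() approach, written with nrow(el) instead of 30.
ref <- sapply(1:(nrow(el) - k + 1), function(i) sum(el[i:(i + k - 1), ]))
```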
Re: [R] adding rows
Dear Rui and Murphy,

Thanks for your help.

Eliza

> Date: Fri, 9 May 2014 22:55:27 +0100
> From: ruipbarra...@sapo.pt
> To: eliza_bo...@hotmail.com; r-help@r-project.org
> Subject: Re: [R] adding rows
>
> Hello, Try the following.
> [...]
Re: [R] adding rows without loops
merge() should do the trick. How best to use it depends on what you want to do with the data afterwards. The following is an example of what you could do. It will perform best if the missing rows are scattered at random rather than clustered, because each pass of the while loop fills only one missing row per gap.

DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100, VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))
DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100, VALUE=c(37, 42, 45, 45, 45, 42, 45), VALUE2=c(29,24,28,27,35,32,32))

DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

while(any(is.na(DFm))){
    if (any(is.na(DFm[1,]))) stop("Complete first row required!")
    ind <- which(is.na(DFm), arr.ind=TRUE)                   # positions of missing cells
    prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)   # same column, previous row
    DFm[is.na(DFm)] <- DFm[prind]
}
DFm

Best,
Nello

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Adeel Amin
Sent: Thursday, 23 May 2013 07:01
To: r-help@r-project.org
Subject: [R] adding rows without loops

I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per data set). As such, I'm trying to find out whether this is possible without a loop, or in a way that makes the processing time much faster.

Each dataset is a time series as such:

DF1:
    X.DATE    X.TIME  VALUE  VALUE2
1   01052007  0200    37     29
2   01052007  0300    42     24
3   01052007  0400    45     28
4   01052007  0500    45     27
5   01052007  0700    45     35
6   01052007  0800    42     32
7   01052007  0900    45     32
...
n

DF2:
    X.DATE    X.TIME  VALUE  VALUE2
1   01052007  0200    37     29
2   01052007  0300    42     24
3   01052007  0400    45     28
4   01052007  0500    45     27
5   01052007  0600    45     35
6   01052007  0700    42     32
7   01052007  0800    45     32
...
n+4000

In other words, there are 4000 more rows in DF2 than in DF1, so the datasets are of unequal length.

I'm trying to ensure that all data frames have the same X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row.

In the above example, when comparing DF2 to DF1, the entry 01052007 0600 is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new data frame would be

    X.DATE    X.TIME  VALUE  VALUE2
1   01052007  0200    37     29
2   01052007  0300    42     24
3   01052007  0400    45     28
4   01052007  0500    45     27
5   01052007  0600    45     27
6   01052007  0700    45     35
7   01052007  0800    42     32
8   01052007  0900    45     32

VALUE and VALUE2 of the new row would be the same as in row 4.

Of course this is simple to accomplish using a row-by-row analysis, but with 4M rows, destroying and rebinding the datasets is very time consuming and, I believe, highly un-R'ish. What am I missing?

Thanks!
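The fill step can also be done in one vectorised pass per column instead of a while loop, which does not depend on how the missing rows cluster. The following is a sketch under stated assumptions, not Nello's method: `locf` and `keys` are hypothetical names, `keys` stands for the full set of timestamps that should be present (here simply written out), and rows are assumed to sort correctly by `X.DATE` and `X.TIME`, which holds within a single date but not for text dates in general.

```r
# Hypothetical helper: carry the last non-missing value forward.
locf <- function(v) {
  idx <- cummax(seq_along(v) * (!is.na(v)))  # position of the last non-NA so far
  idx[idx == 0] <- NA                        # leading NAs stay NA
  v[idx]
}

DF1 <- data.frame(X.DATE = "01052007", X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32))

# All timestamps that should exist in every data frame (assumed known).
keys <- data.frame(X.DATE = "01052007", X.TIME = (2:9) * 100)

# The left join adds the missing timestamps with NA values ...
filled <- merge(keys, DF1, by = c("X.DATE", "X.TIME"), all.x = TRUE)
filled <- filled[order(filled$X.DATE, filled$X.TIME), ]
# ... which are then filled from the previous row, column by column.
filled[c("VALUE", "VALUE2")] <- lapply(filled[c("VALUE", "VALUE2")], locf)
```

In this small example the inserted 0600 row takes VALUE and VALUE2 from the 0500 row, as requested in the original post. Memory use on millions of rows is not addressed by this sketch.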
Re: [R] adding rows without loops
Thank you Blaser:

This is the exact solution I came up with, but when comparing 8M rows, even on an 8 GB machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them, and then do a massive rmerge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues.

-----Original Message-----
From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
Sent: Thursday, May 23, 2013 12:15 AM
To: Adeel Amin; r-help@r-project.org
Subject: RE: [R] adding rows without loops

Merge should do the trick. How to best use it will depend on what you want to do with the data after.
[...]
Re: [R] adding rows...
Hi Rainer:

Thanks for the reply. Posting the large dataset is a task. There are 8M rows between the two of them, and the first discrepancy in the data doesn't happen until at least the 40,000th row of each data frame. The examples I posted are a pretty good abstraction of the root of the issue.

The problem isn't the data. The problem is out-of-memory (OOM) issues when doing any operation like merge, rbind, etc. The solution that Blaser suggested in his post works great, but the systems quickly run out of memory. What does work without OOM issues are for/while loops, but on average they take an inordinate time to compute and tie up a machine for hours and hours at a time.

Essentially I break the data apart, add rows, and rebind. It's a brute-force approach, and run times are in excess of 48 hours for one full iteration across 25 data frames. Terrible.

I am about to go down the road of using the data.table class, as it's far more memory efficient, but the documentation is cryptic.

Your idea of creating a superset has some merit, and it's what I was experimenting with prior to my original post.

-----Original Message-----
From: Rainer Schuermann [mailto:rainer.schuerm...@gmx.net]
Sent: Thursday, May 23, 2013 12:19 AM
To: Adeel Amin
Subject: adding rows...

Can I suggest that you post the output of

dput( DF1 )
dput( DF2 )

rather than pictures of your data? Any solution attempt will depend upon the data types...

Just shooting in the dark: Have you tried just row-binding the missing 4k lines to DF1 and then ordering DF1 as you like? It looks as if the data are ordered by time / date?

Rgds,
Rainer
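For the data.table route mentioned above, a rolling join is one way to express "insert the missing timestamps and carry the previous values forward" in a single step. The following is a sketch only, assuming the data.table package is installed; `keys` (the full set of timestamps) and the other names are hypothetical, and whether this avoids the memory problems on the real 8M-row data has not been tested here.

```r
# Runs only if data.table is available; it is not part of base R.
if (requireNamespace("data.table", quietly = TRUE)) {
  DT1 <- data.table::data.table(
    X.DATE = "01052007",
    X.TIME = c(2:5, 7:9) * 100,
    VALUE  = c(37, 42, 45, 45, 45, 42, 45),
    VALUE2 = c(29, 24, 28, 27, 35, 32, 32)
  )
  # Every timestamp that should be present (assumed known).
  keys <- data.table::data.table(X.DATE = "01052007", X.TIME = (2:9) * 100)

  # For each row of keys, take the matching row of DT1; roll = TRUE falls
  # back to the last earlier X.TIME within the same X.DATE when there is none.
  filled_dt <- DT1[keys, on = c("X.DATE", "X.TIME"), roll = TRUE]
}
```

The rolling is applied to the last column named in `on`, so `X.TIME` must come last there.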
Re: [R] adding rows without loops
Using the data generated with your code below, does

    DF1 <- rbind( DF1, DF2[ !(DF2$X.TIME %in% DF1$X.TIME), ] )
    DF1 <- DF1[ order( DF1$X.DATE, DF1$X.TIME ), ]

do the job?

Rgds,
Rainer

On Thursday 23 May 2013 05:54:26 Adeel - SafeGreenCapital wrote:

Thank you Blaser: This is the exact solution I came up with, but when comparing 8M rows, even on an 8G machine, one runs out of memory. To run this effectively, I have to break the DF into smaller DFs, loop through them and then do a massive rmerge at the end. That's what takes 8+ hours to compute. Even the bigmemory package is causing OOM issues.

-----Original Message-----
From: Blaser Nello [mailto:nbla...@ispm.unibe.ch]
Sent: Thursday, May 23, 2013 12:15 AM
To: Adeel Amin; r-help@r-project.org
Subject: RE: [R] adding rows without loops

Merge should do the trick. How best to use it will depend on what you want to do with the data afterwards. The following is an example of what you could do. This will perform best if the rows are missing at random and do not cluster.

    DF1 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:5,7:9)*100,
                      VALUE=c(37, 42, 45, 45, 45, 42, 45),
                      VALE2=c(29,24,28,27,35,32,32))
    DF2 <- data.frame(X.DATE=rep(01052007, 7), X.TIME=c(2:8)*100,
                      VALUE=c(37, 42, 45, 45, 45, 42, 45),
                      VALE2=c(29,24,28,27,35,32,32))
    DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)
    # Repeatedly copy each NA cell from the row above until no NA remains
    while(any(is.na(DFm))){
      if (any(is.na(DFm[1,]))) stop("Complete first row required!")
      ind <- which(is.na(DFm), arr.ind=TRUE)
      prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
      DFm[is.na(DFm)] <- DFm[prind]
    }
    DFm

Best,
Nello
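The row-binding idea above compares only X.TIME. With more than one date in the data, a composite key over both columns is probably needed. The following is a minimal sketch under that assumption; the toy data frames, the column name VALUE2 and the helper names key1, key2 and combined are illustrative and not taken from the thread.

```r
# Toy data modelled on the thread's example (names and values are illustrative).
DF1 <- data.frame(X.DATE = "01052007",
                  X.TIME = c(2:5, 7:9) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(X.DATE = "01052007",
                  X.TIME = (2:8) * 100,
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
                  stringsAsFactors = FALSE)

# Composite key so that a row counts as present only if date AND time match.
key1 <- paste(DF1$X.DATE, DF1$X.TIME)
key2 <- paste(DF2$X.DATE, DF2$X.TIME)

# Append the DF2 rows whose key is absent from DF1, then restore the order.
combined <- rbind(DF1, DF2[!(key2 %in% key1), ])
combined <- combined[order(combined$X.DATE, combined$X.TIME), ]
rownames(combined) <- NULL
combined
```

Note that the appended rows carry DF2's own values, not the values of the preceding DF1 row, so this differs from the fill-forward behaviour requested in the original question. Ordering by a ddmmyyyy string is also only a placeholder; a real date type would be needed for chronological order across dates.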
Re: [R] adding rows without loops
"This is the exact solution I came up with ..."

Exact, really? Is the time-consuming part the initial merge

    DFm <- merge(DF1, DF2, by=c("X.DATE", "X.TIME"), all=TRUE)

or the postprocessing to turn runs of NAs into the last non-NA value in the column

    while(any(is.na(DFm))){
      if (any(is.na(DFm[1,]))) stop("Complete first row required!")
      ind <- which(is.na(DFm), arr.ind=TRUE)
      prind <- matrix(c(ind[,"row"]-1, ind[,"col"]), ncol=2)
      DFm[is.na(DFm)] <- DFm[prind]
    }

If it is the latter, you may get better results from applying zoo::na.locf() to each non-key column of DFm. E.g.,

    library(zoo)
    f2 <- function(DFm) {
      # columns 1 and 2 are the merge keys; fill the rest column by column
      for(i in 3:length(DFm)) {
        DFm[[i]] <- na.locf(DFm[[i]])
      }
      DFm
    }

f2(DFm) gives the same result as Blaser's algorithm

    f1 <- function (DFm) {
      while (any(is.na(DFm))) {
        if (any(is.na(DFm[1, ]))) stop("Complete first row required!")
        ind <- which(is.na(DFm), arr.ind = TRUE)
        prind <- matrix(c(ind[, "row"] - 1, ind[, "col"]), ncol = 2)
        DFm[is.na(DFm)] <- DFm[prind]
      }
      DFm
    }

If there are not a huge number of columns I would guess that f2() would be much faster.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
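For readers who want to see what "last observation carried forward" amounts to without loading zoo, here is a minimal base-R sketch. The function name locf is hypothetical, and unlike the default behaviour of zoo::na.locf() this version keeps leading NAs instead of dropping them.

```r
# Carry the last non-NA value forward along a vector (base R only).
locf <- function(x) {
  # Position of each non-NA element, 0 where the element is NA ...
  idx <- ifelse(is.na(x), 0L, seq_along(x))
  # ... so the running maximum is the position of the latest non-NA value.
  idx <- cummax(idx)
  # Leading NAs have nothing to carry forward; keep them as NA.
  idx[idx == 0L] <- NA
  x[idx]
}

locf(c(29, 24, NA, NA, 35, NA))
```

Because the work is done by cummax() and a single subsetting step, the cost is linear in the vector length and there is no explicit loop over rows.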
Re: [R] adding rows without loops
Rainer...I can't believe this did the trick. You're a genius. Thank you sir.
[R] adding rows without loops
I'm comparing a variety of datasets with over 4M rows. I've solved this problem 5 different ways using a for/while loop, but the processing time is murder (over 8 hours doing this row by row per data set). As such, I'm trying to find out whether a solution is possible without a loop, or one in which the processing time is much faster.

Each dataset is a time series, as such:

DF1:

      X.DATE    X.TIME  VALUE  VALUE2
    1 01052007  0200    37     29
    2 01052007  0300    42     24
    3 01052007  0400    45     28
    4 01052007  0500    45     27
    5 01052007  0700    45     35
    6 01052007  0800    42     32
    7 01052007  0900    45     32
    ...
    n

DF2:

      X.DATE    X.TIME  VALUE  VALUE2
    1 01052007  0200    37     29
    2 01052007  0300    42     24
    3 01052007  0400    45     28
    4 01052007  0500    45     27
    5 01052007  0600    45     35
    6 01052007  0700    42     32
    7 01052007  0800    45     32
    ...
    n+4000

In other words, there are 4000 more rows in DF2 than in DF1, so the datasets are of unequal length.

I'm trying to ensure that all data frames have the same X.DATE and X.TIME entries. Where they are missing, I'd like to insert a new row.

In the above example, when comparing DF2 to DF1, the entry 01052007 0600 is missing in DF1. The solution would add a row to DF1 at the appropriate index, so the new data frame would be

      X.DATE    X.TIME  VALUE  VALUE2
    1 01052007  0200    37     29
    2 01052007  0300    42     24
    3 01052007  0400    45     28
    4 01052007  0500    45     27
    5 01052007  0600    45     27
    6 01052007  0700    45     35
    7 01052007  0800    42     32
    8 01052007  0900    45     32

VALUE and VALUE2 in the new row would be the same as in row 4.

Of course this is simple to accomplish using a row-by-row analysis, but with 4M rows the processing time spent destroying and rebinding the datasets is very large, and I believe the approach is highly un-R'ish. What am I missing?

Thanks!
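The requirement described above (insert the missing date/time rows into DF1 and copy the values from the preceding row) can be sketched without an explicit loop. This is only one possible approach under stated assumptions: DF1 is already sorted by its keys, the toy data reproduce the example tables, and names such as key_cols, keys, idx and DF1_filled are illustrative.

```r
# Toy data reproducing the example tables (X.TIME stored as numbers here).
DF1 <- data.frame(X.DATE = "01052007",
                  X.TIME = c(200, 300, 400, 500, 700, 800, 900),
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(X.DATE = "01052007",
                  X.TIME = seq(200, 800, by = 100),
                  VALUE  = c(37, 42, 45, 45, 45, 42, 45),
                  VALUE2 = c(29, 24, 28, 27, 35, 32, 32),
                  stringsAsFactors = FALSE)

key_cols <- c("X.DATE", "X.TIME")

# Union of all date/time keys found in either data frame, in key order.
keys <- unique(rbind(DF1[key_cols], DF2[key_cols]))
keys <- keys[order(keys$X.DATE, keys$X.TIME), ]

# For every key, the matching row of DF1 (NA where DF1 has no such row).
idx <- match(paste(keys$X.DATE, keys$X.TIME),
             paste(DF1$X.DATE, DF1$X.TIME))

# Replace each NA by the most recent matched row (assumes DF1 is key-sorted).
idx <- cummax(ifelse(is.na(idx), 0L, idx))
idx[idx == 0L] <- NA   # keys before DF1's first row stay NA

DF1_filled <- data.frame(keys, DF1[idx, c("VALUE", "VALUE2")],
                         row.names = NULL)
DF1_filled
```

Only integer index vectors and one final subsetting step are created, which may ease memory pressure compared with a full merge, though that has not been measured here.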
Re: [R] Adding rows to a table with a loop
Thanks for the response and the advice; glmulti looks like it could be quite a good alternative. As for the problem of adding to the results table from within the loop, this webpage: http://ryouready.wordpress.com/2009/01/23/r-combining-vectors-or-data-frames-of-unequal-length-into-one-data-frame/ answered a number of my questions.

-- View this message in context: http://r.789695.n4.nabble.com/Adding-rows-to-a-table-with-a-loop-tp3933634p3940293.html
[R] Adding rows to a table with a loop
Hi All,

It's a bit of a beginner's question, I'm afraid. I have a looped stepwise regression (using MASS and stepAIC) that takes a random subset of predictors out of the total number; for this example, a random sample of 5 out of a total of 20. The loop continues until all combinations of variables have been run through it.

The output from each iteration can be derived by taking the p-values of the coefficients from the summary command and coding them '1' (if < 0.05) or '0' (if > 0.05), producing a table of 5 columns, one for each of the predictors entered.

The table produced in the loop is much smaller than the input table. Is there a way to produce a results table that uses the original column titles of the input table and can be matched to the table of subset predictors, where a variable that was not in the subset gets the value 'NA' in that row?

I hope that makes sense.

Cheers in advance,
Matt

-- View this message in context: http://r.789695.n4.nabble.com/Adding-rows-to-a-table-with-a-loop-tp3933634p3933634.html
Re: [R] Adding rows to a table with a loop
It surely can be done. One way is to keep track of the selected variables in a set. If a new variable is selected, you expand the selected set and set its frequency to one; otherwise you just increase the frequency of the selected variable (if... else).

Also, you might want to have a look at the glmulti package, which conducts model selection over all potential combinations, even including interactions if you don't have many variables.

HTH

Weidong Gu
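One way to get a results table with all 20 original column titles, and NA wherever a predictor was not in the current subset, is to preallocate a matrix of NAs and fill each row by column name. The sketch below is hedged: the predictor names x1..x20, the number of iterations, and the randomly generated 0/1 codes are placeholders standing in for the real stepAIC() and summary() output, which is not shown in the thread.

```r
set.seed(1)

all_vars <- paste0("x", 1:20)   # hypothetical names of the 20 predictors
n_iter   <- 3                   # hypothetical number of loop iterations

# One row per iteration, one column per original predictor, all NA to start.
results <- matrix(NA_integer_, nrow = n_iter, ncol = length(all_vars),
                  dimnames = list(NULL, all_vars))

for (i in seq_len(n_iter)) {
  subset_vars <- sample(all_vars, 5)

  # Placeholder for the real step: fit the model on subset_vars and code
  # each coefficient 1 (p < 0.05) or 0 (otherwise).
  sig <- setNames(sample(0:1, 5, replace = TRUE), subset_vars)

  # Assign by column name; predictors not in the subset remain NA.
  results[i, names(sig)] <- sig
}

results
```

Column sums with na.rm = TRUE would then give the selection frequency per predictor, which is close to the counting scheme suggested in the reply above.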
[R] Adding rows based on column value
Dear all,

I have a problem and have not found any solution. I have also attached the question as a text file, because spacing sometimes gets mangled in mail.

I have a file (file.txt) attached to this mail. I am reading it with this code to make a data frame (file):

    file=read.table("file.txt",fill=T,colClasses = "character",header=T)

file looks like this:

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00   0.00   0.00
    10   135344110   0.00   0.00  24.00   0.00
    10   135344110   0.00   0.00  24.00   0.00
    10   135344113   0.00   0.00  24.00   0.00
    10   135344114  24.00   0.00   0.00   0.00
    10   135344114  24.00   0.00   0.00   0.00
    10   135344116   0.00   0.00   0.00  24.00
    10   135344118   0.00  24.00   0.00   0.00
    10   135344118   0.00   0.00   0.00  24.00
    10   135344122  24.00   0.00   0.00   0.00
    10   135344122   0.00  24.00   0.00   0.00
    10   135344123   0.00  24.00   0.00   0.00
    10   135344123   0.00  24.00   0.00   0.00
    10   135344123   0.00   0.00   0.00  24.00
    10   135344126   0.00   0.00  24.00   0.00

Some of the values in column Pos are the same. For these identical positions I want to add up the values of columns 3:6. I will explain with an example. The first row of the output should be

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00  48.00   0.00

because the first three rows have the same value in the Pos column.

So the whole output for the above input should be

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00  48.00   0.00
    10   135344113   0.00   0.00  24.00   0.00
    10   135344114  48.00   0.00   0.00   0.00
    10   135344116   0.00   0.00   0.00  24.00
    10   135344118   0.00  24.00   0.00  24.00
    10   135344122  24.00  24.00   0.00   0.00
    10   135344123   0.00  48.00   0.00  24.00
    10   135344126   0.00   0.00  24.00   0.00

Can you please help me?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Adding rows based on column value
Hi:

This seems to work:

    library(plyr)
    # select the variables to summarize:
    vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')
    # Alternatively,
    # vars <- names(df)[grep('Case', names(df))]

    # One way: the ddply() function in package plyr in
    # conjunction with the colwise() function
    ddply(df, .(Pos), colwise(sum, vars))
            Pos CaseA CaseC CaseG CaseT
    1 135344110     0    24    48     0
    2 135344113     0     0    24     0
    3 135344114    48     0     0     0
    4 135344116     0     0     0    24
    5 135344118     0    24     0    24
    6 135344122    24    24     0     0
    7 135344123     0    48     0    24
    8 135344126     0     0    24     0

The colwise() function applies the same function (here, sum) to each variable in the variable list given by vars. The wrapper function ddply() applies the colwise() function to each subset of the data defined by a unique value of Pos.

Another way is to use the aggregate() function from base R. The following code comes from another thread on this list in the past couple of days, due to Bill Dunlap.

    aggregate(df[vars], by = df['Pos'], FUN = sum)
            Pos CaseA CaseC CaseG CaseT
    1 135344110     0    24    48     0
    2 135344113     0     0    24     0
    3 135344114    48     0     0     0
    4 135344116     0     0     0    24
    5 135344118     0    24     0    24
    6 135344122    24    24     0     0
    7 135344123     0    48     0    24
    8 135344126     0     0    24     0

HTH,
Dennis
Re: [R] Adding rows based on column value
I have tried the aggregate command, but it shows this error:

    > vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')
    > vars
    [1] "CaseA" "CaseC" "CaseG" "CaseT"
    > aggregate(file[vars], by = df['Pos'], FUN = sum)
    Error in aggregate.data.frame(file[vars], by = df["Pos"], FUN = sum) :
      arguments must have same length

The thing is, I can't use plyr, because I want code that I can use to build a tool. Can you please tell me why the aggregate function is showing this error? I am confused.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Adding rows based on column value
I have tried the aggregate command, but it shows this error:

    > vars <- paste('Case', c('A', 'C', 'G', 'T'), sep = '')
    > vars
    [1] "CaseA" "CaseC" "CaseG" "CaseT"
    > aggregate(file[vars], by = file['Pos'], FUN = sum)
    Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument

The thing is, I can't use plyr, because I want code that I can use to build a tool. Can you please tell me why the aggregate function is showing this error? I am confused.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
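The "invalid 'type' (character) of argument" message is what sum() reports when it is handed character data, and the read.table() call in the question sets colClasses = "character" for every column. A likely fix, sketched below under that assumption, is to convert the Case columns to numeric before aggregating. The small data frame dat stands in for the poster's file object and contains only a few of the rows and columns.

```r
# Stand-in for the data frame read with colClasses = "character".
dat <- data.frame(Pos   = c("135344110", "135344110", "135344110", "135344113"),
                  CaseA = c("0.00", "0.00", "0.00", "0.00"),
                  CaseG = c("0.00", "24.00", "24.00", "24.00"),
                  stringsAsFactors = FALSE)

vars <- c("CaseA", "CaseG")

# sum() cannot add strings, so convert the value columns to numeric first.
dat[vars] <- lapply(dat[vars], as.numeric)

# Both arguments now come from the same data frame and have matching lengths.
out <- aggregate(dat[vars], by = dat["Pos"], FUN = sum)
out
```

Taking both the data and the grouping column from the same object also avoids the earlier "arguments must have same length" error, which arises when the `by` list has a different number of rows than the data being aggregated.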
[R] Adding rows based on column value
Dear all,

I have a problem and have not found any solution. (I have also attached the problem as a text file, because column spacing sometimes gets mangled in mail.)

I have a file (file.txt) attached to this mail. I am reading it with this code to make a data frame (file):

    file=read.table("file.txt",fill=T,colClasses = "character",header=T)

file looks like this:

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00   0.00   0.00
    10   135344110   0.00   0.00  24.00   0.00
    10   135344110   0.00   0.00  24.00   0.00
    10   135344113   0.00   0.00  24.00   0.00
    10   135344114  24.00   0.00   0.00   0.00
    10   135344114  24.00   0.00   0.00   0.00
    10   135344116   0.00   0.00   0.00  24.00
    10   135344118   0.00  24.00   0.00   0.00
    10   135344118   0.00   0.00   0.00  24.00
    10   135344122  24.00   0.00   0.00   0.00
    10   135344122   0.00  24.00   0.00   0.00
    10   135344123   0.00  24.00   0.00   0.00
    10   135344123   0.00  24.00   0.00   0.00
    10   135344123   0.00   0.00   0.00  24.00
    10   135344126   0.00   0.00  24.00   0.00

Some of the values in column Pos are the same. For these identical positions I want to add up the values of columns 2:6. I will explain with an example. The first row of the output should be

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00  48.00   0.00

So the whole output for the above input should be

    Chr  Pos        CaseA  CaseC  CaseG  CaseT
    10   135344110   0.00  24.00  48.00   0.00
    10   135344113   0.00   0.00  24.00   0.00
    10   135344114  48.00   0.00   0.00   0.00
    10   135344116   0.00   0.00   0.00  24.00
    10   135344118   0.00  24.00   0.00  24.00
    10   135344122  24.00  24.00   0.00   0.00
    10   135344123   0.00  48.00   0.00  24.00
    10   135344126   0.00   0.00  24.00   0.00

Can you please help me?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Adding rows based on column value
?tapply (in base R)
?aggregate
?by (wrapper for tapply)
?ave (in base R -- based on tapply)

Also package plyr (and several others, undoubtedly). Also google on "R summarize data by groups" or similar gets many relevant hits.

-- Bert

--
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
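As a rough illustration of the grouping functions listed above, the following sketch sums the Case columns within each position using `aggregate()`. The data frame `dat` is a hypothetical stand-in built inline from the first six rows of the posted table (it is not the poster's actual file), and the Case columns are assumed to be numeric.

```r
# Hypothetical stand-in for the first six rows of the posted data
dat <- data.frame(
  Chr   = "10",
  Pos   = c("135344110", "135344110", "135344110",
            "135344113", "135344114", "135344114"),
  CaseA = c(0, 0, 0, 0, 24, 24),
  CaseC = c(24, 0, 0, 0, 0, 0),
  CaseG = c(0, 24, 24, 24, 0, 0),
  CaseT = c(0, 0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Sum the four Case columns within each (Chr, Pos) group
res <- aggregate(cbind(CaseA, CaseC, CaseG, CaseT) ~ Chr + Pos,
                 data = dat, FUN = sum)
print(res)
```

The formula interface keeps the original column names in the result; `tapply()` or `by()` could be used in a similar way, one column at a time.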
Re: [R] Adding rows based on column value
I have checked it but did not get any results. Is there a way I can do it? I will be very thankful to you.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Adding rows based on column value
Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: Bansal, Vikas
Sent: Thursday, July 14, 2011 6:07 PM
To: Bert Gunter
Subject: RE: [R] Adding rows based on column value

Yes sir. I am trying. I am using this-

    aggregate(x = file[,3:6], by = list(file[,2]), FUN = "sum")

but I think this is not the right way, because we cannot use sum to add. That is why I was asking for help.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From: Bert Gunter [gunter.ber...@gene.com]
Sent: Thursday, July 14, 2011 6:01 PM
To: Bansal, Vikas
Subject: Re: [R] Adding rows based on column value

Not from me -- I don't believe you've made an honest effort. Maybe someone else will help you. You might try posting reproducible code that shows your efforts -- as the posting guide requests.

-- Bert
Re: [R] Adding rows based on column value
Bansal, Vikas <vikas.bansal at kcl.ac.uk> writes:

> I am using this-
> aggregate(x = file[,3:6], by = list(file[,2]), FUN = "sum")

Better, although still not reproducible (please *do* read the posting guide -- it is listed at the bottom of every R list post and is the *first* google hit for "posting guide" (!); search for "Examples").

What about removing the quotation marks around sum?

    aggregate(x = file[,3:6], by = list(file[,2]), FUN = sum)

> but I think this is not a right way. Because we cannot use sum to add. That is why I was asking for help.
Re: [R] Adding rows based on column value
I have tried that also. But it is showing this error-

    aggregate(file[,3:6], by = list(file[,2]), FUN = sum)
    Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
Re: [R] Adding rows based on column value
On 07/14/2011 01:46 PM, Bansal, Vikas wrote:
> I have tried that also. But it is showing this error-
>
> aggregate(file[,3:6], by = list(file[,2]), FUN = sum)
> Error in FUN(X[[1L]], ...) : invalid 'type' (character) of argument

Farther down in your previous e-mail you state that you read the file in using

    file=read.table("file.txt",fill=T,colClasses = "character",header=T)

The 'colClasses' argument is telling R to read in the data as type character, which of course it is having trouble summing (as the error message suggests: R's error messages are often cryptic, but in this case it seems to be telling you exactly what's wrong). (You probably put it in there so that R wouldn't mess up your second column, but it was overkill. It converted *all* the columns to character.)

Try changing your read statement to:

    file=read.table("file.txt",fill=TRUE,
        colClasses = rep(c("character","numeric"),c(2,4)),header=TRUE)

(Changing T to TRUE is safer; the different colClasses is the important part. fill=TRUE is probably unnecessary.)

If you're unsure what this is doing, please do your best to read ?read.table and ?rep, and try out examples, before responding with further queries ...

Ben Bolker
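The effect of `colClasses` described above can be checked on a small example. This is only a sketch: the inline string `txt` is a hypothetical three-row excerpt standing in for file.txt, read through the `text` argument of `read.table()` so that no file is needed.

```r
# Hypothetical excerpt standing in for file.txt
txt <- "Chr Pos CaseA CaseC CaseG CaseT
10 135344110 0.00 24.00 0.00 0.00
10 135344110 0.00 0.00 24.00 0.00
10 135344113 0.00 0.00 24.00 0.00"

# Every column read as character: sum() cannot work on these
bad <- read.table(text = txt, header = TRUE, colClasses = "character")
print(sapply(bad, class))

# First two columns character, remaining four numeric
good <- read.table(text = txt, header = TRUE,
                   colClasses = rep(c("character", "numeric"), c(2, 4)))
print(sapply(good, class))

# With numeric columns the aggregation runs without the type error
summed <- aggregate(good[, 3:6], by = list(Pos = good[, 2]), FUN = sum)
print(summed)
```

Keeping Pos as character avoids any reformatting of the long position values while still leaving the Case columns summable.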
Re: [R] Adding rows based on column value
Yes, because from your previous posts, you appeared to have read in the data as character:

    file=read.table("file.txt",fill=T,colClasses = "character",header=T)

But, of course, without a reproducible example, one cannot be sure.

-- Bert
[R] Adding rows to column
I'm new to R. I'm extracting important columns from a single table using the following code:

    File2 <- "file.txt"
    table2 <- read.delim(File2, skip=19, sep=";", header=F, na.strings="NA", fill=T)
    # extracting column 7 where rows match ID
    col1 <- table2[grep("ID", table2[,1]),7]
    # similarly extracting columns 9, 11, 13, 15
    col2 <- table2[grep("ID", table2[,1]),9]
    col3 <- table2[grep("ID", table2[,1]),11]
    col4 <- table2[grep("ID", table2[,1]),13]
    col5 <- table2[grep("ID", table2[,1]),15]

There are also some other single columns I extracted from another file. Now I want to combine all these single columns into a single table with corresponding headers. Any hint on how that can be done? Thanks.

i.e. file3.txt

    col1 col2 col3 col4 col5

Regards,
Anand

--
View this message in context: http://r.789695.n4.nabble.com/Adding-rows-to-column-tp3005607p3005607.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Adding rows to column
Hi!

Would

    df <- table2[grep("ID",table2[,1]), c(7,9,11,13,15)]

do what you expect?

Ivan

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calan...@uni-hamburg.de

**
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
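A small sketch of the one-step selection suggested above, followed by one possible way to attach a separately extracted column with `cbind()`. Everything here is hypothetical: the contents of `table2`, the column positions, the header names `col1` to `col3`, and the vector `extra` are made up for illustration, and `cbind()` assumes the added column has the same number of rows as the selection.

```r
# Hypothetical table: first column holds labels, the rest hold values
table2 <- data.frame(V1 = c("ID_1", "other", "ID_2"),
                     V2 = 1:3, V3 = 4:6, V4 = 7:9,
                     stringsAsFactors = FALSE)

# Select matching rows and several columns in one indexing step
rows <- grep("ID", table2[, 1])
df <- table2[rows, c(2, 4)]
names(df) <- c("col1", "col2")   # assign the desired headers

# A column extracted elsewhere (e.g. from another file) can be appended,
# provided it has one value per selected row
extra <- c(10, 20)
df <- cbind(df, col3 = extra)
print(df)
```

The combined table could then be written out with `write.table()` if a file such as file3.txt is wanted.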
Re: [R] Adding rows to column
Sorry, I got click happy:

    df2 <- df1[, c(3,5,7,9,11,13,15)]
    df2 <- df2[grep('ID', df2$Group), ]

--
View this message in context: http://r.789695.n4.nabble.com/Adding-rows-to-column-tp3005607p3006302.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] Adding rows to column
If I understand correctly, you want to create a new data frame with selected columns, which can be achieved this way as well; it will right away create a new data frame with column headers:

    df2 <- df1[ ,c(3,7,9,11,13,15)]

--
View this message in context: http://r.789695.n4.nabble.com/Adding-rows-to-column-tp3005607p3006284.html
Sent from the R help mailing list archive at Nabble.com.
[R] adding rows as arithmatic calculation on original rows
Dear R users,

Suppose I have the following data.frame:

    myID myType myNum1 myNum2 myNum3
    a    Single   10     11     12
    b    Single   15     25     35
    c    Double   22     33     44
    d    Double    4      6      8

and I want to have new records:

    myID myType myNum1 myNum2 myNum3
    e    Single   12.5   18     23.5
    f    Double   13     19.5   26

where record e got its myNum1-3 as the average from records a and b, and record f got its myNum1-3 as the average from records c and d.

The final data.frame should be like the following:

    myID myType myNum1 myNum2 myNum3
    a    Single   10     11     12
    b    Single   15     25     35
    e    Single   12.5   18     23.5
    c    Double   22     33     44
    d    Double    4      6      8
    f    Double   13     19.5   26

Any idea is appreciated. Thanks beforehand.

Ferry
Re: [R] adding rows as arithmatic calculation on original rows
This should get you close:

    x <- read.table(textConnection("myID myType myNum1 myNum2 myNum3
    a Single 10 11 12
    b Single 15 25 35
    c Double 22 33 44
    d Double 4 6 8"), header=TRUE)
    closeAllConnections()
    y <- lapply(split(x, x$myType), function(.type){
        .means <- colMeans(.type[,3:5])
        # create the new line for the data frame
        .df <- data.frame(myID='', myType=.type$myType[1], myNum1=.means[1],
            myNum2=.means[2], myNum3=.means[3])
        rbind(.type, .df)  # append the line to the original dataframe
    })
    do.call(rbind, y)  # you can add the names you want

                  myID myType myNum1 myNum2 myNum3
    Double.3         c Double   22.0   33.0   44.0
    Double.4         d Double    4.0    6.0    8.0
    Double.myNum1      Double   13.0   19.5   26.0
    Single.1         a Single   10.0   11.0   12.0
    Single.2         b Single   15.0   25.0   35.0
    Single.myNum1      Single   12.5   18.0   23.5

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
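For comparison, the same group-mean rows can be produced without splitting the data frame, by computing the means with `aggregate()` and re-sorting after `rbind()`. This is a sketch under assumptions, not the method from the reply above: the label `"mean"` for the new rows is a hypothetical placeholder, and the data frame is rebuilt inline from the posted example.

```r
# Data from the original question, rebuilt inline
x <- data.frame(myID   = c("a", "b", "c", "d"),
                myType = c("Single", "Single", "Double", "Double"),
                myNum1 = c(10, 15, 22, 4),
                myNum2 = c(11, 25, 33, 6),
                myNum3 = c(12, 35, 44, 8),
                stringsAsFactors = FALSE)

# One row of column means per myType
means <- aggregate(cbind(myNum1, myNum2, myNum3) ~ myType, data = x, FUN = mean)
means$myID <- "mean"   # hypothetical label for the added rows

# Append the mean rows, then order so each mean follows its own group,
# keeping the groups in their original order of appearance
out <- rbind(x, means[names(x)])
out <- out[order(match(out$myType, unique(x$myType)), out$myID == "mean"), ]
rownames(out) <- NULL
print(out)
```

Sorting on the position of each type in `unique(x$myType)` keeps Single before Double, matching the layout asked for in the question.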
Re: [R] adding rows as arithmatic calculation on original rows
Thanks much Jim.
Re: [R] adding rows as arithmatic calculation on original rows
Here is a solution using sqldf

    library(sqldf)

    DF2 <- structure(list(myID = structure(1:4, .Label = c("a", "b", "c", "d"),
        class = "factor"), myType = structure(c(2L, 2L, 1L, 1L),
        .Label = c("Double", "Single"), class = "factor"),
        myNum1 = c(10, 15, 22, 4), myNum2 = c(11, 25, 33, 6),
        myNum3 = c(12, 35, 44, 8)),
        .Names = c("myID", "myType", "myNum1", "myNum2", "myNum3"),
        row.names = c(NA, -4L), class = "data.frame")

    sqldf("select 1 TotalLevel, '+' myID, myType,
        avg(myNum1) myNum1, avg(myNum2) myNum2, avg(myNum3) myNum3
        from DF2 group by myType
        union
        select 0 TotalLevel, * from DF2
        order by myType, TotalLevel, myID", method = "raw")[-1]

The output is (display in fixed font):

      myID myType myNum1 myNum2 myNum3
    1    c Double   22.0   33.0   44.0
    2    d Double    4.0    6.0    8.0
    3    + Double   13.0   19.5   26.0
    4    a Single   10.0   11.0   12.0
    5    b Single   15.0   25.0   35.0
    6    + Single   12.5   18.0   23.5
Re: [R] adding rows as arithmatic calculation on original rows
Here is a solution using doBy and Gabor's DF2 created below: library( doBy ) newrows - summaryBy ( myNum1 + myNum2 + myNum3 ~ myType , DF2, keep.names = TRUE ) newrows[,myID] - + rbind ( DF2, newrows) - Original message - From: Gabor Grothendieck [EMAIL PROTECTED] To: Ferry [EMAIL PROTECTED] Cc: r-help@r-project.org Date: Fri, 5 Dec 2008 17:50:42 -0500 Subject: Re: [R] adding rows as arithmatic calculation on original rows Here is a solution using sqldf library(sqldf) DF2 - structure(list(myID = structure(1:4, .Label = c(a, b, c, d), class = factor), myType = structure(c(2L, 2L, 1L, 1L), .Label = c(Double, Single), class = factor), myNum1 = c(10, 15, 22, 4), myNum2 = c(11, 25, 33, 6), myNum3 = c(12, 35, 44, 8)), .Names = c(myID, myType, myNum1, myNum2, myNum3), row.names = c(NA, -4L), class = data.frame) sqldf(select 1 TotalLevel, '+' myID, myType, avg(myNum1) myNum1, avg(myNum2) myNum2, avg(myNum3) myNum3 from DF2 group by myType union select 0 TotalLevel, * from DF2 order by myType, TotalLevel, myID, method = raw)[-1] The output is (display in fixed font): myID myType myNum1 myNum2 myNum3 1c Double 22.0 33.0 44.0 2d Double4.06.08.0 3+ Double 13.0 19.5 26.0 4a Single 10.0 11.0 12.0 5b Single 15.0 25.0 35.0 6+ Single 12.5 18.0 23.5 On Fri, Dec 5, 2008 at 3:21 PM, Ferry [EMAIL PROTECTED] wrote: Dear R users, Suppose I have the following data.frame: myID myType myNum1 myNum2 myNum3 a Single 10 11 12 b Single 15 25 35 c Double 22 33 44 d Double4 6 8 and I want to have new records: myID myType myNum1 myNum2 myNum3 e Single 12.5 18 23.5 f Double 13 19.5 28 where record e got its myNum1-3 as the average from record a and b, and record f got its myNum1-3 as the average from record c and d. and the final data.frame should be like the following: myID myType myNum1 myNum2 myNum3 a Single 10 11 12 b Single 15 25 35 e Single 12.5 18 43.5 c Double 22 33 44 d Double4 6 8 fDouble 13 19.5 28 Any idea is appreciated. Thanks beforehand. 
Ferry
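For comparison, the same result can probably be reached in base R without doBy or sqldf. The following is only a sketch: DF2 is retyped here with character columns instead of the factors in Gabor's structure() call, and the "+" label and the sort step are illustrative choices, not something prescribed in the thread.

```r
# DF2 retyped from the thread; character columns are an assumption made for simplicity
DF2 <- data.frame(
  myID   = c("a", "b", "c", "d"),
  myType = c("Single", "Single", "Double", "Double"),
  myNum1 = c(10, 15, 22, 4),
  myNum2 = c(11, 25, 33, 6),
  myNum3 = c(12, 35, 44, 8),
  stringsAsFactors = FALSE
)

# One row of column means per myType
avgRows <- aggregate(cbind(myNum1, myNum2, myNum3) ~ myType, data = DF2, FUN = mean)
avgRows$myID <- "+"

# Stack originals and averages, then sort so each average row follows its group
combined <- rbind(DF2, avgRows[names(DF2)])
combined <- combined[order(combined$myType, combined$myID == "+"), ]
rownames(combined) <- NULL
combined
```

Sorting on `combined$myID == "+"` within each myType places the average row last in its group, which mirrors the TotalLevel trick in the sqldf query.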
Re: [R] adding rows as arithmatic calculation on original rows
Thanks, Gabor and rmailbox.
[R] adding rows to table
Greetings everyone,

I'm trying to add a specific table, or a specific number of rows (e.g. 44), to a table, with no success. This is my basic table:

    head(dataA)
      year plot spp   prop.B    DCA1    DCA2    DCA3    DCA4
    1 2000    1  a1 0.031079 -0.0776 -0.0009  0.0259 -0.0457
    2 2000    1  a2 0.968921 -0.0448  0.1479 -0.1343  0.1670
    3 2000    2  a1 0.029218 -0.0776 -0.0009  0.0259 -0.0457
    4 2000    2  a4 0.021678 -0.3052 -0.0275 -0.0330 -0.0516
    5 2000    2  a5 0.088596  0.0357 -0.0382  0.0171 -0.0471
    6 2000    2  a6 0.065033  0.1219 -0.0588 -0.1119 -0.1795

Every time the plot number changes, I want to add a specific table (data2) or a specific number of rows underneath. I've tried several variations of the following, with no result:

    for (i in 2:nrow(dataA)) {
      ifelse(dataA$plot[i]!=dataA$plot[i-1],tab=data.frame(res[1:i,1:3]),a=1+1)
      rbind(as.matrix(dataA[1:i,]),as.matrix(dataB[,])),a=1+1)
    }

Many thanks
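One possible approach, offered as a sketch rather than a definitive answer, is to split dataA by plot, append the extra rows to each piece, and bind the pieces back together. The small dataA below is typed in from the head() output in the post (only DCA1 is kept, for brevity), and dataB is a hypothetical one-row filler table with the same columns; both are assumptions for illustration.

```r
# Toy version of dataA, retyped from the head() output in the post
dataA <- data.frame(
  year   = 2000,
  plot   = c(1, 1, 2, 2, 2, 2),
  spp    = c("a1", "a2", "a1", "a4", "a5", "a6"),
  prop.B = c(0.031079, 0.968921, 0.029218, 0.021678, 0.088596, 0.065033),
  DCA1   = c(-0.0776, -0.0448, -0.0776, -0.3052, 0.0357, 0.1219),
  stringsAsFactors = FALSE
)

# Hypothetical block of rows to add under each plot (same columns as dataA)
dataB <- data.frame(year = NA, plot = NA, spp = "filler",
                    prop.B = NA_real_, DCA1 = NA_real_,
                    stringsAsFactors = FALSE)

# Split by plot, append dataB to every piece, then stack the pieces again
pieces <- split(dataA, dataA$plot)
withExtra <- lapply(pieces, function(d) {
  extra <- dataB
  extra$year <- d$year[1]   # carry the group's year and plot into the added rows
  extra$plot <- d$plot[1]
  rbind(d, extra)
})
result <- do.call(rbind, withExtra)
rownames(result) <- NULL
result
```

Note that split() groups by plot value and returns the pieces in sorted order, so this assumes rows of the same plot belong together. If plot numbers repeat across years, splitting on both year and plot would be the natural adjustment.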