Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
Using Jim's index with my method gives you the best of both worlds:

x <- matrix(sample(20, 1e6 * 3, replace = T), ncol = 3)

system.time({
  dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  # sum up column 3 and output the first two columns with the indices
  result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
    c(x[.sect[1], 1:2], sum(x[.sect, 3]))
  })
  a <- do.call(rbind, result)
})

system.time({
  index <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  b <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
})

all.equal(a, b)

On my computer, Jim's method took 60 seconds and mine took 16.

Hadley

On Sun, Jul 20, 2008 at 8:41 PM, Ralph S. [EMAIL PROTECTED] wrote:
> yes - thank you very much! slowly getting to the full power of R . . .

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
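The run-based grouping discussed in this thread can also be written with base R's rowsum(), which totals values by a grouping vector. The sketch below is not from the thread: it swaps rowsum() in for tapply(), uses the small matrix from the original question, and the names key, run_id, sums and out are illustrative. It assumes column 2 holds positive integers, and no timing is claimed for it.

```r
# Example matrix from the original question
x <- matrix(c(1,1,1,2,2,3,3,1,1,
              7,7,7,4,4,2,2,7,7,
              1,1,1,1,1,1,2,5,5), 9, 3)

# Encode each (column 1, column 2) pair as one number
# (assumption: column 2 holds positive integers)
key <- x[, 2] + x[, 1] * max(x[, 2])

# A new run id starts whenever the key changes between adjacent rows
run_id <- cumsum(c(0, diff(key) != 0))

# rowsum() returns a one-column matrix of sums, rows ordered by run id
sums <- rowsum(x[, 3], run_id)

# The first row of each run supplies the two levels
out <- unname(cbind(x[!duplicated(run_id), 1:2], sums))
print(out)
```

Because run_id only ever increases, the first-occurrence rows and the grouped sums come out in the same order, so binding them side by side is safe here.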
[R] Sum efficiently from large matrix according to re-occuring levels of factor?
Hi,

I am trying to calculate the sum for each occurrence of the level of a factor in a very large matrix. In addition, I want to save that sum together with the level of that factor and the level of a second factor.

My matrix looks like this:

x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)

I want to sum according to the levels in the first column and save the sum, together with the levels in the first and second columns, in a new matrix. That is, I want output in a matrix of the form:

1 7 3
2 4 2
3 2 3
1 7 10

The important thing is that a factor level, such as 1 in the example, can re-occur many times. There is no regularity in the number of re-occurrences.

How could I do this efficiently (the matrix is large: 10^6 rows)?

Many thanks!!

-Ralph
Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
On Sun, Jul 20, 2008 at 4:16 PM, Ralph S. [EMAIL PROTECTED] wrote:
> I want to sum according to the levels in the first column and save the sum with the information of the level in the first and the second column in a new matrix. That is, I want output in the matrix of form:
>
> 1 7 3
> 2 4 2
> 3 2 3
> 1 7 10

Why that and not:

1 7 13
2 4 2
3 2 3

?

Hadley

-- 
http://had.co.nz/
Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
On Sun, Jul 20, 2008 at 4:47 PM, hadley wickham [EMAIL PROTECTED] wrote:
> Why that and not:
>
> 1 7 13
> 2 4 2
> 3 2 3
>
> ?

Here's a solution for that case:

index <- x[, 2] + x[, 1] * max(x[, 2])
cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))

It takes about half a second for a million row matrix.

Hadley

-- 
http://had.co.nz/
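A note on the key-based solution above, offered as a sketch rather than a correction of the posted code. The key x[, 2] + x[, 1] * max(x[, 2]) is unique per (column 1, column 2) pair as long as column 2 holds positive integers, which is an assumption here. As far as I can tell, tapply() returns its sums ordered by increasing key, while x[!duplicated(index), ] keeps rows in order of first appearance. The two agree on the example matrix but may not when pairs first appear out of key order. The helper sum_by_pair below is a hypothetical name, not from the thread, and reorders the label rows to match.

```r
# Hypothetical helper (not from the thread): total column 3 for every distinct
# (column 1, column 2) pair, keeping labels and sums in the same order.
sum_by_pair <- function(x) {
  # Assumption: column 2 holds positive integers, so the key is unique per pair
  key <- x[, 2] + x[, 1] * max(x[, 2])
  sums <- tapply(x[, 3], key, sum)      # ordered by increasing key
  first <- which(!duplicated(key))      # first row of each distinct pair
  first <- first[order(key[first])]     # reorder to match tapply's order
  unname(cbind(x[first, 1:2, drop = FALSE], sums))
}

# Example matrix from the original question
x <- matrix(c(1,1,1,2,2,3,3,1,1,
              7,7,7,4,4,2,2,7,7,
              1,1,1,1,1,1,2,5,5), 9, 3)
res_x <- sum_by_pair(x)
print(res_x)

# Two rows whose pairs appear in decreasing key order: (2, 1) then (1, 1)
y <- matrix(c(2, 1,
              1, 1,
              5, 7), 2, 3)
res_y <- sum_by_pair(y)
print(res_y)
```

For y, the pair (1, 1) is paired with its own total of 7 and (2, 1) with 5, which is the alignment the reordering step is meant to guarantee.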
Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
Does this do what you want:

# following up on another idea that was presented
# where are the breaks
dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
# sum up column 3 and output the first two columns with the indices
result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
    c(x[.sect[1], 1:2], sum(x[.sect, 3]))
})
do.call(rbind, result)

  [,1] [,2] [,3]
0    1    7    3
1    2    4    2
2    3    2    3
3    1    7   10

On Sun, Jul 20, 2008 at 7:57 PM, Ralph S. [EMAIL PROTECTED] wrote:
> The first and second column are actually indices of another matrix (my example may not make this sufficiently clear). I want to compare the sum with the corresponding entry, and then record the result of that.
>
> Any idea?
>
> Best,
> Ralph
>
> > Date: Sun, 20 Jul 2008 16:50:41 -0700
> > From: [EMAIL PROTECTED]
> > To: [EMAIL PROTECTED]
> > Subject: Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
> > CC: r-help@r-project.org

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
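To make the dataBreaks expression easier to follow, here is a step-by-step sketch on the small matrix from the original question. The intermediate names key, changed and rows_by_run are illustrative and do not appear in the thread. It assumes column 2 holds positive integers so that the combined key identifies each pair.

```r
# Example matrix from the original question
x <- matrix(c(1,1,1,2,2,3,3,1,1,
              7,7,7,4,4,2,2,7,7,
              1,1,1,1,1,1,2,5,5), 9, 3)

# Step 1: one number per (column 1, column 2) pair
key <- x[, 2] + x[, 1] * max(x[, 2])
print(key)          # 14 14 14 18 18 23 23 14 14

# Step 2: TRUE wherever the key differs from the previous row
changed <- diff(key) != 0

# Step 3: cumulative count of changes gives one id per consecutive run
dataBreaks <- cumsum(c(0, changed))
print(dataBreaks)   # 0 0 0 1 1 2 2 3 3

# Step 4: row numbers belonging to each run
rows_by_run <- split(seq_len(nrow(x)), dataBreaks)
print(rows_by_run)
```

Note that the last two rows share key 14 with the first three but receive their own run id (3), which is why level 1 appears twice in the result, once with sum 3 and once with sum 10.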
Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
yes - thank you very much! slowly getting to the full power of R . . .

> Date: Sun, 20 Jul 2008 21:21:35 -0400
> From: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?
> CC: [EMAIL PROTECTED]; r-help@r-project.org