Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-21 Thread hadley wickham
Using Jim's index with my method gives you the best of both worlds:

x <- matrix(sample(20, 1e6 * 3, replace = TRUE), ncol = 3)

system.time({
  # run id per row: increments whenever the (col 1, col 2) pair changes
  dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  # sum up column 3 and output the first two columns with the indices
  result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
    c(x[.sect[1], 1:2], sum(x[.sect, 3]))
  })
  a <- do.call(rbind, result)
})

system.time({
  # Jim's run ids again; tapply() sums column 3 within each run
  index <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
  b <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))
})
all.equal(a, b)

On my computer, Jim's method took 60 seconds and mine took 16.
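A further variant, not timed anywhere in this thread: base R's `rowsum()` sums a vector by group without an R-level loop, so it could stand in for both the `split()`/`lapply()` loop and `tapply()`. Because the run ids produced by `cumsum()` are increasing, `rowsum()`'s default sorted output lines up with the `!duplicated()` rows. This is a sketch. Whether it is actually faster on a given machine is an assumption to check with `system.time()`, and the `set.seed()` call and the name `d` are illustrative additions.

```r
set.seed(1)  # illustrative seed so the random data are reproducible
x <- matrix(sample(20, 1e6 * 3, replace = TRUE), ncol = 3)

system.time({
  # run ids: increase by one whenever the (col 1, col 2) pair changes
  index <- cumsum(c(0, diff(x[, 2] + x[, 1] * max(x[, 2])) != 0))
  # rowsum() sums column 3 per run id and returns runs in increasing id order
  d <- cbind(x[!duplicated(index), 1:2], rowsum(x[, 3], index))
})
```

The result should match the `tapply()` version row for row. Only the third column's implementation changes.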

Hadley

On Sun, Jul 20, 2008 at 8:41 PM, Ralph S. [EMAIL PROTECTED] wrote:

 yes - thank you very much! slowly getting to the full power of R . . .

 

-- 
http://had.co.nz/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-20 Thread Ralph S.

Hi,

I am trying to calculate the sum for each occurrence of the level of a factor 
in a very large matrix. In addition, I want to save that sum together with the 
information of the level of the factor and the level of a second factor.

My matrix looks like this:

x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5),9,3)

I want to sum according to the levels in the first column and save the sum with 
the information of the level in the first and the second column in a new matrix.

That is, I want output in the matrix of form:

1 7 3
2 4 2
3 2 3
1 7 10

The important thing is that a factor level such as 1 in the example can 
recur many times, with no regularity in how often it recurs.

How could I do this efficiently? The matrix is large (10^6 rows).

Many thanks!!

-Ralph
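A minimal base-R sketch of the output described above. Consecutive rows that share the same (column 1, column 2) pair form one run, and column 3 is summed within each run. It uses `rle()` on a combined numeric key. The key construction assumes both columns hold positive integer codes, and the names `key`, `runs`, `starts`, `ends` and `res` are illustrative.

```r
# Example matrix from the question (columns: factor 1, factor 2, value)
x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)

# Combine columns 1 and 2 into one key (assumes positive integer codes)
key <- x[, 2] + x[, 1] * max(x[, 2])

# rle() describes runs of identical consecutive keys
runs <- rle(key)
ends <- cumsum(runs$lengths)        # last row of each run
starts <- ends - runs$lengths + 1   # first row of each run

# Per-run totals from differences of the running total of column 3
run_sums <- diff(c(0, cumsum(x[, 3])[ends]))

res <- cbind(x[starts, 1:2], run_sums, deparse.level = 0)
res   # rows: 1 7 3 / 2 4 2 / 3 2 3 / 1 7 10
```

Differencing a cumulative sum is exact for integer-valued data like this. For large real-valued columns it can lose precision, and a per-group sum such as `rowsum()` would be the safer choice.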



Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-20 Thread hadley wickham
On Sun, Jul 20, 2008 at 4:16 PM, Ralph S. [EMAIL PROTECTED] wrote:

 Hi,

 I am trying to calculate the sum for each occurrence of the level of a factor 
 in a very large matrix. In addition, I want to save that sum together with 
 the information of the level of the factor and the level of a second factor.

 My matrix looks like this:

 x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5),9,3)

 I want to sum according to the levels in the first column and save the sum 
 with the information of the level in the first and the second column in a new 
 matrix.

 That is, I want output in the matrix of form:

 1 7 3
 2 4 2
 3 2 3
 1 7 10


Why that and not:

1 7 13
2 4 2
3 2 3

?

Hadley

-- 
http://had.co.nz/
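The question above turns on two different groupings of the same data. The sketch below computes both on the example matrix with base R's `rowsum()`. Grouping by the (column 1, column 2) pair merges every recurrence of `1 7` into one total of 13. Grouping by consecutive runs keeps the two `1 7` blocks separate, at 3 and 10. The combined key assumes positive integer codes, and the variable names are illustrative.

```r
x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)
key <- x[, 2] + x[, 1] * max(x[, 2])   # one number per (col 1, col 2) pair

# Reading 1: one row per distinct pair; recurrences are merged (1 7 13)
by_level <- rowsum(x[, 3], key, reorder = FALSE)   # first-appearance order
by_level_out <- cbind(x[!duplicated(key), 1:2], by_level)

# Reading 2: one row per consecutive run; recurrences stay separate (3 and 10)
run_id <- cumsum(c(TRUE, diff(key) != 0))
by_run_out <- cbind(x[!duplicated(run_id), 1:2], rowsum(x[, 3], run_id))
```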



Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-20 Thread hadley wickham
On Sun, Jul 20, 2008 at 4:47 PM, hadley wickham [EMAIL PROTECTED] wrote:
 Why that and not:

 1 7 13
 2 4 2
 3 2 3

 ?

Here's a solution for that case:

index <- x[, 2] + x[, 1] * max(x[, 2])
cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))

It takes about half a second for a million-row matrix.

Hadley



-- 
http://had.co.nz/
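A caveat about the `tapply()` version above, offered as an editorial observation and not as something raised in the thread. `tapply()` returns group totals sorted by key value, while `x[!duplicated(index), 1:2]` lists the pairs in order of first appearance. The two orders line up in Ralph's example but not when a pair with a larger key appears first. The sketch below uses a small made-up matrix to show the mismatch. It then aligns the two orders by giving the grouping factor explicit levels in first-appearance order, keeping the original positive-integer key assumption. The names `naive`, `grp` and `fixed` are illustrative.

```r
# Made-up data where the pair with the larger key, (2, 4), appears first
x <- matrix(c(2, 2, 1,   # column 1
              4, 4, 7,   # column 2
              5, 1, 1),  # column 3
            nrow = 3)
index <- x[, 2] + x[, 1] * max(x[, 2])     # keys: 18, 18, 14

# Original form: tapply() orders sums by key (14, 18), rows by appearance
naive <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], index, sum))

# Sketch of a fix: factor levels in order of first appearance
grp <- factor(index, levels = unique(index))
fixed <- cbind(x[!duplicated(index), 1:2], tapply(x[, 3], grp, sum))
```

Here `naive` pairs `2 4` with 1 and `1 7` with 6, while `fixed` gives `2 4 6` and `1 7 1`. Jim's run-id version avoids the issue, because `cumsum()` produces increasing ids, so sorted order and first-appearance order coincide.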



Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-20 Thread jim holtman
Does this do what you want?

# following up on another idea that was presented
# where are the breaks
dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
# sum up column 3 and output the first two columns with the indices
result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
    c(x[.sect[1], 1:2], sum(x[.sect, 3]))
})
do.call(rbind, result)
#>   [,1] [,2] [,3]
#> 0    1    7    3
#> 1    2    4    2
#> 2    3    2    3
#> 3    1    7   10
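To make the "where are the breaks" step concrete, here are the intermediate values on the example matrix from the question. This is the same expression split into named pieces, not a different method. The names `key` and `changed` are illustrative.

```r
x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)

key <- x[, 2] + x[, 1] * max(x[, 2])   # one number per (col 1, col 2) pair
key                                     # 14 14 14 18 18 23 23 14 14

changed <- diff(key) != 0               # TRUE where row i+1 starts a new run
changed                                 # FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE

dataBreaks <- cumsum(c(0, changed))     # run id for every row
dataBreaks                              # 0 0 0 1 1 2 2 3 3

split(seq(nrow(x)), dataBreaks)         # rows 1:3, 4:5, 6:7, 8:9
```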


On Sun, Jul 20, 2008 at 7:57 PM, Ralph S. [EMAIL PROTECTED] wrote:

 The first and second columns are actually indices into another matrix (my 
 example may not make this sufficiently clear). I want to compare each sum with 
 the corresponding entry of that matrix and then record the result.

 Any idea?

 Best,

 Ralph



 
 Date: Sun, 20 Jul 2008 16:50:41 -0700
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: [R] Sum efficiently from large matrix according to re-occuring 
 levels of factor?
 CC: r-help@r-project.org



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
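Ralph's follow-up, quoted above, says columns 1 and 2 are row and column indices into another matrix. Under that reading, R's matrix indexing (`m[ij]`, where `ij` is a two-column numeric index matrix) looks up every corresponding entry in one vectorized step. The thread does not say what that other matrix contains or what the comparison is. The reference matrix `ref` and the "sum exceeds entry" test below are therefore hypothetical stand-ins, as are the names `run_id`, `res`, `ref_vals` and `exceeds`.

```r
x <- matrix(c(1,1,1,2,2,3,3,1,1,7,7,7,4,4,2,2,7,7,1,1,1,1,1,1,2,5,5), 9, 3)

# Run-wise sums, built from the same run ids as in the replies above
run_id <- cumsum(c(0, diff(x[, 2] + x[, 1] * max(x[, 2])) != 0))
res <- cbind(x[!duplicated(run_id), 1:2], rowsum(x[, 3], run_id))

# Hypothetical reference matrix: rows indexed by column 1, columns by column 2
ref <- matrix(5, nrow = 3, ncol = 7)

# Indexing with a two-column numeric matrix returns ref[i, j] for each (i, j) row
ref_vals <- ref[res[, 1:2]]

# Hypothetical comparison: does each run's sum exceed its reference entry?
exceeds <- res[, 3] > ref_vals
exceeds   # FALSE FALSE FALSE TRUE
```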



Re: [R] Sum efficiently from large matrix according to re-occuring levels of factor?

2008-07-20 Thread Ralph S.

yes - thank you very much! slowly getting to the full power of R . . .


 Date: Sun, 20 Jul 2008 21:21:35 -0400
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: [R] Sum efficiently from large matrix according to re-occuring 
 levels of factor?
 CC: [EMAIL PROTECTED]; r-help@r-project.org
 
 Does this do what you want?
 
 # following up on another idea that was presented
 # where are the breaks
 dataBreaks <- cumsum(c(0, (diff(x[, 2] + x[, 1] * max(x[, 2])) != 0)))
 # sum up column 3 and output the first two columns with the indices
 result <- lapply(split(seq(nrow(x)), dataBreaks), function(.sect){
     c(x[.sect[1], 1:2], sum(x[.sect, 3]))
 })
 do.call(rbind, result)
 #>   [,1] [,2] [,3]
 #> 0    1    7    3
 #> 1    2    4    2
 #> 2    3    2    3
 #> 3    1    7   10
 
 
