[R] Help with speed (replacing the loop?)

2012-01-11 Thread Dimitri Liakhovitski
Dear R-ers,

I have a loop below that loops through my numeric variables in data
frame x and through levels of the factor group and multiplies (group
by group) the values of numeric variables in x by the corresponding
group-specific values from data frame y. In reality, my:
dim(x) is 300,000 rows by 100 variables, and
dim(y) is 120 levels of group by 100 variables.
So, my huge data frame x takes up a lot of space in memory. This is
why I am actually replacing values of a and b in x with newly
calculated values, rather than adding them.
The code does what I need, but it takes forever.

Is there maybe a more speedy way to achieve what I need?
Thanks a lot!
Dimitri


# Example data:
x <- data.frame(group=c(rep("group1",5),rep("group2",5)),
                a=1:10, b=seq(10,100,by=10))
x$group <- as.factor(x$group)
y <- data.frame(group=c("group1","group2"), a=c(10,20), b=c(2,3))
y$group <- as.factor(y$group)
(x);(y)

# My code:
myvars <- c("a","b")
for(var in myvars){
  for(group in levels(y$group)){
    temp <- x[x$group %in% group,var]
    temp <- temp * y[y$group %in% group,var]
    x[x$group %in% group,var] <- temp
  }
}
(x)
-- 
Dimitri Liakhovitski

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with speed (replacing the loop?)

2012-01-11 Thread Steve Lianoglou
Hi,

On Wed, Jan 11, 2012 at 9:57 AM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
> Dear R-ers,
>
> I have a loop below that loops through my numeric variables in data
> frame x and through levels of the factor group and multiplies (group
> by group) the values of numeric variables in x by the corresponding
> group-specific values from data frame y. In reality, my:
> dim(x) is 300,000 rows by 100 variables, and
> dim(y) is 120 levels of group by 100 variables.
> So, my huge data frame x takes up a lot of space in memory. This is
> why I am actually replacing values of a and b in x with newly
> calculated values, rather than adding them.
> The code does what I need, but it takes forever.
>
> Is there maybe a more speedy way to achieve what I need?
> Thanks a lot!

Here's a way to do it using data.table, with all the middle steps
included. If you use more data.table-centric idioms (the `:=` operator
and other ways to `merge`) you can likely use less memory and eke out
more speed, but I'll leave it like so for pedagogical purposes ;-)


library(data.table)

## your data
xx <- data.table(group=c(rep("group1",5),rep("group2",5)),
                 a=1:10, b=seq(10,100,by=10), key="group")
yy <- data.table(group=c("group1","group2"), a=c(10,20), b=c(2,3),
                 key="group")

## temp data.table to get your ducks in a row
m <- merge(xx, yy, by="group", suffixes=c(".x", ".y"))

## your answers will be in the aa and bb columns
result <- transform(m, aa=a.x * a.y, bb=b.x * b.y)



Truth be told, if you use normal data.frames, the code will look very
similar to above, so you can try that, too.
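
For reference, the `:=` idiom mentioned above might look roughly like
the sketch below. It is not code from this thread: it assumes a
data.table version that supports the `on=` argument (1.9.6 or later),
and it stores a and b as doubles so the products can be written back
without type coercion. The update join overwrites a and b in xx by
reference, so no merged copy of xx is built.

```r
library(data.table)

# Toy data from the thread; a is stored as double (an assumption made
# here) so the products can be assigned back into the same column.
xx <- data.table(group = rep(c("group1", "group2"), each = 5),
                 a = as.numeric(1:10), b = seq(10, 100, by = 10))
yy <- data.table(group = c("group1", "group2"), a = c(10, 20), b = c(2, 3))

# Update join: each row of xx is matched to a row of yy by group, and
# a and b are overwritten in place. The i. prefix refers to yy's columns.
xx[yy, on = "group", `:=`(a = a * i.a, b = b * i.b)]
print(xx)
```

With 100 variables the column list would have to be built
programmatically rather than typed out; that part is left out here.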

HTH,
-steve


-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Help with speed (replacing the loop?)

2012-01-11 Thread Dimitri Liakhovitski
Thanks a lot, Steve. I have one question (below):


 
> library(data.table)
>
> ## your data
> xx <- data.table(group=c(rep("group1",5),rep("group2",5)),
>                  a=1:10, b=seq(10,100,by=10), key="group")
> yy <- data.table(group=c("group1","group2"), a=c(10,20), b=c(2,3),
>                  key="group")
>
> ## temp data.table to get your ducks in a row
> m <- merge(xx, yy, by="group", suffixes=c(".x", ".y"))


Dimitri: The step above (merge) - I was thinking of it but decided
against it because my xx already fills up tons of memory. When I merge
xx and yy that doubles the number of variables - I am afraid my memory
won't hold that much stuff...


> ## your answers will be in the aa and bb columns
> result <- transform(m, aa=a.x * a.y, bb=b.x * b.y)
>
> Truth be told, if you use normal data.frames, the code will look very
> similar to above, so you can try that, too.
>
> HTH,
> -steve





-- 
Dimitri Liakhovitski
marketfusionanalytics.com



Re: [R] Help with speed (replacing the loop?)

2012-01-11 Thread Steve Lianoglou
Hi,

On Wed, Jan 11, 2012 at 10:50 AM, Dimitri Liakhovitski
dimitri.liakhovit...@gmail.com wrote:
> Thanks a lot, Steve. I have one question (below):
>
>> library(data.table)
>>
>> ## your data
>> xx <- data.table(group=c(rep("group1",5),rep("group2",5)),
>>                  a=1:10, b=seq(10,100,by=10), key="group")
>> yy <- data.table(group=c("group1","group2"), a=c(10,20), b=c(2,3),
>>                  key="group")
>>
>> ## temp data.table to get your ducks in a row
>> m <- merge(xx, yy, by="group", suffixes=c(".x", ".y"))
>
> Dimitri: The step above (merge) - I was thinking of it but decided
> against it because my xx already fills up tons of memory. When I merge
> xx and yy that doubles the number of variables - I am afraid my memory
> won't hold that much stuff...

Fair enough ... how about just using `match`, then, ie:

R> aa <- x$a * y$a[match(x$group, y$group)]

Should do the trick, no?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Help with speed (replacing the loop?)

2012-01-11 Thread Jeff Newmiller
It is common that performance problems are addressed by using more memory. If 
your algorithm needs to join those tables and do calculations, then you can 
either pay the piper in memory (usually the most appropriate answer) or you can 
reinvent those optimized algorithms in a compiled language and figure out how 
to thread them together to minimize memory use (a dangerous course of action). 
Try working on chunks of xx? Or getting more memory in your computer?
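
One way the "chunks" suggestion could be tried is sketched below. It is
only an illustration, not code from this thread: `chunk_size` is a
hypothetical tuning value, and a `match` lookup is used inside each
chunk in place of a full merge, so the temporary objects never hold
more than one chunk of rows at a time.

```r
# Toy data from the thread.
x <- data.frame(group = factor(rep(c("group1", "group2"), each = 5)),
                a = 1:10, b = seq(10, 100, by = 10))
y <- data.frame(group = factor(c("group1", "group2")),
                a = c(10, 20), b = c(2, 3))

myvars <- c("a", "b")
chunk_size <- 4   # hypothetical value; far larger for real data

# Store the variables as doubles up front so the products fit the columns.
x[myvars] <- lapply(x[myvars], as.numeric)

for (s in seq(1, nrow(x), by = chunk_size)) {
  rows <- s:min(s + chunk_size - 1, nrow(x))
  # Row of y that belongs to each row of this chunk.
  idx <- match(x$group[rows], y$group)
  # Multiply the chunk by the matching multipliers and write it back.
  x[rows, myvars] <- x[rows, myvars] * y[idx, myvars]
}
print(x)
```

Whether this actually lowers peak memory depends on how R copies x on
each assignment, so it is worth measuring on a subset first.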
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.




Re: [R] Help with speed (replacing the loop?)

2012-01-11 Thread Dimitri Liakhovitski
Thanks a lot, Steve!
match sounds very promising - that means I only need a loop across predictors.

As far as the "get more memory" advice is concerned: I already have more memory :)
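
A minimal sketch of the `match`-based loop across predictors described
above, using the toy data from the start of the thread. The name `idx`
and the sanity check on it are additions for illustration. The lookup
is computed once and reused for every variable, so the inner loop over
group levels disappears.

```r
# Toy data from the thread.
x <- data.frame(group = factor(rep(c("group1", "group2"), each = 5)),
                a = 1:10, b = seq(10, 100, by = 10))
y <- data.frame(group = factor(c("group1", "group2")),
                a = c(10, 20), b = c(2, 3))

myvars <- c("a", "b")

# For every row of x, the row of y holding that group's multipliers.
idx <- match(x$group, y$group)
stopifnot(!any(is.na(idx)))   # every group in x must be present in y

# One vectorized multiplication per variable, overwriting x's column.
for (var in myvars) {
  x[[var]] <- x[[var]] * y[[var]][idx]
}
print(x)
```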




-- 
Dimitri Liakhovitski
marketfusionanalytics.com
