Re: [R] Splitting or Subsetting Using foreach

2016-12-01 Thread David Winsemius

> On Dec 1, 2016, at 9:27 AM, Doran, Harold  wrote:
> 
> I am having tremendous fortune using the foreach function in the foreach 
> package sending work out to multiple cores in order to reduce computational 
> time.
> 
> I am experimenting with which types of tasks benefit from running in parallel 
> and which do not and so this is a bit of a learning experience by trial and 
> error.
> 
> One particular task I cannot seem to realize a benefit from (in terms of 
> reduced time) is splitting or subsetting a large data frame. I realize there 
> are other "fast" options like using data.table, but the current goal is to see if 
> this can benefit from multiple cores or not. 
> 
> So, a very small toy example of how I am approaching the "traditional" and 
> "parallel" way is as follows. My actual data is much, much larger and it 
> turns out the parallel version of doing it this way vis-à-vis the traditional 
> way is unbelievably slow. Hence I'm not sure whether there is a good theoretical 
> reason why such a task cannot run faster when sent out to multiple cores, or 
> whether there is a user error that I need to better understand and correct.
> 
> library(foreach)
> library(doParallel)
> registerDoParallel(cores=4)
> 
> tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))
> 
> ff1 <- split(tmp, tmp$id)
> 
> myList <- unique(tmp$id)
> N <- length(myList)
> ff2 <- foreach(i = 1:N) %dopar% { tmp[which(tmp$id == myList[i]),]}

I would have imagined that using split to deliver a separate data.frame parcel 
to the `i` argument on each iteration would be more sensible. Otherwise you are 
sending a full copy of the data to each worker and then doing the extraction N 
times rather than once. There's also a lot of checking in the data.frame 
methods. I also think you would want to avoid making reference to objects 
"outside" the parallel function application.

ff2 <- foreach(z = iter(ff1)) %dopar% { max(z$id) }
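
A slightly fuller sketch of the same idea, in case it helps: split once in the 
parent process, then hand each worker only the small piece it needs. The core 
count, the seed and the per-group mean are my own placeholders rather than 
anything from the original post, and how much copying is actually avoided will 
depend on the backend (forked workers versus a socket cluster).

```r
library(foreach)
library(doParallel)   # also attaches iterators, which provides iter()

registerDoParallel(cores = 2)   # assumed core count; adjust to your machine

set.seed(1)                     # only so the toy data is reproducible
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

# Do the split once, serially, in the parent process.
ff1 <- split(tmp, tmp$id)

# Each task receives one small data frame `z`; the body never refers to
# `tmp`. The per-group mean is a hypothetical stand-in for the real work.
res <- foreach(z = iter(ff1), .combine = c) %dopar% {
  mean(z$foo)
}

# Serial reference computed from the same pieces, for comparison.
ref <- vapply(ff1, function(z) mean(z$foo), numeric(1))

stopImplicitCluster()   # releases the implicit cluster if one was created
```

With only ten rows per group the per-task overhead will most likely still 
outweigh any gain, so I would expect this to pay off only when the work done 
on each piece is substantial.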


> 
> Thanks,
> Harold
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA



[R] Splitting or Subsetting Using foreach

2016-12-01 Thread Doran, Harold
I am having tremendous fortune using the foreach function in the foreach 
package sending work out to multiple cores in order to reduce computational 
time.

I am experimenting with which types of tasks benefit from running in parallel 
and which do not and so this is a bit of a learning experience by trial and 
error.

One particular task I cannot seem to realize a benefit from (in terms of 
reduced time) is splitting or subsetting a large data frame. I realize there 
are other "fast" options like using data.table, but the current goal is to see if 
this can benefit from multiple cores or not. 

So, a very small toy example of how I am approaching the "traditional" and 
"parallel" way is as follows. My actual data is much, much larger and it turns 
out the parallel version of doing it this way vis-à-vis the traditional way is 
unbelievably slow. Hence I'm not sure whether there is a good theoretical reason 
why such a task cannot run faster when sent out to multiple cores, or whether 
there is a user error that I need to better understand and correct.

library(foreach)
library(doParallel)
registerDoParallel(cores=4)

tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

ff1 <- split(tmp, tmp$id)

myList <- unique(tmp$id)
N <- length(myList)
ff2 <- foreach(i = 1:N) %dopar% { tmp[which(tmp$id == myList[i]),]}

Thanks,
Harold
