Re: [R] parallel computation with plyr 1.2.1

2010-09-16 Thread Hadley Wickham
Yes, this was a little bug that will be fixed in the next release.
Hadley
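
Until a fixed release is installed, one possible workaround is to do the split/apply/combine step by hand. The following is only a sketch and not part of the original reply. It uses the parallel package that ships with current R (2.14.0 and later) instead of plyr's .parallel argument, it relies on forking and so assumes a Unix-alike, and the loop count n_iter is an assumed value chosen only to make each call slow enough to time.

```r
# Workaround sketch: split by id, apply f in parallel, recombine by hand.
library(parallel)

# example data, as in the original post
d <- data.frame(y = rnorm(1000), id = rep(letters[1:4], each = 500))

# function that wastes some time; n_iter is an assumption, not from the post
f <- function(x, n_iter = 2000) {
  m <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    m[i] <- mean(sample(x$y, 100))
  }
  mean(m)
}

# one data frame per level of id
pieces <- split(d, d$id)

# mclapply() forks the R process, so use a single core on Windows
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(pieces, f, mc.cores = n_cores)

# one row per group, similar in shape to the result of ddply(d, .(id), f)
out <- data.frame(id = names(pieces), V1 = unlist(res), row.names = NULL)
print(out)
```

Wrapping the mclapply() call in system.time() with mc.cores set to 1 and then 2 gives a quick check of whether the second core is actually being used.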

On Thu, Sep 16, 2010 at 1:11 PM, Dylan Beaudette
debeaude...@ucdavis.edu wrote:
 Hi,

 I have been trying to use the new .parallel argument with the most recent
 version of plyr [1] to speed up some tasks. I can run the example in the NEWS
 file [1], and it seems to be working correctly. However, R will only use a
 single core when I try to apply this same approach with ddply().

 1. http://cran.r-project.org/web/packages/plyr/NEWS

 Watching my CPUs I see that in both cases only a single core is used, and they
 take about the same amount of time. Is there a limitation with how ddply()
 dispatches parallel jobs, or is this task not suitable for parallel
 computing?

 Cheers,
 Dylan


 Here is an example:

 library(plyr)
 library(doMC)
 registerDoMC(cores=2)

 # example data
 d <- data.frame(y=rnorm(1000), id=rep(letters[1:4], each=500))

 # function that wastes some time
 f <- function(x) {
   m <- vector(length=1)
   for(i in 1:1) {
     m[i] <- mean(sample(x$y, 100))
   }
   mean(m)
 }

 system.time(ddply(d, .(id), .fun=f, .parallel=FALSE))
 #  user  system elapsed
 #  2.740   0.016   2.766

 system.time(ddply(d, .(id), .fun=f, .parallel=TRUE))
 #  user  system elapsed
 #  2.720   0.000   2.726





 --
 Dylan Beaudette
 Soil Resource Laboratory
 http://casoilresource.lawr.ucdavis.edu/
 University of California at Davis
 530.754.7341

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] parallel computation with plyr 1.2.1

2010-09-16 Thread Dylan Beaudette
On Thursday 16 September 2010, David Winsemius wrote:
 Was this done in a GUI? The registerDoMC help page says:
 ...  registerDoMC, should not be used in a GUI environment, because
 multiple processes then share the same GUI.

 By the way, before reading the above I ran it on a Mac in the GUI with
 cores=4 and did see a slightly decreased time. The non-GUI restriction
 may also explain why I couldn't get the multicore package to do
 anything useful when I tried it in the past.

Interesting. I did not run it from within a GUI, but from a Linux terminal.
It is a little sad that doMC will not work when called from a GUI, as most
of the users I am currently developing a package for will be constrained
to Windows.

Dylan
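
For users constrained to Windows, where fork-based backends such as doMC are not an option, a socket cluster is one alternative. The following is a rough sketch under stated assumptions, not something proposed in the thread: it uses makeCluster() and parLapply() from the parallel package that ships with current R, the worker count of 2 is arbitrary, and the replicate count of 200 is an assumed stand-in for "wasting some time".

```r
# Sketch: the same per-group computation on a socket (PSOCK) cluster,
# which starts separate R processes instead of forking.
library(parallel)

d <- data.frame(y = rnorm(1000), id = rep(letters[1:4], each = 500))

# per-group work; 200 repetitions is an assumed value
f <- function(x) mean(replicate(200, mean(sample(x$y, 100))))

pieces <- split(d, d$id)

cl <- makeCluster(2)  # two worker processes on the local machine
res <- tryCatch(
  parLapply(cl, pieces, f),
  finally = stopCluster(cl)  # always shut the workers down
)

out2 <- data.frame(id = names(pieces), V1 = unlist(res), row.names = NULL)
print(out2)
```

Starting the workers and shipping the data to them has a fixed cost, so for small tasks like this one the socket version may be no faster than the sequential one.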
