[R] How to utilise dual cores and multi-processors on WinXP

2007-03-06 Thread rhelp . 20 . trevva
Hello,

I have a question to which I hope someone has a fairly straightforward 
answer: what is the quickest and easiest way to take advantage of the extra 
cores / processors that are now commonplace on modern machines? And how do I 
do that on Windows?

I realise that this is a complex question that is not answered easily, so let 
me refine it some more. The type of scripts I'm dealing with are well suited 
to parallelisation: often they involve mapping out parameter space by changing 
a single parameter and re-running the simulation 10 (or n) times, then 
bringing all the results back together at the end for analysis. If I can 
distribute the runs over all the processors available in my machine, I'm going 
to roughly halve the run time. The question is, how to do this?

I've looked at many of the packages in this area: Rmpi, snow, snowFT, rpvm, 
and taskPR. These all seem to have the functionality I want, but they don't 
exist for Windows. The best solution would be to switch to Linux, but 
unfortunately that's not an option.

Another option is to divide the task in half from the beginning, spawn two 
slave instances of R (e.g. via Rcmd), let them run, and then collate the 
results at the end. But how exactly would I do this, and how would I know 
when they're done?
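
Something along these lines is roughly what I have in mind (completely 
untested; slave1.R / slave2.R, the results*.RData file names, and the 'out' 
object are placeholders for whatever the real slave scripts would produce):

## slave1.R and slave2.R are each assumed to run half of the parameter
## values and to save an object 'out' to results1.RData / results2.RData
## when they finish
done <- c("results1.RData", "results2.RData")
unlink(done)                                    # clear any stale results

## launch the two slaves and return immediately
system("Rcmd BATCH --no-save slave1.R slave1.Rout", wait = FALSE)
system("Rcmd BATCH --no-save slave2.R slave2.Rout", wait = FALSE)

## poll until both result files appear
while (!all(file.exists(done)))
  Sys.sleep(5)

## collate the two halves
results <- list()
for (i in 1:2) {
  load(sprintf("results%d.RData", i))           # creates 'out'
  results[[i]] <- out
}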

Can anyone recommend a nice solution? I'm sure that I'm not the only one who'd 
love to double their computational speed...

Cheers,

Mark



Re: [R] How to utilise dual cores and multi-processors on WinXP

2007-03-06 Thread Greg Snow
The nws package does run on Windows and can split calculations between
multiple R processes.  I have not tried it with a single multiprocessor
PC (I don't have one), but I have used it with multiple PCs.  It looks
like a multiprocessor PC would work pretty much with the defaults.
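
For what it's worth, a minimal sketch of what that might look like on a
dual-core machine, assuming the sleigh/eachElem interface and that the
NetWorkSpaces server is running locally (the simulation function here is
just a stand-in, so treat this as a sketch rather than tested code):

library(nws)

## launch two worker R processes on this machine
s <- sleigh(workerCount = 2)

## stand-in for the real simulation; 'p' is the parameter being varied
sim <- function(p) mean(rnorm(1e6, mean = p))

params <- seq(0.1, 1.0, by = 0.1)

## apply sim() to each parameter value, spread over the two workers
results <- eachElem(s, sim, list(params))

stopSleigh(s)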

Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
[EMAIL PROTECTED]
(801) 408-8111
 
 



Re: [R] How to utilise dual cores and multi-processors on WinXP

2007-03-06 Thread Martin Morgan
[EMAIL PROTECTED] writes:

 Hello,

 I have a question to which I hope someone has a fairly straightforward
 answer: what is the quickest and easiest way to take advantage of the
 extra cores / processors that are now commonplace on modern machines?
 And how do I do that on Windows?

 I realise that this is a complex question that is not answered easily,
 so let me refine it some more. The type of scripts I'm dealing with are
 well suited to parallelisation: often they involve mapping out parameter
 space by changing a single parameter and re-running the simulation 10
 (or n) times, then bringing all the results back together at the end
 for analysis. If I can distribute the runs over all the processors
 available in my machine, I'm going to roughly halve the run time. The
 question is, how to do this?

 I've looked at many of the packages in this area: Rmpi, snow, snowFT,
 rpvm, and taskPR. These all seem to have the functionality I want, but
 they don't exist for Windows. The best solution would be to switch to
 Linux, but unfortunately that's not an option.

Rmpi runs on Windows (see http://www.stats.uwo.ca/faculty/yu/Rmpi/).

You'll end up modifying your code, probably using one of the many
parLapply-like functions (from Rmpi; comparable functions are in snow and
in the papply package) to do an 'lapply' that is spread over the different
compute processes. This is likely to require some thought: for instance,
data transmission costs can overwhelm any speedup, and the FUN argument
to the lapply-like functions should probably reference only local
variables. The classic first attempt performs the equivalent of 1000
bootstraps on each node, rather than dividing the 1000 replicates amongst
the nodes (which, done properly, e.g. with independent random-number
streams on each node, is actually quite hard to do).
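
As an illustration of the kind of change involved, here is a minimal,
hypothetical sketch using snow with a socket cluster on the local machine
(the simulation function and parameter values are placeholders; Rmpi
offers analogous functions such as mpi.parLapply):

library(snow)

## two R worker processes on the local machine, connected over sockets
cl <- makeCluster(2, type = "SOCK")

## placeholder for the real simulation; 'p' is the parameter being varied
sim <- function(p) mean(rnorm(1e6, mean = p))

params <- seq(0.1, 1.0, by = 0.1)

## run sim() once per parameter value, spread over the two workers
results <- parLapply(cl, params, sim)

stopCluster(cl)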

In principle I think you might also be able to use a parallelized
LAPACK, following the general instructions in the R Installation and
Administration manual. I have not done this; it would likely be a
challenge, and would (perhaps) only benefit code that uses the LAPACK
linear algebra routines.

 Another option is to divide the task in half from the beginning, spawn
 two slave instances of R (e.g. via Rcmd), let them run, and then
 collate the results at the end. But how exactly would I do this, and
 how would I know when they're done?

The Bioconductor package Biobase has a function Aggregate that might
be fun to explore; I don't think it receives much use.

 Can anyone recommend a nice solution? I'm sure that I'm not the only
 one who'd love to double their computational speed...

 Cheers,

 Mark


-- 
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.