Re: [R] R on a supercomputer

2005-10-11 Thread Sean Davis
On 10/10/05 3:54 PM, Kimpel, Mark William [EMAIL PROTECTED] wrote:

 I am using R with Bioconductor to perform analyses on large datasets
 using bootstrap methods. In an attempt to speed up my work, I have
 inquired about using our local supercomputer and asked the administrator
 if he thought R would run faster on our parallel network. I received the
 following reply:
 
 The second benefit is that the processors have large caches.
 
 Briefly, everything is loaded into cache before going into the
 processor.  With large caches, there is less movement of data between
 memory and cache, and this can save quite a bit of time.  Indeed, when
 programmers optimize code they usually think about how to do things to
 keep data in cache as long as possible.
 
 Whether you would receive any benefit from larger cache depends on how
 R is written. If it's written such that data remain in cache, the
 speed-up could be considerable, but I have no way to predict it.
 
 My question is, is R written such that data remain in cache?

Using the cluster model (which may or may not be what you are calling a
supercomputer--I don't know the exact terminology here), jobs that involve
repetitive, independent tasks like computing statistics on bootstrap
replicates can benefit from parallelization IF the I/O associated with
running a single replicate does not outweigh the benefit of using multiple
processors.  For example, if you are running 10,000 replicates and each
takes 1 ms, then you have a 10-second job on a single processor.  One could
envision spreading that same work over 1000 processors and doing the job in
10 ms, but if one counts the I/O (network, moving into cache, etc.), which
could take 1 second per batch of replicates (for example), then that job
will take AT LEAST 10 seconds on 1000 processors as well.  However, if the
same computation takes 1 second per replicate, then the whole job takes
10,000 seconds on a single processor, but only about 11 seconds on the 1000
processors.  This rationale is only approximate, but I hope it illustrates
the point.
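The arithmetic above can be written out explicitly. Here is a toy model (a sketch, not a benchmark; `io_cost` is an assumed fixed per-node overhead, not a measured value) that reproduces the serial figures and the roughly 11-second parallel figure for the slow case:

```r
## Back-of-the-envelope model of the example above (not a benchmark)
n_reps  <- 10000   # bootstrap replicates
n_procs <- 1000    # processors
io_cost <- 1       # assumed I/O overhead per node, in seconds

## Fast replicates: 1 ms each
serial_fast   <- n_reps * 0.001                    # 10 seconds

## Slow replicates: 1 s each
serial_slow   <- n_reps * 1                        # 10,000 seconds
parallel_slow <- (n_reps / n_procs) * 1 + io_cost  # about 11 seconds

c(serial_fast, serial_slow, parallel_slow)
```

For the fast replicates, the same overhead is large relative to the 10 ms of actual computation per node, so the elapsed time is dominated by I/O rather than by processor count.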

We have begun to use a 60-node Linux cluster for some of our work (also
microarray-based) and use MPI/snow with very nice results for multiple
independent, long-running tasks.  snow is VERY easy to use, but one could
also drop down to Rmpi if needed, for finer-grained control over the
parallelization process.
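A minimal sketch of the snow pattern for independent bootstrap replicates (the cluster size, dataset, and statistic here are made up for illustration; on an MPI-enabled system you would use type = "MPI" instead of the socket cluster shown):

```r
library(snow)

## Hypothetical 4-node cluster; use type = "MPI" where Rmpi is available
cl <- makeCluster(4, type = "SOCK")

x <- rnorm(1000)                      # stand-in dataset
boot_stat <- function(i, x) mean(sample(x, replace = TRUE))

## Replicates are independent, so they spread naturally across nodes;
## clusterApplyLB load-balances tasks of uneven duration
reps <- unlist(clusterApplyLB(cl, seq_len(10000), boot_stat, x = x))
stopCluster(cl)

sd(reps)    # bootstrap standard error of the mean
```

The only real requirement is that each replicate be independent of the others; how the replicates are batched then determines how much communication overhead you pay per node.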

As for how caching behavior comes into it, or how R would perform without
parallelized R code, I can't really comment; my experience is limited to
the cluster model with parallelized R code.

Sean

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] R on a supercomputer

2005-10-10 Thread Kimpel, Mark William
I am using R with Bioconductor to perform analyses on large datasets
using bootstrap methods. In an attempt to speed up my work, I have
inquired about using our local supercomputer and asked the administrator
if he thought R would run faster on our parallel network. I received the
following reply:

The second benefit is that the processors have large caches. 

Briefly, everything is loaded into cache before going into the
processor.  With large caches, there is less movement of data between
memory and cache, and this can save quite a bit of time.  Indeed, when
programmers optimize code they usually think about how to do things to
keep data in cache as long as possible. 

Whether you would receive any benefit from larger cache depends on how
R is written. If it's written such that data remain in cache, the
speed-up could be considerable, but I have no way to predict it.

 

My question is, is R written such that data remain in cache?

Thanks,

Mark W. Kimpel MD
Indiana University School of Medicine



Re: [R] R on a supercomputer

2005-10-10 Thread Tony Plate
In general, R is not written in such a way that data remain in cache.
However, R can use optimized BLAS libraries, and those are.  So if your
version of R is compiled to use an optimized BLAS library appropriate to
the machine (e.g., ATLAS, or Prof. Goto's BLAS), AND a considerable
amount of the computation done in your R program involves basic linear
algebra (matrix multiplication, etc.), then you might see a good speedup.
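To make that concrete: dense matrix products in R dispatch to BLAS routines, so those calls (and essentially only those) inherit the cache-aware blocking of an optimized BLAS. A small sketch (timings are machine- and BLAS-dependent, so none are quoted here):

```r
n <- 500
A <- matrix(rnorm(n * n), n, n)
B <- matrix(rnorm(n * n), n, n)

## These operations go through BLAS and benefit from an optimized build:
C  <- A %*% B
Ct <- crossprod(A, B)   # t(A) %*% B, computed in one BLAS call

## Compare this across BLAS builds to see the difference; interpreted,
## element-wise R code does not speed up the same way
system.time(A %*% B)
```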

-- Tony Plate

Kimpel, Mark William wrote:
 I am using R with Bioconductor to perform analyses on large datasets
 using bootstrap methods. In an attempt to speed up my work, I have
 inquired about using our local supercomputer and asked the administrator
 if he thought R would run faster on our parallel network. I received the
 following reply:
 
  My question is, is R written such that data remain in cache?

