Re: [R-SIG-Mac] Is R more heavy on memory or processor?

2009-03-26 Thread Prof Brian Ripley

On Tue, 24 Mar 2009, Simon Urbanek wrote:


On Mar 24, 2009, at 14:55 , Booman, M wrote:


Dear all,

I am going to purchase a Power Mac (a new one, with Nehalem processor) for 
my R-based microarray analyses. I use mainly Bioconductor packages, and a 
typical dataset would consist of 50 microarrays with 40,000 datapoints 
each. To make the right choice of processor and memory, I have a few 
questions:


I don't use BioC [you may want to ask on the BioC list instead (or hopefully 
some BioC users will chip in)], so my recommendations may be based on 
slightly different problems.


- would the current version of R benefit from the 8 cores in the new Intel 
Xeon Nehalem 8-core Mac Pro? So would an 8-core 2.26GHz machine be better 
than a 4-core 2.93GHz?


Unfortunately I cannot comment on Nehalems, but in general with Xeons you do 
feel quite a difference in the clock speed, so I wouldn't trade 2.93GHz for 
2.26GHz regardless of the CPU generation. It is true that pre-Nehalem Mac 
Pros cannot feed 8 cores, so you want to go for the new Mac Pros, but I 
wouldn't even think about the 2.26GHz option. Some benchmarks suggest that 
the 2.26 Nehalem can still compete favorably if a lot of memory/IO is 
involved, but it was not very convincing and I cannot tell first-hand.


Simon,

We've some experience with recent Xeons on Linux servers, and that 
says that the size of the L2 cache is at least as important as clock 
speed.  The following figures are from memory and rounded.  A dual 
quad-core 2.5GHz 12MB-cache system (we've an identical pair, one of 
them my server, bought in January) outperforms a dual quad-core 3GHz 
6MB-cache system bought 9 months earlier.  That's running R, and in 
particular multiple R jobs.  At least here, the extra cost of the 
2.93GHz processor is phenomenal.


Also, it looks to us like the Achilles' heel of the Mac Pro is its 
disk system.  Even if you load it up with a RAID controller and extra 
discs (pretty exorbitant, too) it is still, on paper, well below my 
server -- and the 3GHz server does considerably outperform mine on 
disc I/O as it has more discs and a better RAID controller, and our 
Solaris servers are better still.


Just a bit of background,

Brian

--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac


Re: [R-SIG-Mac] Is R more heavy on memory or processor?

2009-03-24 Thread Dan Putler
Hi Marije,

Personally, I would be more concerned with memory than processor.
Running out of memory can be an unpleasant surprise. Base R uses a
single core, but Simon Urbanek's multicore package (the most recent
version of which, 0.1-3, is dated today) does allow you to use multiple
cores at once. I haven't used this package, so can't offer any personal
experience.
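
A minimal, untested sketch of how it is meant to be used (assuming
mclapply() behaves as a parallel drop-in for lapply(); the per-array
function below is only a stand-in):

  library(multicore)
  fit_one <- function(i) mean(rnorm(1e6))  # stand-in for one per-array analysis step
  res <- mclapply(seq_len(8), fit_one)     # forks workers across the cores it detects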

Dan
 
On Tue, 2009-03-24 at 19:55 +0100, Booman, M wrote:
 Dear all,
  
 I am going to purchase a Power Mac (a new one, with Nehalem processor) for my 
 R-based microarray analyses. I use mainly Bioconductor packages, and a 
 typical dataset would consist of 50 microarrays with 40,000 datapoints each. 
 To make the right choice of processor and memory, I have a few questions:
  
 - would the current version of R benefit from the 8 cores in the new Intel 
 Xeon Nehalem 8-core Mac Pro? So would an 8-core 2.26GHz machine be better 
 than a 4-core 2.93GHz? Or can R only use one core (in which case the 4-core 
 2.93GHz machine would be better)?
  
 - If R does not benefit from multiple cores yet, is there anything known 
 about whether Snow Leopard might make a difference in this?
  
 - To determine if my first priority should be processor speed or RAM, on 
 which does R rely more heavily?
  
 - The new chipset has 3 memory channels (forgive me if I word this wrong, as 
 you may have noticed I am no computer tech) so it can read 6GB of RAM faster 
 than it can read 8GB; so for a program that relies more on RAM speed than 
 RAM quantity, it is recommended to use 6GB (or any multiple of 3) instead of 
 8 for better performance. Which is more important for R, RAM speed or RAM 
 quantity?
  
 (I am not sure if it helps to know, but previously I used a Power Mac G5 
 quad-core (sadly I forgot which processor speed, but it was the standard G5 
 quad-core) with 4 GB RAM for datasets of 30-40 microarrays of 18,000 
 datapoints each, and analysis was OK except for some memory errors in a 
 script that used permutation analysis; but it wasn't very fast.)
  
 Any recommendations are welcome!
  
 Marije Booman
 
 
 
 The contents of this message are confidential and only intended for the eyes 
 of the addressee(s). Others than the addressee(s) are not allowed to use this 
 message, to make it public or to distribute or multiply this message in any 
 way. The UMCG cannot be held responsible for incomplete reception or delay of 
 this transferred message.
 
 
-- 
Dan Putler
Sauder School of Business
University of British Columbia

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac


Re: [R-SIG-Mac] Is R more heavy on memory or processor?

2009-03-24 Thread Steven McKinney
I agree with Dan, memory will often be the limiting
factor.  I added RAM (16GB total) to my PPC and have
had a much more productive environment, both for
32-bit and 64-bit applications.

Even if a single R session cannot benefit from multiple
cores, if you can break your processes into parallel
pieces you can use your separate CPUs with cluster
software, or just run multiple R jobs manually.
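
For example, a rough sketch (untested here) with the snow package,
farming pieces out to local socket workers -- the worker function is
just a placeholder:

  library(snow)
  cl <- makeCluster(4, type = "SOCK")   # one worker per core on the local machine
  res <- parLapply(cl, 1:100, function(i) sum(rnorm(1e5)))
  stopCluster(cl)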

I'd recommend maximizing your RAM quantity over
RAM speed.  Also, weigh how large any speed gain
really is: gains of 10-fold or more are noticeable,
while gains of 2- to 3-fold rarely make much of a
difference.

Steven McKinney, Ph.D.

Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre

email: smckinney +at+ bccrc +dot+ ca

tel: 604-675-8000 x7561

BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C. 
V5Z 1L3
Canada




-Original Message-
From: r-sig-mac-boun...@stat.math.ethz.ch on behalf of Dan Putler
Sent: Tue 3/24/2009 12:08 PM
To: Booman, M
Cc: R-SIG-Mac
Subject: Re: [R-SIG-Mac] Is R more heavy on memory or processor?
 

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac


Re: [R-SIG-Mac] Is R more heavy on memory or processor?

2009-03-24 Thread Simon Urbanek


On Mar 24, 2009, at 14:55 , Booman, M wrote:


Dear all,

I am going to purchase a Power Mac (a new one, with Nehalem  
processor) for my R-based microarray analyses. I use mainly  
Bioconductor packages, and a typical dataset would consist of 50  
microarrays with 40,000 datapoints each. To make the right choice of  
processor and memory, I have a few questions:




I don't use BioC [you may want to ask on the BioC list instead (or  
hopefully some BioC users will chip in)], so my recommendations may be  
based on slightly different problems.



- would the current version of R benefit from the 8 cores in the new  
Intel Xeon Nehalem 8-core Mac Pro? So would an 8-core 2.26GHz  
machine be better than a 4-core 2.93GHz?


Unfortunately I cannot comment on Nehalems, but in general with Xeons 
you do feel quite a difference in the clock speed, so I wouldn't trade 
2.93GHz for 2.26GHz regardless of the CPU generation. It is true that 
pre-Nehalem Mac Pros cannot feed 8 cores, so you want to go for the new 
Mac Pros, but I wouldn't even think about the 2.26GHz option. Some 
benchmarks suggest that the 2.26 Nehalem can still compete favorably 
if a lot of memory/IO is involved, but it was not very convincing and 
I cannot tell first-hand.



Or can R only use one core (in which case the 4-core 2.93GHz machine 
would be better)?




R can use multiple cores in many ways - through the BLAS (the default 
in R for Mac OS X), vector op parallelization (Luke's pnmath) or 
explicit parallelization such as forking (multicore) or parallel 
processes (snow). The amount of parallelization achievable depends 
heavily on your applications. I routinely use all cores, but then I'm 
usually modeling my problems that way.
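
As a rough illustration (numbers will of course vary by machine), a
large cross-product goes straight through the BLAS, so it can run on
several cores without any change to your R code:

  x <- matrix(rnorm(2000 * 2000), 2000)
  system.time(crossprod(x))   # BLAS-level work; typically spread over several cores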



- If R does not benefit from multiple cores yet, is there anything 
known about whether Snow Leopard might make a difference in this?




I cannot comment on ongoing work details due to the NDA associated with 
Snow Leopard, but technically, from the Apple announcements you can 
deduce that the only possible improvements directly related to R can 
be achieved in implicit parallelization, which is essentially the 
pnmath path. There is not much more you can do in R save for a 
re-write of the methods you want to deal with.


In fact, the hope is rather that the packages for R start using  
parallelization more effectively, but that's not something Snow  
Leopard alone can change.



- To determine if my first priority should be processor speed or  
RAM, on which does R rely more heavily?




In my line of work (which is not bioinf, though) RAM turned out to be 
more important, because the drop-off when you run out of memory is 
sudden and devastatingly huge. With CPUs you'll have to wait a bit 
longer, but the difference is directly proportional to the CPU speed 
you get, so it is never as bad as running out of wired RAM. (BTW: in 
general you don't want to buy RAM from Apple - as much as I like Apple, 
there are compatible RAM sets at a fraction of the cost of what Apple 
charges, especially for Mac Pros - but there is always the 1st 
generation issue *).
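
As a back-of-the-envelope check (a sketch only -- real BioC workflows
will make several working copies of the data), the raw dataset you
describe is actually quite small; it is the copies that eat the RAM:

  x <- matrix(0, nrow = 40000, ncol = 50)  # 40,000 datapoints x 50 arrays, doubles
  object.size(x)                           # about 16 million bytes, i.e. ~15 MB
  gc()                                     # what the whole session currently uses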



- The new chipset has 3 memory channels (forgive me if I word this 
wrong, as you may have noticed I am no computer tech) so it can read 
6GB of RAM faster than it can read 8GB; so for a program that relies 
more on RAM speed than RAM quantity it is recommended to use 6GB (or 
any multiple of 3) instead of 8 for better performance. Which is more 
important for R, RAM speed or RAM quantity?




6GB is very little RAM, so I don't think that's an option ;) - but 
yes, you should care about the size first. The channels and timings 
only define how you populate the slots. Note that the 4-core Nehalem 
has only 4 slots, so it's not very expandable - I'd definitely get an 
8-core old one with 16GB RAM or more rather than something that can 
take only 8GB ...



(I am not sure if it helps to know, but previously I used a Power Mac 
G5 quad-core (sadly I forgot which processor speed, but it was the 
standard G5 quad-core) with 4 GB RAM for datasets of 30-40 
microarrays of 18,000 datapoints each, and analysis was OK except 
for some memory errors in a script that used permutation analysis; 
but it wasn't very fast.)




I would keep an eye on the RAM expandability - even if you buy less 
RAM now, a ceiling of 8GB is very low. It may turn out that larger 
DIMMs will become available, but 16GB for the future is not enough, 
either. As with all 1st-generation products the prices will go down a 
lot over time, so you may plan to upgrade later. Another point worth 
considering is that you can always upgrade RAM easily, but a CPU 
upgrade is much more difficult.


Cheers,
Simon

___
R-SIG-Mac mailing list
R-SIG-Mac@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-mac


Re: [R-SIG-Mac] Is R more heavy on memory or processor?

2009-03-24 Thread Booman, M
Dear Simon, Steven and Dan,
 
Thank you very much for your replies. I will look for an 'old' 8-core so I'll 
have some money left for a memory expansion (it's for work and there is only 
$4000 CAD in the budget at the moment), or if I can't find an old one I'll get 
a new 8-core with as much memory as I can afford right now and expand later. 
 
Cheers,
Marije



From: Simon Urbanek [mailto:simon.urba...@r-project.org]
Sent: Tue 24-3-2009 21:01
To: Booman, M
CC: R-SIG-Mac
Subject: Re: [R-SIG-Mac] Is R more heavy on memory or processor?



