Re: [R] What is the most cost effective hardware for R?

2012-05-10 Thread Hugh Morgan
Thank you all for the help.  We have decided against using, for example, 
Amazon cloud, basically for paperwork reasons.  We have money available now 
for buying kit; this may not be available for buying services, and may 
not be available next year, or the next.  We shall certainly consider it 
as a fallback at times of high load.


We are looking at the Dell PowerEdge M915.  It has 64 cores, we are 
getting it with 256 GB of memory, and it is really not that expensive.  I am 
surprised what power you can get these days for not very much money.


Thanks again.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread John Laing
For 200,000 analyses at 1.5 seconds each, you're looking at ~83 hours
of computing time. You can buy time from Amazon at roughly $0.08 /
core / hour, so it would cost about $7 to run your analyses in the
cloud. Assuming complete parallelization you could fire up as many
machines as you need to get the work done in as little time as you
want, with the same fixed cost. I think that's a pretty compelling
argument, compared to the hassles of buying and maintaining hardware,
power supply, air conditioning, etc.
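The arithmetic above can be checked in a couple of lines of R (the $0.08 per core-hour rate is a rough ballpark, not a quoted price):

```r
# Back-of-the-envelope cost: 200,000 independent analyses at 1.5 s each,
# priced at roughly $0.08 per core-hour (ballpark, not a quote).
core_hours <- 200000 * 1.5 / 3600   # ~83.3 hours of single-core time
cost <- core_hours * 0.08           # ~$6.67, however many cores you rent
c(core_hours = core_hours, cost = cost)
```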

John

On Tue, May 8, 2012 at 1:12 PM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
 On 05/08/2012 06:02 PM, Rich Shepard wrote:

 On Tue, 8 May 2012, Hugh Morgan wrote:

 Perhaps I have confused the issue. When I initially said data points I
 meant one stand alone analysis, not one piece of data. Each analysis
 point
 takes 1.5 seconds. I have not implemented running this over the whole
 dataset yet, but I would expect it to take about 5 to 10 hours. This is
 just about acceptable, but it would be better if this was quicker. As I
 say, the exact analysis method has not yet been determined, and if that
 was significantly more computationally intensive then that could be an
 issue.


  If I had to do what you write above, I would separate the data into
 chunks; one for each core/CPU in my system. Then I would invoke R to run
 on
 each core/CPU and have that instance process one data set. With sufficient
 memory for each core/CPU the processing will occur in parallel and cut the
 overall time by the number of instances running.

  You might want to turn up the air conditioning around the system 'cause
 that CPU is going to be working hard.


 That is roughly how I am working on getting it running currently, and the 5
 hour estimate assumes that is perfectly parallelisable.

 We have a server room with a reasonable air con.  I have only just thought
 about adding the extra cooling to the total cost, but I suspect that that
 will come from a different budget so may not matter so much.  I shall
 include it in the quote until told to do otherwise.



 Rich




 This email may have a PROTECTIVE MARKING, for an explanation please see:
 http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm



Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread Barry Rowlingson
 Noticing Hugh's .ac.uk email address, you do have to factor in the
hassle of getting something as nebulous as cloud computing past the
red tape. "How much will it cost?" says the bureaucrat. "Depends how
much CPU time I need," says the academic. "So potentially, what's the
most?" says the bureaucrat. "Millions," says the academic, honestly,
adding, "but that would only be if my job scheduling went a bit mad and
grabbed a few thousand Amazon cores and thrashed them for weeks
without me noticing." "Okay," says the bureaucrat, "now, can we send
Amazon a purchase order so that Amazon send us an invoice for this
unknown and potentially unpredictable cost first?" "Oh no," says the
academic, "we need a credit card..."

Maybe there are other ways of paying for Amazon cloud CPUs, I've not
investigated. Anyone in academia happily crunching on EC2?

Barry



Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread Whit Armstrong
I don't work for Amazon, but here is one of their promo pieces on
using 'spot' instances:
http://youtu.be/WD9N73F3Fao

at about 2:15, they cite University of Melbourne and Universitat de
Barcelona as customers...

My interest in all this cloud talk is that I'll be presenting a
tutorial on R in the cloud at R/Finance.
http://www.rinfinance.com/agenda/

It's really easy to use R in the cloud, even if you don't want to move
your data into s3.

-Whit





Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread Michael Sumner
Barry, *fortunes* are very auspicious but you are already well represented.

Cheers, Mike.



-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com



Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread peter dalgaard

On May 9, 2012, at 17:46 , Michael Sumner wrote:

 Barry, *fortunes* are very auspicious but you are already well represented.

"...as nebulous as cloud computing...", indeed!


-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] What is the most cost effective hardware for R?

2012-05-09 Thread Michael Sumner
It's not water vapour: http://www.youtube.com/watch?v=rg12qNRgSag



[R] What is the most cost effective hardware for R?

2012-05-08 Thread Hugh Morgan
Has anyone got any advice about what hardware to buy to run lots of R 
analyses?  Links to studies or other documents would be great, as would 
personal opinion.


We are not currently certain what analyses we shall be running, but our 
first implementation uses the functions lme and gls from the package 
nlme.  To do one data point currently takes 1.5 seconds on our 3-year-old 
Sun Fire box, and the data points are completely independent, so the 
analysis is fully parallelisable without implementing multi-threading 
within each data point.  We have a reasonable amount of sys admin support 
in house.  We are an academic institution.  We are looking at spending a 
few thousand to a small number of tens of thousands of dollars.
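Since the data points are independent, the whole run can be farmed out with base R's parallel tools. The sketch below assumes a hypothetical fit_one() wrapping the actual lme/gls call, and a list chunks with one data frame per point:

```r
library(parallel)  # shipped with R since 2.14
library(nlme)

# fit_one() is a placeholder for the real per-point analysis.
fit_one <- function(d) {
  gls(response ~ predictor, data = d)  # illustrative model formula only
}

# chunks: a list of per-point data frames (not shown here).
# mclapply() forks one worker per core; Unix-alikes only.
results <- mclapply(chunks, fit_one, mc.cores = detectCores())
```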


Any help greatly appreciated




Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Zhou Fang
How many data points do you have?



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread R. Michael Weylandt
I think the general experience is that R is more memory-hungry than
anything else, so you'll get the best bang for your buck by spending on
RAM. R also has good parallelization support; that and other
high-performance concerns are addressed here:

http://cran.r-project.org/web/views/HighPerformanceComputing.html
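As a minimal illustration of that built-in parallel support (a toy computation, nothing specific to the original poster's models):

```r
library(parallel)

cl <- makeCluster(4)                        # four local worker processes
res <- parLapply(cl, 1:8, function(x) x^2)  # spread the work across them
stopCluster(cl)
unlist(res)  # 1 4 9 16 25 36 49 64
```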

Performance (as it is for most computationally expensive tasks) will
likely be better under Linux, and you'll get good free help from
R-SIG-Fedora and R-SIG-Debian if you pick one of those (in addition to
whatever your sys admin can give).

Michael



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Barry Rowlingson

 Why buy when you can rent? Unless your hardware is going to be
running 24/7 doing these analyses then you are paying for it to sit
idle. You might be better off purchasing computing time from Amazon or
another cloud computing provider. If you need to run more analyses
quickly, just buy some more virtual hosts.

It also saves on running a data centre, hardware warranty costs,
disposing of 10U of rack-mounted obsolete hardware after five years, etc.
Obviously the rental cost includes all these things as expenses of the
cloud computing provider, but they have massive economies of scale.

 I've not gone this way yet for any projects I've been involved with,
but it's becoming more of a possibility with every grant award we
get...

Barry



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Hugh Morgan

On 05/08/2012 12:14 PM, Zhou Fang wrote:

How many data points do you have?



Currently 200,000.  We are likely to have 10 times that in 5 years.


  Why buy when you can rent? Unless your hardware is going to be
running 24/7 doing these analyses then you are paying for it to sit
idle. You might be better off purchasing computing time from Amazon or
another cloud computing provider. If you need to run more analyses
quickly, just buy some more virtual hosts.


Because of the nature of the funding we are likely to be better off 
buying.  We are likely to be running most of the time, most of the 
analysis must be rerun as more data becomes available, and that is 
likely to happen a few times every week.


Thank you for all the pointers, we shall consider them all.




Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Whit Armstrong
You should think about the cloud as a serious alternative.

I completely agree with Barry.  Unless you will utilize your machines
(and by utilize, I mean 100% CPU usage) all the time (including
weekends), you will probably make better use of your funds by purchasing
blocks of machines when you need to run your simulation, and turning
them off afterwards.

There are some new packages that make it very easy to access the cloud
from a local R session (in an lapply like way).  Happy to point those
out to you if you are interested...
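Whit doesn't name the packages here, so as a hedged sketch of the general idea: base R can already drive remote machines over SSH in an lapply-like way, provided R is installed on them (the host names below are hypothetical):

```r
library(parallel)

# Hypothetical cloud instances reachable over SSH with R installed.
hosts <- c("node1.example.com", "node2.example.com")
cl <- makePSOCKcluster(hosts)
res <- parLapply(cl, 1:100, function(i) sqrt(i))  # lapply-like, but remote
stopCluster(cl)
```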

-Whit





Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Bert Gunter
Probably just pointing out the obvious, but:

200,000 data points may not be that many these days, depending on the
dimensionality of the data. Nor is 10 times that number, neither now
nor in 5 years, again depending on data dimensionality. So my question
is, have you actually tried running your simulations -- or a
reasonable approximation thereof -- on a single cheap machine? It
might be that your concerns are overblown, especially with multicore
and parallelization.

Obviously, ignore if you've already done this and know it's nonsense.
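One way to act on Bert's suggestion before buying anything: time a handful of representative fits and extrapolate (one_fit() below is a stand-in for the real analysis):

```r
# Time 10 representative analyses and extrapolate to the full dataset.
one_fit <- function(i) { Sys.sleep(0.1); i }        # stand-in, ~0.1 s each
secs_per_fit <- system.time(lapply(1:10, one_fit))["elapsed"] / 10
hours_total  <- secs_per_fit * 200000 / 3600        # single-core estimate
hours_total / parallel::detectCores()               # with perfect scaling
```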

Cheers,
Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Hugh Morgan
Perhaps I have confused the issue.  When I initially said data points I 
meant one stand-alone analysis, not one piece of data.  Each analysis 
point takes 1.5 seconds.  I have not implemented running this over the 
whole dataset yet, but I would expect it to take about 5 to 10 hours.  
This is just about acceptable, but it would be better if it were 
quicker.  As I say, the exact analysis method has not yet been 
determined, and if it were significantly more computationally intensive 
then that could be an issue.


It is not actually a simulation, it is a pre-analysis of the dataset 
before public display.  I do have a simulation of the analysis to run, 
and that could be some orders of magnitude larger than the real 
dataset.  I can of course wait for that.


Thanks for the input.

On 05/08/2012 05:24 PM, Bert Gunter wrote:

Probably just pointing out the obvious, but:

200,000 data points may not be that many these days, depending on the
dimensionality of the data. Nor is 10 times that number, neither now
nor in 5 years, again depending on data dimensionality. So my question
is, have you actually tried running your simulations -- or a
reasonable approximation thereof -- on a single cheap machine? It
might be that your concerns are overblown, especially with multicore
and parallelization.

Obviously, ignore if you've already done this and know it's nonsense.

Cheers,
Bert

On Tue, May 8, 2012 at 8:50 AM, Hugh Morganh.mor...@har.mrc.ac.uk  wrote:

On 05/08/2012 12:14 PM, Zhou Fang wrote:

How many data points do you have?


Currently 200,000.  We are likely to have 10 times that in 5 years.


  Why buy when you can rent? Unless your hardware is going to be
running 24/7 doing these analyses then you are paying for it to sit
idle. You might be better off purchasing computing time from Amazon or
another cloud computing provider. If you need to run more analyses
quickly, just buy some more virtual hosts.


Because of the nature of the funding we are likely to be better off buying.
We are likely to be running most of the time, since most of the analysis
must be rerun as more data becomes available, and that is likely to happen
a few times every week.

Thank you for all the pointers, we shall consider them all.


This email may have a PROTECTIVE MARKING, for an explanation please see:
http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Rich Shepard

On Tue, 8 May 2012, Hugh Morgan wrote:


Perhaps I have confused the issue. When I initially said data points I
meant one stand alone analysis, not one piece of data. Each analysis point
takes 1.5 seconds. I have not implemented running this over the whole
dataset yet, but I would expect it to take about 5 to 10 hours. This is
just about acceptable, but it would be better if this was quicker. As I
say, the exact analysis method has not yet been determined, and if that
was significantly more computationally intensive then that could be an
issue.


  If I had to do what you write above, I would separate the data into
chunks; one for each core/CPU in my system. Then I would invoke R to run on
each core/CPU and have that instance process one data set. With sufficient
memory for each core/CPU the processing will occur in parallel and cut the
overall time by the number of instances running.

  You might want to turn up the air conditioning around the system 'cause
that CPU is going to be working hard.

Rich
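
[Editorial aside: an illustrative sketch of the chunk-per-core approach
Rich describes, using R's parallel package. `analyse_one` and the toy
data stand in for the real, as-yet-undetermined analysis; note that
mclapply forks processes on Unix-alikes only, so Windows users would
need makeCluster/parLapply instead:]

```r
# Sketch: split the dataset into one chunk per core and analyse the
# chunks in parallel.  analyse_one is a placeholder for the real
# per-point analysis; the data are illustrative.
library(parallel)

analyse_one <- function(x) x^2          # placeholder analysis
data_points <- 1:200000                 # ~200,000 stand-alone analyses

n_cores <- max(1L, detectCores() - 1L, na.rm = TRUE)  # leave one core free
chunks  <- split(data_points,
                 cut(seq_along(data_points), n_cores, labels = FALSE))

# mclapply forks one R worker per chunk (Unix-alikes only)
results <- mclapply(chunks,
                    function(chunk) sapply(chunk, analyse_one),
                    mc.cores = n_cores)
results <- unlist(results, use.names = FALSE)
```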



Re: [R] What is the most cost effective hardware for R?

2012-05-08 Thread Hugh Morgan

On 05/08/2012 06:02 PM, Rich Shepard wrote:

On Tue, 8 May 2012, Hugh Morgan wrote:


Perhaps I have confused the issue. When I initially said data points I
meant one stand alone analysis, not one piece of data. Each analysis
point takes 1.5 seconds. I have not implemented running this over the
whole dataset yet, but I would expect it to take about 5 to 10 hours.
This is just about acceptable, but it would be better if this was
quicker. As I say, the exact analysis method has not yet been
determined, and if that was significantly more computationally
intensive then that could be an issue.


  If I had to do what you write above, I would separate the data into
chunks; one for each core/CPU in my system. Then I would invoke R to
run on each core/CPU and have that instance process one data set. With
sufficient memory for each core/CPU the processing will occur in
parallel and cut the overall time by the number of instances running.

  You might want to turn up the air conditioning around the system 'cause
that CPU is going to be working hard.


That is roughly how I am currently getting it running, and the 5 hour 
estimate assumes it is perfectly parallelisable.


We have a server room with reasonable air conditioning.  I have only 
just thought of adding the extra cooling to the total cost, but I 
suspect that will come from a different budget so may not matter so 
much.  I shall include it in the quote until told to do otherwise.




Rich
