Re: [R] What is the most cost effective hardware for R?
Thank you all for the help. We have decided against using, for example, the Amazon cloud, basically for paperwork reasons. We have money available now for buying kit; this may not be available for buying services, and may not be available next year, or the year after. We shall certainly consider it as a fallback at times of high load. We are looking at the Dell PowerEdge M915. It has 64 cores, we are getting it with 256 GB of memory, and it really is not that expensive. I am surprised at what power you can get these days for not very much money. Thanks again.
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
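As a rough sizing check for the machine mentioned above, the sketch below spreads the 200,000 independent analyses (and the tenfold growth expected within five years) over 64 workers, one per core. It assumes, without evidence from the thread, that each core still needs about 1.5 seconds per analysis (a newer core may be faster, and shared memory bandwidth may make it slower) and that scaling is perfect. It also divides the 256 GB evenly to show the memory budget per R worker.

```python
import math

CORES = 64        # Dell PowerEdge M915 as described in the thread
MEMORY_GB = 256   # memory ordered with it


def wall_clock_hours(n_analyses, seconds_each, workers):
    """Elapsed hours with one worker per core, assuming perfect scaling.

    The busiest worker gets ceil(n / workers) analyses, so it bounds the run.
    The per-analysis time is an assumption carried over from the old machine.
    """
    per_worker = math.ceil(n_analyses / workers)
    return per_worker * seconds_each / 3600


if __name__ == "__main__":
    for n in (200_000, 2_000_000):  # today, and ~10x in five years
        print(f"{n:,} analyses: {wall_clock_hours(n, 1.5, CORES):.1f} h")
    print(f"memory per worker: {MEMORY_GB / CORES:.0f} GB")
```

Under these assumptions the current dataset takes about 1.3 hours and the tenfold dataset about 13 hours, with roughly 4 GB available to each worker.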
Re: [R] What is the most cost effective hardware for R?
For 200,000 analyses at 1.5 seconds each, you're looking at ~83 hours of computing time. You can buy time from Amazon at roughly $0.08 / core / hour, so it would cost about $7 to run your analyses in the cloud. Assuming complete parallelization, you could fire up as many machines as you need to get the work done in as little time as you want, at the same fixed cost. I think that's a pretty compelling argument, compared to the hassles of buying and maintaining hardware, power supply, air conditioning, etc.
John

On Tue, May 8, 2012 at 1:12 PM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
> That is roughly how I am working on getting it running currently, and the 5 hour estimate assumes that it is perfectly parallelisable. We have a server room with reasonable air con. I have only just thought about adding the extra cooling to the total cost, but I suspect that will come from a different budget, so it may not matter so much. I shall include it in the quote until told to do otherwise.
> This email may have a PROTECTIVE MARKING, for an explanation please see: http://www.mrc.ac.uk/About/Informationandstandards/Documentmarking/index.htm
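John's figures can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers quoted in the thread; the $0.08 price is his 2012 estimate, not a current one, and the calculation ignores per-instance billing granularity, data transfer, and start-up overhead.

```python
N_ANALYSES = 200_000         # independent analyses ("data points")
SECONDS_PER_ANALYSIS = 1.5   # measured on the existing Sun Fire box
PRICE_PER_CORE_HOUR = 0.08   # USD, John's rough 2012 estimate


def core_hours(n_analyses, seconds_each):
    """Total CPU time in core-hours, ignoring start-up and I/O overhead."""
    return n_analyses * seconds_each / 3600


def cloud_cost(hours, price_per_core_hour):
    """Rental cost of `hours` core-hours; the same however many cores share the work."""
    return hours * price_per_core_hour


if __name__ == "__main__":
    hours = core_hours(N_ANALYSES, SECONDS_PER_ANALYSIS)
    print(f"{hours:.1f} core-hours")
    print(f"${cloud_cost(hours, PRICE_PER_CORE_HOUR):.2f}")
```

This gives about 83.3 core-hours and $6.67, matching the "~83 hours" and "about $7" above. Because the cost depends only on total core-hours, spreading the work over more machines shortens the run without changing the bill, which is the point of "the same fixed cost".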
Re: [R] What is the most cost effective hardware for R?
On Wed, May 9, 2012 at 2:22 PM, John Laing john.la...@gmail.com wrote:
> I think that's a pretty compelling argument, compared to the hassles of buying and maintaining hardware, power supply, air conditioning, etc.

Noticing Hugh's .ac.uk email address, you do have to factor in the hassle of getting something as nebulous as cloud computing past the red tape.
"How much will it cost?" says the bureaucrat.
"Depends how much CPU time I need," says the academic.
"So potentially, what's the most?" says the bureaucrat.
"Millions," says the academic, honestly, adding "but that would only be if my job scheduling went a bit mad and grabbed a few thousand Amazon cores and thrashed them for weeks without me noticing."
"Okay," says the bureaucrat, "now, can we send Amazon a purchase order so that Amazon send us an invoice for this unknown and potentially unpredictable cost first?"
"Oh no," says the academic, "we need a credit card."
Maybe there are other ways of paying for Amazon cloud CPUs; I've not investigated. Anyone in academia happily crunching on EC2?
Barry
Re: [R] What is the most cost effective hardware for R?
I don't work for Amazon, but here is one of their promo pieces on using 'spot' instances: http://youtu.be/WD9N73F3Fao At about 2:15, they cite the University of Melbourne and the Universitat de Barcelona as customers... My interest in all this cloud talk is that I'll be presenting a tutorial on R in the cloud at R/Finance: http://www.rinfinance.com/agenda/ It's really easy to use R in the cloud, even if you don't want to move your data into S3.
-Whit

On Wed, May 9, 2012 at 9:36 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
> Maybe there are other ways of paying for Amazon cloud CPUs; I've not investigated. Anyone in academia happily crunching on EC2?
Re: [R] What is the most cost effective hardware for R?
Barry, *fortunes* are very auspicious, but you are already well represented.
Cheers, Mike.

On Thu, May 10, 2012 at 1:38 AM, Whit Armstrong armstrong.w...@gmail.com wrote:
> On Wed, May 9, 2012 at 9:36 AM, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
>> Noticing Hugh's .ac.uk email address, you do have to factor in the hassle of getting something as nebulous as cloud computing past the red tape.

-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
Re: [R] What is the most cost effective hardware for R?
On May 9, 2012, at 17:46, Michael Sumner wrote:
> Barry, *fortunes* are very auspicious, but you are already well represented.

"...as nebulous as cloud computing...", indeed!

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [R] What is the most cost effective hardware for R?
It's not water vapour: http://www.youtube.com/watch?v=rg12qNRgSag

On Thu, May 10, 2012 at 2:20 AM, peter dalgaard pda...@gmail.com wrote:
> "...as nebulous as cloud computing...", indeed!

-- 
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com
[R] What is the most cost effective hardware for R?
Has anyone got any advice about what hardware to buy to run lots of R analyses? Links to studies or other documents would be great, as would personal opinions. We are not currently certain what analysis we shall be running, but our first implementation uses the functions lme and gls from the package nlme. One data point currently takes 1.5 seconds on our 3-year-old Sun Fire box, and the data points are completely independent, so the analysis is fully parallelisable without implementing multi-threading within each data point. We have a reasonable amount of sysadmin support in house. We are an academic institution. We are looking at spending anywhere from a few thousand to a small number of tens of thousands of dollars. Any help greatly appreciated.
Re: [R] What is the most cost effective hardware for R?
How many data points do you have?
Re: [R] What is the most cost effective hardware for R?
I think the general experience is that R tends to be hungrier for memory than for other resources, so you'll get the best bang for your buck on that end. R also has good parallelization support; that and other high-performance concerns are addressed here: http://cran.r-project.org/web/views/HighPerformanceComputing.html
Performance (as it is for most computationally expensive tasks) will likely be better under Linux, and you'll get good free help from R-SIG-Fedora and R-SIG-Debian if you pick one of those (in addition to whatever your sysadmin can give).
Michael

On Tue, May 8, 2012 at 6:49 AM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
> Has anyone got any advice about what hardware to buy to run lots of R analyses?
Re: [R] What is the most cost effective hardware for R?
On Tue, May 8, 2012 at 11:49 AM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
> We are an academic institution. We are looking at spending a few thousand to a small number of tens of thousands of dollars.

Why buy when you can rent? Unless your hardware is going to be running 24/7 doing these analyses, you are paying for it to sit idle. You might be better off purchasing computing time from Amazon or another cloud computing provider. If you need to run more analyses quickly, just buy some more virtual hosts. This also saves on running a data center, hardware warranty costs, disposing of 10U of obsolete rack-mounted hardware after five years, etc. Obviously the rental cost covers all these things as expenses of the cloud computing provider, but they have massive economies of scale. I've not gone this way yet for any projects I've been involved with, but it's becoming more of a possibility with every grant award we get...
Barry
Re: [R] What is the most cost effective hardware for R?
On 05/08/2012 12:14 PM, Zhou Fang wrote:
> How many data points do you have?
Currently 200,000. We are likely to have 10 times that in 5 years.
> Why buy when you can rent? Unless your hardware is going to be running 24/7 doing these analyses then you are paying for it to sit idle. You might be better off purchasing computing time from Amazon or another cloud computing provider. If you need to run more analyses quickly, just buy some more virtual hosts.
Because of the nature of the funding, we are likely to be better off buying. We are likely to be running most of the time: most of the analysis must be rerun as more data becomes available, and that is likely to happen a few times every week.
Thank you for all the pointers; we shall consider them all.
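To put "running most of the time" into numbers, the sketch below estimates weekly demand in core-hours for full re-analyses at today's size and at ten times that size. "A few times every week" is taken as 3 reruns purely as an assumption, and the 64-core comparison machine is hypothetical here. The result scales linearly with the per-analysis time, which may grow once the analysis method is settled.

```python
SECONDS_PER_ANALYSIS = 1.5   # current per-analysis timing from the thread
HOURS_PER_WEEK = 24 * 7


def weekly_core_hours(n_analyses, reruns_per_week, seconds_each=SECONDS_PER_ANALYSIS):
    """Core-hours per week if the whole dataset is re-analysed `reruns_per_week` times."""
    return n_analyses * seconds_each * reruns_per_week / 3600


def utilisation(demand_core_hours, cores):
    """Fraction of a `cores`-core machine's weekly capacity that the demand uses."""
    return demand_core_hours / (cores * HOURS_PER_WEEK)


if __name__ == "__main__":
    for n in (200_000, 2_000_000):                          # today, and ~10x in five years
        demand = weekly_core_hours(n, reruns_per_week=3)    # "a few times": 3 is an assumption
        print(f"{n:>9,} analyses: {demand:,.0f} core-h/week, "
              f"{utilisation(demand, cores=64):.0%} of a 64-core box")
```

On these assumptions, the weekly demand is 250 core-hours now and 2,500 core-hours at ten times the data, i.e. roughly 2% and 23% of a hypothetical 64-core machine's capacity.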
Re: [R] What is the most cost effective hardware for R?
You should think about the cloud as a serious alternative. I completely agree with Barry. Unless you will utilize your machines (and by utilize, I mean 100% CPU usage) all the time (including weekends), you will probably make better use of your funds by renting blocks of cloud machines when you need to run your sim and turning them off afterwards. There are some new packages that make it very easy to access the cloud from a local R session (in an lapply-like way). Happy to point those out to you if you are interested...
-Whit

On Tue, May 8, 2012 at 11:50 AM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
> Because of the nature of the funding we are likely to be better off buying. We are likely to be running most of the time, most of the analysis must be rerun as more data becomes available, and that is likely to happen a few times every week.
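Whit's utilisation argument can be framed as a break-even calculation: buying only wins if the machine does more work over its life than the purchase price would rent. The sketch below computes that break-even average utilisation. The $20,000 price, 64 cores, and 3-year lifetime are hypothetical inputs picked from within the budget range in the thread, and running costs such as power, cooling, and admin time are deliberately left out.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilisation(hardware_cost, price_per_core_hour, cores, lifetime_years):
    """Average utilisation above which buying beats renting.

    Compares the purchase price with the rental cost of the same number of
    core-hours. Ignores power, cooling, admin time, and any speed
    difference between a cloud core and a local core.
    """
    capacity_core_hours = cores * HOURS_PER_YEAR * lifetime_years
    rentable_core_hours = hardware_cost / price_per_core_hour
    return rentable_core_hours / capacity_core_hours


if __name__ == "__main__":
    # Hypothetical inputs: $20,000 for 64 cores kept 3 years, versus ~$0.08/core-hour.
    u = break_even_utilisation(20_000, 0.08, cores=64, lifetime_years=3)
    print(f"break-even utilisation: {u:.1%}")
```

With those inputs it prints a break-even of about 15%. The figure is very sensitive to the inputs: adding running costs to the purchase side raises it, while cloud cores that are slower than local ones would lower it.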
Re: [R] What is the most cost effective hardware for R?
Probably just pointing out the obvious, but: 200,000 data points may not be that many these days, depending on the dimensionality of the data. Nor is 10 times that number, either now or in 5 years, again depending on data dimensionality. So my question is: have you actually tried running your simulations (or a reasonable approximation thereof) on a single cheap machine? It might be that your concerns are overblown, especially with multicore and parallelization. Obviously, ignore this if you've already done it and know it's nonsense.
Cheers,
Bert

On Tue, May 8, 2012 at 8:50 AM, Hugh Morgan h.mor...@har.mrc.ac.uk wrote:
> Currently 200,000. We are likely to have 10 times that in 5 years.

-- 
Bert Gunter
Genentech Nonclinical Biostatistics
Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
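One way to act on Bert's suggestion is to time a random sample of the real analyses on whatever machine is at hand before extrapolating. The sketch below does that in Python with a caller-supplied `analyse` function, a hypothetical stand-in for one lme/gls fit; in R itself, wrapping a sample in system.time() would serve the same purpose.

```python
import random
import time


def time_sample(analyse, items, sample_size=50, seed=1):
    """Run `analyse` on a random sample of `items` and return per-item seconds.

    Sampling (rather than timing the first few items) guards against cheap or
    expensive cases being clustered at one end of the dataset.
    """
    rng = random.Random(seed)
    sample = rng.sample(items, min(sample_size, len(items)))
    timings = []
    for item in sample:
        start = time.perf_counter()
        analyse(item)
        timings.append(time.perf_counter() - start)
    return timings


def summarise(timings, n_total):
    """Return (mean seconds per item, projected core-hours for `n_total` items)."""
    mean = sum(timings) / len(timings)
    return mean, mean * n_total / 3600
```

Dividing the projected core-hours by the number of cores then gives an optimistic wall-clock estimate, since real runs rarely scale perfectly.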
Re: [R] What is the most cost effective hardware for R?
Perhaps I have confused the issue. When I initially said "data points" I meant one stand-alone analysis, not one piece of data. Each analysis takes 1.5 seconds. I have not yet implemented running this over the whole dataset, but I would expect it to take about 5 to 10 hours. This is just about acceptable, but it would be better if it were quicker. As I say, the exact analysis method has not yet been determined, and if it turned out to be significantly more computationally intensive, that could be an issue.
It is not actually a simulation; it is a pre-analysis of the dataset before public display. I do have a simulation of the analysis to run, and that could be some orders of magnitude larger than the real dataset. I can of course wait for that.
Thanks for the input.

On 05/08/2012 05:24 PM, Bert Gunter wrote:
> So my question is: have you actually tried running your simulations (or a reasonable approximation thereof) on a single cheap machine?
Re: [R] What is the most cost effective hardware for R?
On Tue, 8 May 2012, Hugh Morgan wrote:
> Each analysis point takes 1.5 seconds. I have not implemented running this over the whole dataset yet, but I would expect it to take about 5 to 10 hours.

If I had to do what you write above, I would separate the data into chunks, one for each core/CPU in my system. Then I would invoke R to run on each core/CPU and have each instance process one chunk. With sufficient memory for each core/CPU, the processing will occur in parallel and cut the overall time by the number of instances running. You might want to turn up the air conditioning around the system, 'cause that CPU is going to be working hard.
Rich
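Rich's approach (one chunk per core, one worker per chunk) can be sketched as follows. This uses Python's concurrent.futures purely to illustrate the structure; `analyse` is a hypothetical placeholder for one independent analysis. In R, the parallel package's mclapply() or parLapply() provide the same pattern without managing the chunks by hand.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def analyse(item):
    """Hypothetical stand-in for one independent analysis (e.g. one nlme fit)."""
    return sum(i * i for i in range(item % 1000))


def chunk(items, n_chunks):
    """Split `items` into `n_chunks` contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        size = q + (1 if k < r else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def run_chunk(items):
    """Process one chunk sequentially inside a single worker process."""
    return [analyse(x) for x in items]


def run_parallel(items, workers=None):
    """Give each worker process one chunk and return results in the original order."""
    workers = workers or os.cpu_count() or 1
    chunks = chunk(items, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return [result for part in pool.map(run_chunk, chunks) for result in part]
```

Call run_parallel(items) from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork. Equal-sized contiguous chunks work well when every analysis takes roughly the same time; if run times vary a lot, smaller chunks (for example via the chunksize argument of ProcessPoolExecutor.map) balance the load better.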
Re: [R] What is the most cost effective hardware for R?
On 05/08/2012 06:02 PM, Rich Shepard wrote:
> If I had to do what you write above, I would separate the data into chunks; one for each core/CPU in my system. Then I would invoke R to run on each core/CPU and have that instance process one data set. With sufficient memory for each core/CPU the processing will occur in parallel and cut the overall time by the number of instances running. You might want to turn up the air conditioning around the system 'cause that CPU is going to be working hard.

That is roughly how I am working on getting it running currently, and the 5 hour estimate assumes that it is perfectly parallelisable. We have a server room with reasonable air con. I have only just thought about adding the extra cooling to the total cost, but I suspect that will come from a different budget, so it may not matter so much. I shall include it in the quote until told to do otherwise.