[R] Amazon AWS, RGenoud, Parallel Computing
Dear R group,

since I only have a moderately fast MacBook and I would like to get my results faster than the current 72h per run ( :-((( ), I tried Amazon AWS, which offers pretty fast machines via remote access. I don't want to post any code because it is several hundred lines; I am not looking for the optimal answer, just for suggestions from anyone who has faced similar problems. I installed Revolution R on Windows on the Amazon instance. I usually go for the large instance type (8 cores, about 20 GHz combined, several GB of RAM).

I am running a financial analysis over several periods (months) in which various CVaR calculations are made (with the rgenoud package). The periods depend on each other, so parallelizing across periods does not work. I was quite surprised by how well written all the R libraries on the Mac seem to be, since they use both cores of my MacBook for a large portion of the calculations (the matrix multiplications and the like, I guess). I was a little astonished, though, that the performance increase on the Amazon instance (hardware about 5 times faster than my MacBook) was very moderate, with only about a 30% decrease in calculation time. The CPUs were about 60% in use (understandably, since the code was not written specifically for several cores).

(1) I tried to use multiple cores for the rgenoud package (via the snow package) as described on the excellent website http://sekhon.berkeley.edu/rgenoud/multiple_cpus.html, but found rather strange behaviour: CPU use on the Amazon instance would drop to about 25% with periodic peaks, and at least the first optimization run took significantly longer (several times longer) than without explicitly requesting multiple cores in the genoud() call. The number of cores I used was smaller than the number I had available (4 of 8). So it does not seem I can improve my performance here, even though I find that somewhat strange...
(2) I tried to improve the performance by parallelizing the solution-quality functions (which rgenoud minimizes): one is basically a sorting algorithm (the CVaR), the other a matrix-multiplication sort of thing. Neither parallelizing the composition of the objective function (the sum of the CVaR and the matrix multiplication) nor parallelizing the sort itself (splitting up the dataset and merging the sorted subsets afterwards) showed any improvement: the performance was much worse, with all 8 CPUs sitting practically 100% idle... I do think it has to do with all the data management between the worker processes...

I am a little puzzled now about what I could do... It seems there are only very limited options for me to increase the performance. Does anybody have experience with parallel computation with rgenoud, or with parallelized sorting algorithms? I think one major problem is that each sort is rather quick (only a few hundred entries to sort) but needs to be done very frequently (population size 2000, iterations 500), so I guess the housekeeping of the parallel computation diminishes all the benefits. I tried snowfall (for #2) and the snow package (for #1). I also tried the foreach package, but could not get it working on Windows...

Suggestions with respect to the operating system, Amazon AWS, or rgenoud are highly appreciated. Thanks a lot!

Lui

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
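For what it's worth, my setup for (1) follows the pattern from that page; roughly like this, where portfolio.objective, returns.matrix, cvar.level and nvars = 20 are stand-ins for my actual objective function, data and problem size:

```r
## Sketch of setup (1), following
## http://sekhon.berkeley.edu/rgenoud/multiple_cpus.html
## portfolio.objective, returns.matrix and cvar.level are placeholders.
library(rgenoud)
library(snow)

cl <- makeSOCKcluster(rep("localhost", 4))   # 4 of the 8 cores

## Everything the objective function touches must exist on the workers.
clusterExport(cl, c("returns.matrix", "cvar.level"))

fit <- genoud(portfolio.objective, nvars = 20,
              pop.size = 2000, max.generations = 500,
              cluster = cl)

stopCluster(cl)
```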
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 13:03:10 +0200
From: lui.r.proj...@googlemail.com
To: r-help@r-project.org
Subject: [R] Amazon AWS, RGenoud, Parallel Computing

Dear R group, [...] I think one major problem is that the sorting happens rather quickly (only a few hundred entries to sort), but needs to be done very frequently (population size 2000, iterations 500), so I guess the housekeeping of the parallel computation diminishes all benefits.

Is your sort part of the algorithm, or do you have to sort results after getting them back out of order from async processes? One of my favorite anecdotes is how I used a bash sort on a huge data file to make a program run faster: from impractical zero percent CPU to very fast with full CPU usage (and you are complaining about exactly such a lack of CPU saturation).

A couple of comments. First, if you have specialized code you need optimized, you may want to write dedicated C++ code. However, this won't help if you don't find the bottleneck. Lack of CPU saturation could easily be due to waiting on something like disk IO or VM swap. You really ought to find the bottleneck first; it could be anything (except the CPU maybe, LOL). The sort that I used prevented VM thrashing with no change to the app code: the app got sorted data, so VM paging became infrequent. If you can specify the problem precisely, you may be able to find a simple solution.
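To make "find the bottleneck first" concrete: R's own sampling profiler will tell you where the time goes without changing your code. A minimal sketch, with a toy workload standing in for your real run (profile.out is just a scratch file name):

```r
## Sketch: ask the profiler where the time goes before blaming the CPUs.
Rprof("profile.out")                          # start R's sampling profiler
invisible(replicate(200, sort(runif(500))))   # stand-in for the real workload
Rprof(NULL)                                   # stop profiling
head(summaryRprof("profile.out")$by.total)    # top functions by total time
```

If the top entries are serialization or IO routines rather than your numerical functions, adding cores will not help; check swap and disk activity at the OS level as well.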
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Hello Mike,

thank you very much for your response! To the best of my knowledge the sort algorithm in R is already backed by compiled code and not written natively in R. Writing my own code in C++ is not really an option either (I think rgenoud is itself written in C++). I am not sure whether there really is a bottleneck on the machine: I/O is pretty low, plenty of RAM is left, etc. It really seems to me as if parallelizing is not easily possible, or only at such high cost that the benefits are eaten up by all the coordination and handling needed...

Did anybody use rgenoud in cluster mode and experience something similar? Are there sort packages that use multiple processors efficiently? (I didn't find any... :-( )

I am by no means an expert on parallel processing, but is it possible that the benefits of parallelizing a process greatly diminish when a large set of variables/functions needs to be made available to the workers, while the actual function (in this case sorting a few hundred entries) is quite short and is called very many times?! It was quite striking that the first run usually took several hours (instead of half an hour) and the subsequent runs were much, much faster... There is so much happening behind the scenes that it is a little hard for me to tell what might help and what will not...

Help appreciated :-)

Thank you

Lui

On Sat, Jun 11, 2011 at 4:42 PM, Mike Marchywka <marchy...@hotmail.com> wrote: [...]
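To illustrate what I mean by the housekeeping eating all the benefits, here is roughly the kind of comparison I ran: a plain sort() of a few hundred entries against a naive split-and-merge over a snow cluster (iteration counts scaled down; the exact timings will of course differ per machine):

```r
## Sketch: tiny, frequent sorts gain nothing from a cluster.
library(snow)
cl <- makeSOCKcluster(rep("localhost", 4))

x <- runif(500)                                   # a few hundred entries
chunks <- split(x, rep(1:4, length.out = length(x)))

serial   <- system.time(for (i in 1:100) sort(x))
parallel <- system.time(for (i in 1:100) {
  parts <- clusterApply(cl, chunks, sort)         # ship pieces to the workers
  sort(unlist(parts))                             # combine (lazy merge: re-sort)
})

print(serial)    # fast: the sort itself is cheap
print(parallel)  # dominated by dispatch and serialization per call
stopCluster(cl)
```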
Re: [R] Amazon AWS, RGenoud, Parallel Computing
Date: Sat, 11 Jun 2011 19:57:47 +0200
Subject: Re: [R] Amazon AWS, RGenoud, Parallel Computing
From: lui.r.proj...@googlemail.com
To: marchy...@hotmail.com
CC: r-help@r-project.org

Hello Mike, [...] Did anybody use rgenoud in cluster mode and experience something similar? Are quicksort packages available that use multiple processors efficiently? [...]

I'm no expert, but these don't seem to be terribly subtle problems in most cases. Sure, if the task is not suited to parallelism and you force it to be parallel so that it spends all its time syncing up, that can be a problem. Just making more tasks fight over the bottleneck (memory, CPU, locks) can easily make things worse. I think I posted a link earlier, an IEEE blurb showing how easy it is for many cores to make things worse on non-contrived benchmarks.

[...]