I am trying to run a Monte Carlo process using snow with an MPI cluster. I have about thirty processors to run the algorithm on, and I want to run it 5000 times and take the average of the output. A very simple way to do this is to divide 5000 by the number of processors to get a number n and tell each processor to run the algorithm n times. (I realize there are more efficient ways to manage the parallelization.) To implement this I used clusterCall with the replicate function, along the lines of

    clusterCall(cl, replicate, n, function(args))

Because my function is a Monte Carlo process, it relies on drawing from random distributions to generate output. When I do this, all of my processors generate the same random numbers. I copied the following from the R console as a simple example:

    cl<-makeCluster(cl, replicate,1,runif(2))
    clusterCall(cl, replicate, 2, runif(2))
    [[1]]
    0.6533959 0.6533959
    0.1071051 0.1071051

    [[2]]
    0.6533959 0.6533959
    0.1071051 0.1071051
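A possible explanation, offered as a sketch rather than a confirmed diagnosis: in clusterCall(cl, replicate, 2, runif(2)) the argument runif(2) is evaluated once on the master before it is sent out, so every worker receives the same two numbers and replicate merely repeats them. Wrapping replicate in a function moves the random draws onto the workers. The example below makes several assumptions that are not part of the setup described above: it uses the parallel package (which ships with R and provides a clusterCall with the same interface as snow's), a local two-worker socket cluster instead of the MPI cluster, and a hypothetical one_run function standing in for the real algorithm.

```r
library(parallel)  # assumption: stands in for snow, same clusterCall interface

n_total <- 5000                 # total number of Monte Carlo runs wanted
cl <- makeCluster(2)            # assumption: 2 local socket workers, not ~30 MPI processes

# Runs per worker; rounding up means the total can slightly exceed n_total.
n_per_worker <- ceiling(n_total / length(cl))

# Give each worker its own independent random-number stream
# (the seed value here is arbitrary).
clusterSetRNGStream(cl, iseed = 123)

# Hypothetical stand-in for a single run of the algorithm.
one_run <- function() mean(runif(2))

# replicate() is called inside a function, so f() is evaluated on each
# worker n times instead of once on the master.
res <- clusterCall(cl, function(n, f) replicate(n, f()), n_per_worker, one_run)

stopCluster(cl)

# Every worker did the same number of runs, so a plain mean is the overall average.
estimate <- mean(unlist(res))
```

This keeps communication to a single round trip per worker, unlike calling clusterCall once per replication from the master.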
The duplicated random numbers are not alleviated by using clusterApply to set a random seed on each processor, so the problem seems to be related to the use of the replicate function within clusterCall. I have rearranged the code so that replicate is used to call clusterCall, i.e.

    replicate(2, clusterCall(cl, runif,2),simplify=F)

and this resolved the random number issue. However, it also involves much more communication between master and slaves and results in slower computation.

Will rsprng fix this problem? Is there a better way to do this without using replicate?

I hope this is somewhat clear.

Thanks,
Mike

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.