I am trying to run a Monte Carlo process using snow with an MPI cluster.  I 
have ~thirty processors to run the algorithm on, and I want to run it 5000 
times and take the average of the output.  A very simple way to do this is 
to divide 5000 by the number of processors to get a number n and tell each 
processor to run the algorithm n times.  I realize there are more efficient 
ways to manage the parallelization.  To implement this I used the 
clusterCall command with the replicate function, along the lines of
clusterCall(cl, replicate, n, function(args)).  Because my function is a 
Monte Carlo process, it relies on drawing from random distributions to 
generate output.  When I do this, all of my processors generate the same 
random numbers.  I copied the following from the R console as a simple 
example:
cl <- makeCluster(2, type = "MPI")
clusterCall(cl, replicate, 2, runif(2))
[[1]]
          [,1]      [,2]
[1,] 0.6533959 0.6533959
[2,] 0.1071051 0.1071051

[[2]]
          [,1]      [,2]
[1,] 0.6533959 0.6533959
[2,] 0.1071051 0.1071051
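
One possible reading of this output, offered as an assumption rather than a confirmed diagnosis: the arguments to clusterCall are evaluated before the call is sent, so runif(2) would be drawn once and every worker would replicate that fixed vector.  A minimal base R sketch of that pattern, with no cluster involved (call_args and draw_n are hypothetical names used only for illustration):

```r
# Base R only; no cluster is needed to see the evaluation order.
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

# Assumption: clusterCall collects its arguments into a list before sending
# them, so runif(2) is evaluated once, up front, like this.
call_args <- list(2, runif(2))

# Each worker would then effectively run replicate on a fixed numeric vector.
same <- do.call(replicate, call_args)
print(same[, 1] == same[, 2])   # both columns are copies of the one draw

# Hypothetical wrapper: the draw now happens inside the function body,
# i.e. in whichever process executes the call.
draw_n <- function(n) replicate(n, runif(2))
fresh <- draw_n(2)
print(fresh)                    # columns come from separate draws
```

On a cluster the corresponding call would presumably be clusterCall(cl, draw_n, 2).  Workers that start from the same RNG state could still produce matching draws, so separate seeds or streams per worker may be needed as well.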

This is not alleviated by using clusterApply to set a random seed for each 
processor, and it seems to be related to the use of the replicate function 
within clusterCall.  I have rearranged the code so that replicate is used 
to call clusterCall (i.e. replicate(2, clusterCall(cl, runif, 2), 
simplify = FALSE) ), which resolved the random number issue.  However, this 
also involves much more communication between master and slaves and results 
in slower computation.  Will rsprng fix this problem?  Is there a better 
way to do this without using replicate?
I hope this is somewhat clear.
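
To make the intended setup concrete, here is a minimal sketch of the "n runs per processor, then average" plan.  Everything specific in it is an assumption for illustration: a local two-worker socket cluster stands in for the MPI cluster, one_run is a hypothetical stand-in for the real algorithm, run_many is a hypothetical wrapper, and the seeds are arbitrary.

```r
library(snow)

# Assumption: a local two-worker socket cluster stands in for the MPI one.
cl <- makeCluster(2, type = "SOCK")

total_runs <- 5000
n <- ceiling(total_runs / length(cl))   # runs per worker

# Hypothetical stand-in for the real algorithm: one run returns one number.
one_run <- function() mean(runif(10))

# Give each worker its own seed (illustrative seeds; distinct seeds do not
# guarantee independent streams the way a parallel RNG package aims to).
invisible(clusterApply(cl, seq_along(cl), set.seed))

# Hypothetical wrapper: it runs on the worker, so every draw happens there
# and only one call per worker crosses the network.
run_many <- function(n, f) replicate(n, f())
results <- clusterCall(cl, run_many, n, one_run)

estimate <- mean(unlist(results))       # average over all runs
stopCluster(cl)
```

With a worker count that does not divide 5000 evenly, ceiling() makes the total slightly larger than 5000, which may or may not matter for the average.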

Thanks,
Mike

