On Apr 15, 2011, at 5:51 PM, Gregory Magoon wrote:
> For what it's worth, I value reproducibility for conformer generation,
> and I use the same random seed every time I run it. Out of curiosity,
> what exactly is the issue with minstd_rand?
First, it's a linear congruential generator (LCG). See
http://en.wikipedia.org/wiki/Linear_congruential_generator
LCGs should not be used for applications where high-quality randomness is critical. For example, an LCG is not suitable for a Monte Carlo simulation because of the serial correlation between successive values (among other things). If an LCG with modulus m is used to choose points in an n-dimensional space, the points will lie on, at most, (n!*m)**(1/n) hyperplanes. In other words, it's unlikely to be a good source of randomness for conformation generation.

Second, it has a cycle length of 2**31-2. This means that after about 2 billion calls it starts repeating the same sequence. A Mersenne Twister (a better but more computationally expensive algorithm) takes 0.0371 usec per call on my laptop, so even that slower generator can use up 2**31 values in a bit over 1 minute. You almost certainly don't generate 2 billion conformations of the same structure, so you probably think that isn't a problem. However, that's going to depend on how the random numbers are used. For example, are 3 random numbers used for each atom? If so, then, depending on the number of atoms, you may get only around a million conformations before the sequence repeats. If you are unlucky, the cycle length might even be an exact multiple of the number of RNG calls per conformation, so that after wrapping around you regenerate exactly the same conformations and successive calculations give no new information.

Third, it takes a seed from a space of only about 2**31 values. If you use random seeds, then by the time you have done about 55K generations you should expect a 50% chance that two of them used the same seed and so produced an exact duplicate conformation. This is the so-called birthday "paradox" or birthday problem.
http://en.wikipedia.org/wiki/Birthday_problem
It's almost certain that you don't expect or want a duplication, and that your analysis statistics don't take it into account. It's therefore better to use an RNG which can (and by default does) work from a larger initialization vector.
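The arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: the function names are mine, the birthday estimate uses the usual sqrt(2*N*ln(1/(1-p))) approximation, and the "3 random numbers per atom, 700 atoms" case is a made-up example, not a statement about how the conformer generator actually uses the RNG.

```python
import math

MINSTD_MODULUS = 2**31 - 1          # modulus of minstd_rand
MINSTD_PERIOD = MINSTD_MODULUS - 1  # cycle length, 2**31 - 2


def birthday_threshold(space_size, probability=0.5):
    """Approximate number of uniformly random picks from `space_size`
    values before a duplicate shows up with the given probability."""
    return math.sqrt(2.0 * space_size * math.log(1.0 / (1.0 - probability)))


def seconds_to_exhaust(num_values, usec_per_call):
    """Wall-clock seconds needed to draw `num_values` random numbers."""
    return num_values * usec_per_call * 1e-6


def conformations_before_repeat(period, draws_per_conformation):
    """Whole conformations that fit in one RNG cycle."""
    return period // draws_per_conformation


if __name__ == "__main__":
    # Random seeds from a space of 2**31: roughly 55K runs for a 50% collision chance.
    print(round(birthday_threshold(2**31)))
    # 2**31 calls at 0.0371 usec each: a bit over a minute.
    print(seconds_to_exhaust(2**31, 0.0371))
    # Hypothetical usage: 3 draws per atom for a 700-atom molecule.
    print(conformations_before_repeat(MINSTD_PERIOD, 3 * 700))
```

With those assumed numbers, the last line comes out at about a million conformations per cycle; a smaller molecule or fewer draws per atom gives proportionally more.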
Andrew
da...@dalkescientific.com