On Thu, Apr 24, 2014 at 10:37 AM, Sturla Molden <sturla.mol...@gmail.com> wrote:
> Lars Buitinck <larsm...@gmail.com> wrote:
>
>>> - If you provide each thread with its own PRNG, you must make sure the
>>> sequences don't overlap. Just using a different seed for each thread is not
>>> safe either.
>>
>> I'm not sure what you mean by that;
>
> A PRNG will generate a sequence of pseudo-random numbers. Presumably you
> don't want overlapping sequences in your different threads, as it would
> constitute pseudo-sampling.
>
>> my rand(3) manpage says "In order
>> to get reproducible behavior in a threaded application, this state
>> must be made explicit; this can be done using the reentrant function
>> rand_r()." But you're saying that's not enough?
>
> This manpage is plain wrong!
>
> The non-deterministic scheduling of threads means that multithreaded use of
> rand_r() will never be reproducible.

No, that's not true. If your threads don't interact with each other (which is
typical for many applications!), and each thread is using its own rand_r()
state (which is the point of using rand_r() as opposed to rand()), then the
multithreaded use of rand_r() certainly can be reproducible, just as using
rand_r() in different processes, each with its own seeded state, can be
reproducible. If the threads interact non-trivially, then yeah, of course
things won't be reproducible, because they wouldn't be reproducible even if a
PRNG were not involved. But that's not what the man page is saying.

> You cannot be sure that the kernel
> will make the threads call rand_r in the same pattern twice. In practice it
> will never happen. However, rand_r is reentrant, which is something very
> different from reproducible. However, in most other cases reentrant and
> threadsafe are equivalent, which might be why they think making a PRNG
> reentrant also makes it reproducible in a threaded application. It does
> not.
>
> In order for a PRNG to be reproducible in a threaded application, it must
> always deliver the same sequence to the n-th thread. That is a very hard
> requirement to satisfy.
>
> The DC Mersenne Twister solves this

No, the DC Mersenne Twister solves the *independence* problem by the following
scheme, not the reproducibility problem ("this").

> by encoding thread identifiers into
> the characteristic polynomials in such a way that they are "relatively prime
> to each other". That means that each thread gets an independent stream of
> random numbers. Since there is one Mersenne Twister object per thread the
> kernel's thread scheduling is eliminated as an additional source of
> randomness.

If the thread scheduling is a problem for rand_r(), it will be a problem for
the DC Mersenne Twister.

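To make the per-thread-state argument concrete, here is a minimal sketch in Python rather than C. It assumes that a private `random.Random` instance per thread is a fair stand-in for the private state that each thread would pass to rand_r(). The names `worker` and `run` and the seed values are made up for illustration. The threads never touch each other's generator, so the order in which they are scheduled cannot change what each one draws.

```python
import random
import threading


def worker(seed, n, out, idx):
    # Each thread owns its generator state, analogous to giving every
    # thread its own `unsigned int` state for rand_r().
    rng = random.Random(seed)
    out[idx] = [rng.random() for _ in range(n)]


def run(seeds, n=5):
    # One result slot per thread; no generator state is shared.
    out = [None] * len(seeds)
    threads = [
        threading.Thread(target=worker, args=(seed, n, out, i))
        for i, seed in enumerate(seeds)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


first = run([1, 2, 3, 4])
second = run([1, 2, 3, 4])
print(first == second)  # True: each thread reproduces its own sequence
```

This only illustrates reproducibility. It says nothing about whether the streams started from different seeds overlap, which is the separate independence question.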
--
Robert Kern

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general