On 06 Jul 2009, Xavier Saint-Mleux wrote:
> Olivier Delalleau wrote:
> > With the current implementation of PRNGs, isn't there a risk to see the
> > behavior of some component change because we added or removed some
> > component somewhere else in the graph?
> > It seems to me this is the case, which is potentially dangerous because
> > of the risk of doing e.g. different splits / weight initializations
> > without being aware of it.
> >
> This "danger" is inherent with any pseudo-random number generator; the
> PRNG class just makes it easier to avoid it.  As we have already
> discussed, this class was designed to ensure that parallelized code will
> behave in a deterministic way and yield the exact same output as a
> sequential run.  It was not designed to "do what I mean".
> In most cases, the addition or removal of components in the graph of
> components will not modify the shape of the tree of PRNGs (those are two
> different graphs.)  Also, replacing a component with another one will
> have absolutely no effect on other components' PRNGs (would not be true
> with a single global generator.)  And, it allows you to test with
> different split and weight initializations by simply changing the root
> PRNG's seed while avoiding an accidental overlap of generated numbers,
> as would be the case with several unrelated manually-initialized
> generators; overlap is also avoided when building complex models from
> parts of other, already existing models.  In any case, you need to
> understand what is happening within your model to avoid bad surprises.
> If you know of a better way of dealing with potentially concurrent
> deterministic pseudo-random number generators, please tell me.  As I
> said in our previous discussion, I believe that the biggest possible
> improvement to these PRNGs would be to implement Pierre L'Ecuyer's
> jumpahead to calculate a new seed from an existing one instead of using
> a Numpy-supplied hashing algorithm; the "danger" you mention would still
> be present.
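
(Just to make sure we are picturing the same thing, here is a toy version
of how I understand the current tree of PRNGs; the TreePRNG name and the
crc32-based derivation are made up, only the structure matters.)

  import zlib
  import numpy as np

  class TreePRNG(object):
      def __init__(self, seed):
          self.seed = seed
          self.rng = np.random.RandomState(seed)

      def spawn_child(self, name):
          # The child seed depends only on the parent seed and the child's
          # name, so adding or removing an unrelated sibling does not
          # shift this child's stream.
          child_seed = zlib.crc32(
              ("%d/%s" % (self.seed, name)).encode()) & 0xffffffff
          return TreePRNG(child_seed)

  root = TreePRNG(42)                    # one root seed fixes the whole tree
  splitter = root.spawn_child("split")   # PRNG used for the data split
  learner = root.spawn_child("init")     # PRNG used for weight initialization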

I'm not sure it's a better way, but here's one I like better: just let
the user decide which PRNGs he wants to see vary. That means the
default behavior is to have one single PRNG per object, each initialized
independently (if you want some variance in the default seed, you can
make it vary depending on the class, and possibly on some internal class
parameters that would induce different results anyway - e.g. the number
of elements to shuffle in a ShuffleDataset).
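
Roughly, something like this (apart from ShuffleDataset, the names and
the crc32-based default seed below are made up for the example):

  import zlib
  import numpy as np

  class ShuffleDataset(object):
      def __init__(self, n_elements, seed=None):
          if seed is None:
              # Default seed depends on the class name and on a parameter
              # that would lead to different results anyway.
              seed = (zlib.crc32(self.__class__.__name__.encode())
                      ^ n_elements) & 0xffffffff
          self.n_elements = n_elements
          self.rng = np.random.RandomState(seed)

      def shuffle_indices(self):
          perm = np.arange(self.n_elements)
          self.rng.shuffle(perm)
          return perm

  data = ShuffleDataset(1000)   # gets its own PRNG, no global state involved
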
If you want to be able to change the behavior of multiple objects by
changing just one seed, you can either:
- have a mechanism to share PRNGs if you want to (similar to what you
  did, but in an opt-in way)
- code your script with seed parameters that depend on a global seed
  (where you can add your own fancy randomly chosen numbers to add some
  variance, like: Learner1(seed=my_seed + 3485485),
  Learner2(seed=my_seed + 9394394); see the sketch below)
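
For the second option, a rough sketch of what such a script could look
like (the offsets, sizes and variable names are just placeholders):

  import numpy as np

  my_seed = 42   # the one knob you change between runs

  # Each random decision gets its own generator, seeded from the global
  # seed plus an arbitrary fixed offset: changing my_seed changes
  # everything at once, while the generators stay distinct.
  split_rng = np.random.RandomState(my_seed + 123456)
  init_rng_1 = np.random.RandomState(my_seed + 3485485)
  init_rng_2 = np.random.RandomState(my_seed + 9394394)

  train_idx = split_rng.permutation(1000)[:800]   # e.g. the train/test split
  w1 = init_rng_1.normal(0, 0.1, (784, 100))      # e.g. weights of net 1
  w2 = init_rng_2.normal(0, 0.1, (784, 100))      # e.g. weights of net 2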

In my experience, you rarely want to vary many random initializations in
an experiment, and when you do, you want to know what you're changing.
If I want to run multiple neural networks with different seeds to see
which works best, I usually don't want to change my splits at the same
time.
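
i.e. something along the lines of (again with made-up names and sizes):

  import numpy as np

  split_seed = 0                       # kept fixed: the split never moves
  for net_seed in range(10):           # only the initialization varies
      split_rng = np.random.RandomState(split_seed)
      init_rng = np.random.RandomState(net_seed)
      train_idx = split_rng.permutation(1000)[:800]
      weights = init_rng.normal(0, 0.1, (784, 100))
      # ... train on train_idx, evaluate, keep the best net_seed ...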

Also, I realize the hierarchical way PRNGs are currently created
alleviates some of this issue, since two "branches" in the tree remain
independent. However, the risk still exists, and hiding part of it will
not help in making people aware of it.

That being said, I'm fine with leaving it as it is. It's not a big deal
to me because I understand how it works. And I know we tend to have two
different views on coding: I prefer to make my code as safe as possible
so that users are not misled into errors, while you prefer to give more
freedom to the user and count on him to know what he's doing. I consider both
points of view to be valid.

Oh, and I admit I tend not to worry too much about PRNGs overlapping.
It's never seemed to be a major issue for me, but maybe I just didn't
notice it. Do you have any example of typical machine learning settings
where this can be an issue?


