Hi all,
TL;DR version:
I'm looking for a classifier that will produce the *exact same model* for
shuffled versions of the training data. I thought GaussianNB would do the
trick, but either I don't understand it or some kind of numerical instability
prevents it from arriving at the same model when the data are reshuffled: the
fitted theta_ agrees to about 1e-18 absolute tolerance, but sigma_ only to
about 1e-5. Thoughts?
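
To make the "numerical instability" guess concrete, here is a minimal
stdlib-only sketch of how floating-point accumulation can depend on element
order, and how a correctly rounded sum (math.fsum) removes that dependence for
a mean and a variance. It is an illustration only, not what GaussianNB does
internally; the helper names naive_sum and order_independent_mean_var are
made up for this example.

```python
import math
import random


def naive_sum(values):
    """Left-to-right float accumulation; the result can depend on order."""
    total = 0.0
    for v in values:
        total += v
    return total


def order_independent_mean_var(values):
    """Mean and population variance built on math.fsum.

    math.fsum returns the correctly rounded sum of its inputs, so the
    result is the same for any permutation of `values`.
    """
    n = len(values)
    mean = math.fsum(values) / n
    var = math.fsum((v - mean) ** 2 for v in values) / n
    return mean, var


# Same four numbers in two different orders.
a = [1e16, 1.0, -1e16, 1.0]
b = [1.0, 1.0, 1e16, -1e16]
print(naive_sum(a), naive_sum(b))    # the two naive sums disagree
print(math.fsum(a), math.fsum(b))    # the two fsum results agree

# Shuffle a larger sample and compare the fsum-based statistics.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shuffled = data[:]
rng.shuffle(shuffled)
print(order_independent_mean_var(data) == order_independent_mean_var(shuffled))
```

If order-dependent accumulation is indeed the cause, sorting the training rows
into a canonical order before fitting would be another way around it.
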
Longer version with cute lesson learned:
I hit another snag with testing for the Py2-3 transition on my
sklearn-dependent library. This was a fun one to debug. Essentially, I was
getting some training data, learning a random forest, and then checking the
predict_proba() output on a test set. That check was failing, so I assumed that
somehow the seeding wasn't giving the same outcome in Py2 and 3. I checked up
and down and, sure enough, random seeding was working fine.
The change that *did* happen came from the fact that I was learning edges from
a networkx graph. Fun fact: networkx.Graph.edges() is actually an iterator over
dictionary keys, whose ordering is thus not guaranteed, though it is perfectly
reproducible across most implementations of Py2.7. So, although my tests had
been happily chugging along for a long time, this ordering changed in Py3.4,
which changed the order of the training data and therefore the outcome of
RandomForestClassifier().fit().
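
One way to take the iteration order out of the picture is to put the edges
into a canonical order before building the training data. The sketch below
uses plain tuples rather than networkx itself, and assumes undirected edges
whose node labels can be compared with each other (e.g. all ints); the helper
name canonical_edges is hypothetical.

```python
def canonical_edges(edges):
    """Return undirected edges as a sorted list of (u, v) tuples with u <= v.

    Assumes node labels are mutually comparable (e.g. all ints).
    """
    return sorted(tuple(sorted(edge)) for edge in edges)


# The same undirected edge set, as two different iteration orders might yield it.
order_a = [(3, 1), (2, 1), (2, 3)]
order_b = [(2, 3), (1, 3), (1, 2)]

print(canonical_edges(order_a))
print(canonical_edges(order_a) == canonical_edges(order_b))
```

With a real graph the same idea would apply to whatever edges() yields, so
that the rows of the training set no longer depend on dictionary ordering.
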
I tried using GaussianNB() as the classifier but that still doesn't have
reproducible behaviour between Python versions. Any other suggestions?
Thanks!
Juan.