Hi Juan.
Getting agreement all the way down to floating point precision is
pretty hard, as Gael mentioned. Agreement to only 1e-5 on sigma_ seems
poor, though.
Can you post data to reproduce?
I would expect most classifiers to agree to around 1e-8.
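To illustrate why exact agreement is hard, here is a minimal pure-Python
sketch. It is not scikit-learn's actual code, and the data and helper names
are made up for illustration. It averages the same values in several shuffled
orders: the plain left-to-right sum can change in its last bits, while
math.fsum, which returns a correctly rounded sum, does not depend on the order.

```python
import math
import random


def naive_mean(values):
    """Mean via a plain left-to-right float sum (order-dependent)."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def exact_mean(values):
    """Mean via math.fsum, which rounds the exact sum only once."""
    return math.fsum(values) / len(values)


rng = random.Random(0)
# Hypothetical feature column with values of very different magnitudes.
data = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-3, 6)
        for _ in range(10000)]

naive_means = set()
exact_means = set()
for _ in range(20):
    shuffled = data[:]
    rng.shuffle(shuffled)
    naive_means.add(naive_mean(shuffled))
    exact_means.add(exact_mean(shuffled))

# Count how many different results each method produced across shuffles.
print("distinct naive means:", len(naive_means))
print("distinct fsum means: ", len(exact_means))
```

Any estimator that accumulates means or variances with ordinary float
arithmetic can show this kind of order dependence, although it would
normally be far smaller than 1e-5.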
Cheers,
Andreas
On 02/02/2015 10:46 AM, Juan Nunez-Iglesias wrote:
Hi all,
*TL;DR version:*
I'm looking for a classifier that will get the *exact same model* for
shuffled versions of the training data. I thought GaussianNB would do
the trick but either I don't understand it, or some kind of numerical
instability prevents it from achieving the same model on successive
shufflings of the data. I get about 1e-18 absolute tolerance on theta_
but only 1e-5 on sigma_. Thoughts?
*Longer version with cute lesson learned:*
I hit another snag with testing for the Py2-3 transition on my
sklearn-dependent library. This was a fun one to debug. Essentially, I
was getting some training data, learning a random forest, and then
checking the predict_proba() outcome on a test set. This was failing,
so I assumed that somehow the seeding wasn't giving the same outcome
in Py2 and 3. I checked up and down and sure enough, random seeding
was working fine.
The random change that *did* happen was because I was learning edges
from a networkx graph. Fun fact: networkx.Graph.edges() is actually an
iterator over dictionary keys, whose ordering is thus not guaranteed,
/though it is perfectly reproducible across most implementations of
Py2.7/. So, although my tests had been happily chugging along for a
long time, this ordering changed in Py3.4, thus changing the order of
the training data and the outcome of RandomForestClassifier().fit().
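A minimal sketch of one way to make the training order independent of
dictionary iteration order: put the edges into a canonical sorted order
before building the training set. Plain tuples stand in for the networkx
edge list here, and canonical_edges is a hypothetical helper, not part of
networkx or scikit-learn. It assumes an undirected graph whose node labels
can be compared with each other.

```python
import random


def canonical_edges(edges):
    """Return undirected edges in a deterministic order.

    Each edge is normalised so the smaller node comes first, then the
    whole list is sorted, so the result does not depend on the order
    (or orientation) in which the edges were supplied.
    """
    return sorted((min(u, v), max(u, v)) for u, v in edges)


# Hypothetical edge list, as it might come out of a graph's edge iterator.
edges = [(3, 1), (0, 2), (2, 1), (4, 0), (3, 4)]

# Simulate a different iteration order, as seen under another Python version.
shuffled = edges[:]
random.Random(42).shuffle(shuffled)
# Flip the orientation of every second edge as well.
shuffled = [(v, u) if i % 2 else (u, v) for i, (u, v) in enumerate(shuffled)]

print(canonical_edges(edges))
print(canonical_edges(shuffled))
```

Building the feature matrix and labels by iterating over the sorted list
should give the same row order under Py2 and Py3, so a seeded classifier
sees identical input in both.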
I tried using GaussianNB() as the classifier but that still doesn't
have reproducible behaviour between Python versions. Any other
suggestions?
Thanks!
Juan.
------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general