Actually, I wonder if there is a difference between our implementation and Matlab's behavior. We seem to reset the seed to a hard-coded value when calling predict and predict_proba:
In predict() and predict_proba() in here, we call set_predict_params():
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L315
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L381

However, set_predict_params() appears to reset the RNG to a hard-coded value of -1:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/libsvm.pyx#L261

Because you are requesting probability estimates, the state of the RNG will affect the resulting scores. If Matlab doesn't similarly reset the RNG prior to each predict call, then a difference would manifest here. I think if the underlying support vectors match but our predictions do not, this might explain it. (A small sketch of the check I have in mind is at the bottom of this message.)

Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387

On Wed, Jun 22, 2016 at 3:07 PM, Michael Bommarito <mich...@bommaritollc.com> wrote:

> Have you tried comparing the fit support vectors prior to comparing
> predicted values? You might need to set SaveSupportVectors in Matlab first.
>
> Thanks,
> Michael J. Bommarito II, CEO
> Bommarito Consulting, LLC
> *Web:* http://www.bommaritollc.com
> *Mobile:* +1 (646) 450-3387
>
> On Wed, Jun 22, 2016 at 2:50 PM, Taylor, Johnmark <
> johnmarktay...@g.harvard.edu> wrote:
>
>> Many thanks for the responses thus far!
>>
>> *Did you fix the random seeds across implementations as well? Differences
>> in seeds or generators might explain this.*
>>
>> The implementation of libsvm used by Matlab always has a seed of 1. I
>> tried setting the seed for SKL SVM to 1 (and 0, 2, 3, and 4) as well, and
>> the results were still different.
>>
>> *Did you try using the Python API to libsvm directly instead of through
>> SKL? I'm guessing you have it on your computer since you have the Matlab
>> API. That would at least let you test whether it's the fake data or
>> whether it's SKL.*
>>
>> I'll give that a shot next, thanks!
>>
>> *Also, are you loading the fake data from a .mat file into Python (e.g.
>> with the SciPy 'loadmat' function) or are you generating it from a script?
>> Maybe some weird floating point error between Python and Matlab is giving
>> you the different results? This could happen if you generate the data with
>> a script written in both Python and Matlab, for example... along the same
>> lines as the random seed generator giving different results.*
>>
>> I'm generating the fake data with a Python script and saving it to a .txt
>> file, which is then loaded in by Python and Matlab in their respective
>> scripts. To make sure there's no truncation error going on when they load
>> in this .txt file to get the fake data, I applied the floor function to
>> both sets of vectors (to make them ints) in both the Python and Matlab
>> scripts, and they still give different results. So I don't think it's a
>> data issue.
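P.S. Here is a minimal sketch of the check I have in mind, using synthetic data from make_classification in place of your fake data and sklearn's SVC rather than the libsvm bindings directly (so treat it as illustrative, not as a reproduction of your setup): fit the same data with two different seeds and compare the support vectors against the probability estimates.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic stand-in for the fake data in this thread (assumption: any
# reproducible dataset works for this check).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Two fits that differ only in the seed used for the probability step.
clf_a = SVC(kernel="rbf", probability=True, random_state=1).fit(X, y)
clf_b = SVC(kernel="rbf", probability=True, random_state=2).fit(X, y)

# The decision-function fit itself is deterministic, so the support
# vectors should be identical across seeds.
print("support vectors equal:",
      np.array_equal(clf_a.support_vectors_, clf_b.support_vectors_))

# The Platt-scaling step consumes random numbers, so the probability
# estimates will typically differ slightly between the two seeds.
print("max |proba difference|:",
      np.abs(clf_a.predict_proba(X) - clf_b.predict_proba(X)).max())

If the support vectors agree but the probabilities drift, that points at the probability step as the seed-sensitive part, and the same comparison against the Matlab model (after setting SaveSupportVectors) should tell you whether the underlying fit or only the probability scores differ.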