Sorry for the misinformation.
Yes, actually I'd argue you should raise an error on negative data,
if that's not valid input.
Right now there is no way to tell the test suite that your
model requires non-negative data; that is (among other things) what
the PR I referenced earlier is about.
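A minimal sketch of that advice: validate the input in both fit and predict so that negative data fails loudly instead of quietly producing chance-level predictions. NonNegativeCountClassifier and its nearest-class-mean internals are hypothetical stand-ins for the real algorithm. In later scikit-learn releases the estimator tags mentioned in this thread let an estimator declare this kind of requirement (e.g. a `requires_positive_X` tag in some versions), but the exact mechanism depends on the version you target.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted


class NonNegativeCountClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical classifier that only accepts non-negative (count) features."""

    def _check_non_negative(self, X):
        # Reject invalid input explicitly rather than fitting garbage
        if np.any(X < 0):
            raise ValueError(
                "NonNegativeCountClassifier requires non-negative input (e.g. counts)."
            )

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        self._check_non_negative(X)
        self.classes_ = np.unique(y)
        # Placeholder model: mean feature vector per class
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        check_is_fitted(self, "means_")
        X = check_array(X)
        self._check_non_negative(X)
        # Assign each sample to the class with the closest mean (squared Euclidean)
        dists = ((X[:, None, :] - self.means_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]
```

With this in place, a check that feeds standardized (partly negative) data gets a clear ValueError instead of a puzzling 31% accuracy.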
On 10/12/2017 10:10 PM, Michael Capizzi wrote:
So it appears that the test |check_classifiers_train()|
(https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1079)
does /not/ use the |iris| dataset after all:
    X_m, y_m = make_blobs(n_samples=300, random_state=0)
    X_m, y_m = shuffle(X_m, y_m, random_state=7)
    X_m = StandardScaler().fit_transform(X_m)
But this also explains why my classifier gets an accuracy of only
31%. The classifier I'm trying to build and contribute to
|scikit-learn-contrib| is designed for NLP data, where the
features are /non-negative/ counts:
https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf
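To make the mismatch concrete, the sketch below regenerates the data from the three lines quoted above and checks its sign: after StandardScaler every column has mean zero, so a large share of the entries are negative, which count-based models do not expect. scikit-learn's own MultinomialNB (used here only as an illustration of a count-based model) refuses such input with a ValueError.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle

# Same data generation as quoted from check_classifiers_train()
X_m, y_m = make_blobs(n_samples=300, random_state=0)
X_m, y_m = shuffle(X_m, y_m, random_state=7)
X_m = StandardScaler().fit_transform(X_m)

print("fraction of negative entries:", (X_m < 0).mean())

# A count-based model in scikit-learn rejects negative features outright
try:
    MultinomialNB().fit(X_m, y_m)
    rejected = False
except ValueError as exc:
    rejected = True
    print("MultinomialNB rejects this data:", exc)
```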
Interestingly enough, this classifier reports 100% accuracy on the
|iris| dataset (when the last 10% is held out for testing). But again,
the main purpose of this classifier is NLP.
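One thing worth checking about that 100%: `load_iris()` returns its 150 samples ordered by class (50 per class), so if "the last 10%" means the last 15 rows without shuffling, the test set contains a single class. Iris features are also all positive measurements, which suits a count-oriented model. A sketch, assuming an unshuffled split was used, comparing it with a shuffled, stratified split:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Unshuffled "last 10%" split: 15 rows, all from the last class
n_test = int(0.1 * len(y))
print("classes in last 10%:", np.unique(y[-n_test:]))

# Iris features are strictly positive, unlike the standardized blobs in the check
print("all features positive:", bool((X > 0).all()))

# A fairer held-out set: shuffled and stratified, so every class is represented
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0
)
print("test samples per class:", np.bincount(y_test))
```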
So @andreas mentioned that this can be relaxed “if there’s a good
reason.” Does the above situation qualify?
-M
On Thu, Oct 12, 2017 at 11:27 AM, Michael Capizzi
<mcapi...@email.arizona.edu> wrote:
Thanks, @andreas, for your comments, especially the info that it's
the `iris` dataset. I have to dig a bit deeper to see what's
going on with the performance there, but now that I know it's
`iris`, I can try to reproduce it.
-M
On Thu, Oct 12, 2017 at 12:01 AM, Andreas Mueller
<t3k...@gmail.com> wrote:
Yes, it's pretty empirical, and with the estimator tags PR
(https://github.com/scikit-learn/scikit-learn/pull/8022) we
will be able to relax it if there's a good reason your estimator
doesn't pass.
But the dataset is pretty trivial (iris), and you're getting
chance performance (it's a balanced three-class problem), so
that is not a great sign for your estimator.
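For reference, "chance performance" on a balanced three-class problem means an accuracy of about 1/3, which is what a constant predictor scores; the 0.313 in the traceback sits right at that level. A minimal sketch using scikit-learn's DummyClassifier as the baseline, assuming synthetic labels with 100 samples per class (mirroring the 300 balanced samples in the check):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Balanced three-class labels: 100 samples per class
y = np.repeat([0, 1, 2], 100)
X = np.zeros((len(y), 1))  # DummyClassifier ignores the features

# Always predicts one class, so it is right for exactly one third of the samples
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = accuracy_score(y, baseline.predict(X))
print("chance-level accuracy:", acc)
```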
On 10/11/2017 07:09 PM, Guillaume Lemaître wrote:
I'm not 100% sure, but this is an integration/sanity check: all
classifiers are supposed to predict the data they were trained on
quite well.
It is true that the 83% threshold is empirical, but it lets us spot
changes to the algorithms even when the unit tests pass for some
reason.
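The sanity check discussed here can be approximated as follows: fit on the generated data and require the accuracy on that same training data to exceed 0.83. This is a simplified sketch, not scikit-learn's code; `check_train_accuracy` is a hypothetical name, the data generation follows the lines quoted earlier in this thread, and GaussianNB is used only as an example of a well-behaved classifier.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle


def check_train_accuracy(clf, threshold=0.83):
    """Hypothetical simplified version of the training-accuracy sanity check."""
    X, y = make_blobs(n_samples=300, random_state=0)
    X, y = shuffle(X, y, random_state=7)
    X = StandardScaler().fit_transform(X)
    # Predict on the training data itself: a sound classifier should do well here
    y_pred = clf.fit(X, y).predict(X)
    acc = accuracy_score(y, y_pred)
    assert acc > threshold, f"{acc} not greater than {threshold}"
    return acc


acc = check_train_accuracy(GaussianNB())
print(f"GaussianNB training accuracy: {acc:.3f}")
```

A classifier that scores around 1/3 here is doing no better than guessing, which is the kind of breakage the threshold is meant to surface even when finer-grained unit tests still pass.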
On 11 October 2017 at 18:52, Michael Capizzi
<mcapi...@email.arizona.edu> wrote:
I’m wondering if anyone can identify the purpose of this
test: |check_classifiers_train()|, specifically this
line:
https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1106
My custom classifier (which I’m hoping to submit to
|scikit-learn-contrib|) is failing this test:
    File "/Users/mcapizzi/miniconda3/envs/nb_plus_svm/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py", line 1106, in check_classifiers_train
        assert_greater(accuracy_score(y, y_pred), 0.83)
    AssertionError: 0.31333333333333335 not greater than 0.83
And while it's disturbing that my classifier gets 31% accuracy
when the test writer clearly expects it to be above 83%, I'm not
sure I understand why that would be a test condition.
Thanks for any insight.
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/