Re: [scikit-learn] purpose of test: check_classifiers_train

Andreas Mueller Fri, 13 Oct 2017 00:12:30 -0700

Sorry for the misinformation.

Yes, actually I'd argue you should raise an error on data that's not non-negative, if that's not valid input. Right now there is no way to specify to the testing suite that your model requires positive data; that's what the PR I referenced earlier is about (among other things).
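
As a rough illustration of the suggestion above, input validation for an estimator that only accepts non-negative counts might look like the sketch below. The names `check_non_negative_counts` and `CountBasedClassifier` are hypothetical and do not come from scikit-learn or from the classifier discussed in this thread; only the validation step is shown, not any actual learning.

```python
import numpy as np


def check_non_negative_counts(X):
    """Hypothetical helper: return X as a 2D float array, or raise on negative entries."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"Expected a 2D array, got {X.ndim}D input instead.")
    if (X < 0).any():
        raise ValueError(
            "Negative values in data; this estimator expects non-negative counts."
        )
    return X


class CountBasedClassifier:
    """Hypothetical estimator skeleton; only the input validation is sketched."""

    def fit(self, X, y):
        # Fail early and loudly instead of silently fitting on invalid input.
        X = check_non_negative_counts(X)
        self.classes_ = np.unique(y)
        self.n_features_ = X.shape[1]
        return self


# Valid count data is accepted ...
clf = CountBasedClassifier().fit([[1, 0], [0, 2]], [0, 1])

# ... while data with a negative entry is rejected.
try:
    CountBasedClassifier().fit([[1, -1], [0, 2]], [0, 1])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Raising a `ValueError` here means the estimator fails explicitly on standardized data rather than producing near-chance predictions.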


On 10/12/2017 10:10 PM, Michael Capizzi wrote:

So it appears that the test |check_classifiers_train()| (https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1079) does /not/ use the |iris| dataset after all:

|X_m, y_m = make_blobs(n_samples=300, random_state=0)
X_m, y_m = shuffle(X_m, y_m, random_state=7)
X_m = StandardScaler().fit_transform(X_m)|
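
For anyone wanting to inspect that data directly, the three lines above can be run on their own. The sketch below only adds the imports and a few checks; the variable `n_negative` is introduced here for illustration. Since `StandardScaler` centers each feature at zero, the result necessarily contains negative entries.

```python
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle

# Same three steps as quoted from check_classifiers_train above.
X_m, y_m = make_blobs(n_samples=300, random_state=0)
X_m, y_m = shuffle(X_m, y_m, random_state=7)
X_m = StandardScaler().fit_transform(X_m)

# Illustrative checks (not part of the scikit-learn test itself).
n_negative = int((X_m < 0).sum())
print("shape:", X_m.shape)
print("classes:", sorted(set(y_m.tolist())))
print("per-feature minimum:", X_m.min(axis=0))
print(n_negative, "of", X_m.size, "entries are negative")
```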

But this also explains why my classifier gets an accuracy of only |31%|. The classifier I’m trying to build to contribute to |scikit-learn-contrib| is designed to be used on NLP data where the features are /non-negative/ counts: https://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf
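
To put the 31% in context, here is a small back-of-the-envelope calculation. It assumes the `make_blobs` defaults (three centers, so 100 of the 300 samples per class) and takes the accuracy value from the traceback quoted further down in this thread.

```python
n_samples = 300
n_classes = 3  # assumption: make_blobs default of three centers
reported_accuracy = 0.31333333333333335  # value from the AssertionError

# Number of correctly classified training samples implied by the accuracy.
n_correct = round(reported_accuracy * n_samples)

# Expected accuracy of uniform random guessing on a balanced problem.
chance_level = 1 / n_classes

print(n_correct, "correct out of", n_samples)
print("below chance level:", reported_accuracy < chance_level)
print("passes the 0.83 threshold:", reported_accuracy > 0.83)
```

In other words, the reported score is roughly what guessing would give on this balanced three-class problem, far below the 0.83 threshold.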

Interestingly enough, this classifier reports 100% accuracy on the |iris| dataset (when last 10% is used for testing). But again, the main purpose of this classifier is in NLP cases.
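
One caveat about that iris figure, offered as a sketch rather than a diagnosis: the iris data bundled with scikit-learn is ordered by class (50 consecutive samples per class), so if the last 10% is taken without shuffling, the test set contains a single class. The labels below are built by hand under that assumption instead of being loaded from the library.

```python
# Assumption: labels laid out as in scikit-learn's bundled iris data,
# i.e. 50 consecutive samples for each of the three classes.
y = [0] * 50 + [1] * 50 + [2] * 50

n_test = len(y) // 10  # last 10% of 150 samples
y_train, y_test = y[:-n_test], y[-n_test:]

# A trivial "classifier" that always predicts the final class.
y_pred = [2] * n_test
accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / n_test

print("test-set classes:", sorted(set(y_test)))
print("accuracy of constant prediction:", accuracy)
```

Under that assumption, a perfect score on such a split says little about how well the three classes are separated; shuffling before splitting would give a more informative number.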

So @andreas mentioned that this can be relaxed “if there’s a good reason.” Does the above situation qualify?

-M

On Thu, Oct 12, 2017 at 11:27 AM, Michael Capizzi <mcapi...@email.arizona.edu <mailto:mcapi...@email.arizona.edu>> wrote:


    Thanks @andreas, for your comments, especially the info that it's
    the `iris` dataset.  I have to dig a bit deeper to see what's
    going on with the performance there.  But now that I know it's
    `iris`, I can try to recreate.

    -M

    On Thu, Oct 12, 2017 at 12:01 AM, Andreas Mueller
    <t3k...@gmail.com <mailto:t3k...@gmail.com>> wrote:

        Yes, it's pretty empirical, and with the estimator tags PR
        (https://github.com/scikit-learn/scikit-learn/pull/8022) we
        will be able to relax it if there's a good reason you're not
        passing.
        But the dataset is pretty trivial (iris), and you're getting
        chance performance (it's a balanced three class problem). So
        that is not a great sign for your estimator.


        On 10/11/2017 07:09 PM, Guillaume Lemaître wrote:

        Not 100% sure, but this is an integration/sanity check, since
        all classifiers are supposed to predict quite well on the data
        used to train them.
        It is true that the 83% threshold is empirical, but it allows
        us to spot any changes made to the algorithms even if the unit
        tests are passing for some reason.

        On 11 October 2017 at 18:52, Michael Capizzi
        <mcapi...@email.arizona.edu
        <mailto:mcapi...@email.arizona.edu>> wrote:

            I’m wondering if anyone can identify the purpose of this
            test: |check_classifiers_train()|, specifically this
            line:
            
            https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/utils/estimator_checks.py#L1106

            My custom classifier (which I’m hoping to submit to
            |scikit-learn-contrib|) is failing this test:

            |File "/Users/mcapizzi/miniconda3/envs/nb_plus_svm/lib/python3.6/site-packages/sklearn/utils/estimator_checks.py",
            line 1106, in check_classifiers_train
            assert_greater(accuracy_score(y, y_pred), 0.83)
            AssertionError: 0.31333333333333335 not greater than 0.83|

            And while it’s disturbing that my classifier is getting
            31% |accuracy| when, clearly, the test writer expects it
            to be in the upper-80s, I’m not sure I understand why
            that would be a test condition.

            Thanks for any insight.

            

            _______________________________________________
            scikit-learn mailing list
            scikit-learn@python.org <mailto:scikit-learn@python.org>
            https://mail.python.org/mailman/listinfo/scikit-learn
            <https://mail.python.org/mailman/listinfo/scikit-learn>

        -- Guillaume Lemaitre

        INRIA Saclay - Parietal team
        Center for Data Science Paris-Saclay
        https://glemaitre.github.io/


