Yes, that's totally fine. The error is unrelated and just means you need to call ``check_is_fitted`` in your predict method
to give a nicer error message.
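
A minimal sketch of the pattern discussed here, assuming the wrapper keeps the inner model on itself and returns `self` from `fit` (the usual scikit-learn convention) rather than returning the inner SVC. `WrappedSVC` and `scaled_rbf` are hypothetical names, and the RBF kernel is only a stand-in for the custom kernel in the attached code.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import NotFittedError
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted


def scaled_rbf(X, Y):
    """Hypothetical stand-in for the custom kernel (module-level so it pickles)."""
    return rbf_kernel(X, Y, gamma=0.5)


class WrappedSVC(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that delegates to an inner SVC."""

    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y):
        # Build and fit the inner SVC; the trailing underscore marks it
        # as an attribute that only exists after fitting.
        self.svc_ = SVC(C=self.C, kernel=scaled_rbf).fit(X, y)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        # Raises NotFittedError, whose message mentions 'fit',
        # if fit() has not been called yet.
        check_is_fitted(self, "svc_")
        return self.svc_.predict(X)


X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

try:
    WrappedSVC().predict(X)
    message = ""
except NotFittedError as exc:
    message = str(exc)

predictions = WrappedSVC().fit(X, y).predict(X)
```

Without the `check_is_fitted` call, an unfitted `predict` would fail with an `AttributeError` on `svc_`, which is presumably what the estimator check objects to.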

On 08/04/2017 06:29 AM, Sam Barnett wrote:
Hi Andy,
I have since been able to resolve the pickling issue, though I am now getting an error message saying that an error message does not include the expected string 'fit'. In general, I am trying to use the fit() method of my classifier to instantiate a separate SVC() classifier with a custom kernel, fit THAT to the data, then return this instance as the fitted version of the new classifier. Is this possible in theory? If so, what is the best way to implement it?

As before, the requisite code and a .ipynb file are attached.

Best,
Sam

On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller <t3k...@gmail.com> wrote:

    Hi Sam.
    You need to put these into a reachable namespace (possibly as
    private functions) so that they can be pickled.
    Please stay on the sklearn mailing list, I might not have time to
    reply.
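
A small sketch of why this matters, using only the standard library. The class and function names are hypothetical: one model stores a module-level (private) kernel function, the other stores a function defined inside `fit`, which pickle cannot locate by name.

```python
import pickle


def _linear_kernel(x, y):
    """Hypothetical module-level kernel: pickle can look it up by name."""
    return sum(a * b for a, b in zip(x, y))


class ReachableKernelModel:
    def fit(self):
        # Store a reference to a function living in the module namespace.
        self.kernel_ = _linear_kernel
        return self


class NestedKernelModel:
    def fit(self):
        # A function defined inside fit() is a local object and
        # cannot be pickled by reference.
        def kernel(x, y):
            return sum(a * b for a, b in zip(x, y))

        self.kernel_ = kernel
        return self


# Round-trips fine: the kernel is reachable from the module.
restored = pickle.loads(pickle.dumps(ReachableKernelModel().fit()))

# Fails: the exact exception type depends on the Python version.
try:
    pickle.dumps(NestedKernelModel().fit())
    nested_pickles = True
except (pickle.PicklingError, AttributeError):
    nested_pickles = False
```

The same reasoning should apply to a kernel callable handed to `SVC`: if it is defined at module level, the fitted estimator can be pickled along with it.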

    Andy


    On 08/03/2017 01:24 PM, Sam Barnett wrote:
    Hi Andy,

    I've since tried a different solution: instead of a pipeline,
    I've simply created a classifier that is for the most part like
    svm.SVC, though it takes a few extra inputs for the
    sequentialisation step. I've used a Python function that can
    compute the Gram matrix between two datasets of any shape to pass
    into SVC(), though I'm now having trouble with pickling on the
    check_estimator test. It appears that SeqSVC.fit() doesn't like
    to have methods defined within it. Can you see how to pass this
    test? (the .ipynb file shows the error).

    Best,
    Sam

    On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett
    <sambarnet...@gmail.com> wrote:

        You're right: it does fail without GridSearchCV when I change
        the size of seq_test. I will look at the transform tomorrow
        to see if I can work this out. Thank you for your help so far!

        On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller
        <t3k...@gmail.com> wrote:

            Change the size of seq_test in your notebook and you'll
            see the failure without GridSearchCV.
            I haven't looked at your code in detail, but transform is
            supposed to work on arbitrary new data with the same
            number of features.
            Your code requires the test data to have the same shape
            as the training data.
            Cross-validation will lead to training data and test data
            having different sizes. But I feel like something is
            already wrong if your test data size depends on your
            training data size.
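
One way this could look, as a sketch rather than the code from the notebook: the transformer remembers the training samples in `fit` and, in `transform`, returns the kernel between whatever data it is given and those stored samples. `GramTransformer` is a hypothetical name and the plain RBF kernel stands in for the sequentialised kernel.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


class GramTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: kernel values against the training set."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X, y=None):
        # Keep the training samples; transform() needs them later.
        self.X_fit_ = np.asarray(X, dtype=float)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Only the number of features has to match, not the number of rows.
        if X.shape[1] != self.X_fit_.shape[1]:
            raise ValueError("Number of features differs from what was seen in `fit`")
        # Shape: (n_samples_in_X, n_training_samples)
        return rbf_kernel(X, self.X_fit_, gamma=self.gamma)


rng = np.random.RandomState(0)
X_train, X_test = rng.rand(20, 3), rng.rand(7, 3)
y_train = np.arange(20) % 2  # arbitrary labels, just to exercise the API

gram = GramTransformer(gamma=0.5).fit(X_train)
K_train = gram.transform(X_train)  # square Gram matrix
K_test = gram.transform(X_test)    # rectangular: test rows, training columns

pipe = Pipeline([("gram", GramTransformer()), ("svc", SVC(kernel="precomputed"))])
search = GridSearchCV(pipe, {"gram__gamma": [0.1, 1.0], "svc__C": [1, 10]}, cv=3)
search.fit(X_train, y_train)
test_predictions = search.predict(X_test)
```

Because `transform` no longer compares `X.shape` with the training shape, each cross-validation fold can pass a differently sized test split through the pipeline.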



            On 08/02/2017 03:08 PM, Sam Barnett wrote:
            Hi Andy,

            The purpose of the transformer is to take an ordinary
            kernel (in this case I have taken 'rbf' as a default)
            and return a 'sequentialised' kernel using a few extra
            parameters. Hence, the transformer takes an ordinary
            data-target pair X, y as its input, and the
            fit_transform(X, y) method will output the Gram matrix
            for X that is associated with this sequentialised
            kernel. In the pipeline, this Gram matrix is passed into
            an SVC classifier with the kernel parameter set to
            'precomputed'.

            Therefore, I do not think your hacky solution would be
            possible. However, I am still unsure how to implement
            your first solution: won't the Gram matrix from the
            transformer contain all the necessary kernel values?
            Could you elaborate further?


            Best,
            Sam

            On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller
            <t3k...@gmail.com> wrote:

                Hi Sam.
                GridSearchCV will do cross-validation, which
                requires "transforming" the test data.
                The shape of the test-data will be different from
                the shape of the training data.
                You need to have the ability to compute the kernel
                between the training data and new test data.

                A more hacky solution would be to compute the full
                kernel matrix in advance and pass that to GridSearchCV.

                You probably don't need it here, but you should also
                check out what the _pairwise attribute does in
                cross-validation, because that is likely to come up
                when playing with kernels.
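
A sketch of the "hacky" route, assuming a plain RBF kernel as a stand-in and arbitrary labels: the full Gram matrix is computed once and handed to GridSearchCV together with `SVC(kernel="precomputed")`. Because that estimator is marked as pairwise, cross-validation slices both rows and columns of the matrix for each fold. In scikit-learn releases from around the time of this thread the marker was the `_pairwise` attribute; as far as I know, newer releases express the same thing through estimator tags.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(30, 4)
y = np.arange(30) % 2  # arbitrary labels, just to exercise the API

# Full kernel matrix between all samples, computed once up front.
K = rbf_kernel(X, X, gamma=0.5)

# Only SVC parameters can be searched; the kernel itself is fixed in K.
search = GridSearchCV(SVC(kernel="precomputed"), {"C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(K, y)

# Prediction expects kernel values between new samples and all training samples.
first_five = search.predict(K[:5])
```

The limitation is that kernel parameters such as `gamma` cannot be part of the grid, since the matrix is computed before the search starts.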

                Hth,
                Andy


                On 08/02/2017 08:38 AM, Sam Barnett wrote:
                Dear all,

                I have created a 2-step pipeline with a custom
                transformer followed by a simple SVC classifier,
                and I wish to run a grid-search over it. I am able
                to successfully create the transformer and the
                pipeline, and each of these elements works fine.
                However, when I try to use the fit() method on my
                GridSearchCV object, I get the following error:

                     57         # during fit.
                     58         if X.shape != self.input_shape_:
                ---> 59             raise ValueError('Shape of input is different from what was seen '
                     60                              'in `fit`')
                     61

                ValueError: Shape of input is different from what was seen in `fit`

                For a full breakdown of the problem, I have written
                a Jupyter notebook showing exactly how the error
                occurs (this also contains all .py files necessary
                to run the notebook). Can anybody see how to work
                through this?

                Many thanks,
                Sam Barnett



                _______________________________________________
                scikit-learn mailing list
                scikit-learn@python.org
                https://mail.python.org/mailman/listinfo/scikit-learn








