Yes, that's totally fine. The error is unrelated and just means you need
to call ``check_is_fitted`` in your predict method
to give a nicer error message.
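For illustration, here is a minimal sketch of that pattern. It assumes a hypothetical classifier named WrappedSVC (not the code from the attached notebook) whose fit() trains an internal SVC. It keeps the fitted SVC as an attribute and returns self, which is what scikit-learn's conventions expect of fit(), and predict() calls check_is_fitted first so that an unfitted estimator raises NotFittedError with a message that mentions 'fit'.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.exceptions import NotFittedError
from sklearn.svm import SVC
from sklearn.utils.validation import check_is_fitted


class WrappedSVC(ClassifierMixin, BaseEstimator):
    """Hypothetical classifier that delegates to an internal SVC."""

    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y):
        # Fit a separate SVC and store it as a fitted attribute
        # (trailing underscore); return self rather than the SVC.
        self.svc_ = SVC(C=self.C, kernel="rbf").fit(X, y)
        self.classes_ = self.svc_.classes_
        return self

    def predict(self, X):
        # Raises NotFittedError if fit() has not set svc_ yet.
        check_is_fitted(self, "svc_")
        return self.svc_.predict(X)
```

The kernel here is the built-in 'rbf' only to keep the sketch short; a custom kernel callable could be passed to SVC in the same place.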
On 08/04/2017 06:29 AM, Sam Barnett wrote:
Hi Andy,
I have since been able to resolve the pickling issue, though I am now
getting an error message saying that an error message does not include
the expected string 'fit'. In general, I am trying to use the fit()
method of my classifier to instantiate a separate SVC() classifier
with a custom kernel, fit THAT to the data, then return this instance
as the fitted version of the new classifier. Is this possible in
theory? If so, what is the best way to implement it?
As before, the requisite code and a .ipynb file are attached.
Best,
Sam
On Thu, Aug 3, 2017 at 6:35 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Hi Sam.
You need to put these into a reachable namespace (possibly as
private functions) so that they can be pickled.
Please stay on the sklearn mailing list, I might not have time to
reply.
Andy
On 08/03/2017 01:24 PM, Sam Barnett wrote:
Hi Andy,
I've since tried a different solution: instead of a pipeline,
I've simply created a classifier that is for the most part like
svm.SVC, though it takes a few extra inputs for the
sequentialisation step. I've used a Python function that can
compute the Gram matrix between two datasets of any shape to pass
into SVC(), though I'm now having trouble with pickling on the
check_estimator test. It appears that SeqSVC.fit() doesn't like
to have methods defined within it. Can you see how to pass this
test? (The .ipynb file shows the error.)
Best,
Sam
On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett <sambarnet...@gmail.com> wrote:
You're right: it does fail without GridSearchCV when I change
the size of seq_test. I will look at the transform tomorrow
to see if I can work this out. Thank you for your help so far!
On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Change the size of seq_test in your notebook and you'll
see the failure without GridSearchCV.
I haven't looked at your code in detail, but transform is
supposed to work on arbitrary new data with the same
number of features.
Your code requires the test data to have the same shape
as the training data.
Cross-validation will lead to training data and test data
having different sizes. But I feel like something is
already wrong if your test data size depends on your
training data size.
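As a rough sketch of what that could look like, assume a hypothetical KernelTransformer that uses a plain RBF kernel (not the sequentialised kernel from the notebook). fit() remembers the training samples, and transform() returns kernel values between whatever rows it is given and those stored samples, so the output has shape (n_new_samples, n_train_samples) regardless of how many new rows arrive.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


class KernelTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer: kernel values against the training set."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X, y=None):
        # Keep the training samples; do not treat X.shape as a constraint.
        self.X_fit_ = np.asarray(X)
        return self

    def transform(self, X):
        # Works for any number of rows with the same number of features.
        return rbf_kernel(np.asarray(X), self.X_fit_, gamma=self.gamma)


# Synthetic data, made up for this sketch only.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([
    ("kernel", KernelTransformer()),
    ("svc", SVC(kernel="precomputed")),
])
search = GridSearchCV(pipe, {"kernel__gamma": [0.1, 1.0]}, cv=4)
search.fit(X, y)
```

Because the transformer sits inside the pipeline, each cross-validation fold refits it on that fold's training rows, and the kernel parameter can be searched like any other.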
On 08/02/2017 03:08 PM, Sam Barnett wrote:
Hi Andy,
The purpose of the transformer is to take an ordinary
kernel (in this case I have taken 'rbf' as a default)
and return a 'sequentialised' kernel using a few extra
parameters. Hence, the transformer takes an ordinary
data-target pair X, y as its input, and the
fit_transform(X, y) method will output the Gram matrix
for X that is associated with this sequentialised
kernel. In the pipeline, this Gram matrix is passed into
an SVC classifier with the kernel parameter set to
'precomputed'.
Therefore, I do not think your hacky solution would be
possible. However, I am still unsure how to implement
your first solution: won't the Gram matrix from the
transformer contain all the necessary kernel values?
Could you elaborate further?
Best,
Sam
On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Hi Sam.
GridSearchCV will do cross-validation, which
requires "transforming" the test data.
The shape of the test-data will be different from
the shape of the training data.
You need to have the ability to compute the kernel
between the training data and new test data.
A more hacky solution would be to compute the full
kernel matrix in advance and pass that to GridSearchCV.
You probably don't need it here, but you should also
check out what the _pairwise attribute does in
cross-validation, because that is likely to come up
when playing with kernels.
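A minimal sketch of the precomputed route, assuming synthetic data and a plain RBF kernel in place of the sequentialised one. The full Gram matrix is computed once and handed to GridSearchCV together with SVC(kernel="precomputed"). Since that estimator is marked as pairwise, cross-validation indexes both the rows and the columns of the matrix when it splits. How the flag is exposed has changed across scikit-learn releases, so treat the attribute name as version-dependent.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data, made up for this sketch only.
rng = np.random.RandomState(0)
X = rng.randn(40, 3)
y = (X[:, 0] > 0).astype(int)

# Full Gram matrix over all samples, computed once up front.
K = rbf_kernel(X, X, gamma=0.5)

# For a pairwise estimator, each split fits on K[train][:, train]
# and scores on K[test][:, train].
search = GridSearchCV(SVC(kernel="precomputed"), {"C": [0.1, 1.0, 10.0]}, cv=4)
search.fit(K, y)
```

The trade-off is that the kernel's own parameters are fixed when K is built, so only the SVC parameters can be searched this way.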
Hth,
Andy
On 08/02/2017 08:38 AM, Sam Barnett wrote:
Dear all,
I have created a 2-step pipeline with a custom
transformer followed by a simple SVC classifier,
and I wish to run a grid-search over it. I am able
to successfully create the transformer and the
pipeline, and each of these elements works fine.
However, when I try to use the fit() method on my
GridSearchCV object, I get the following error:
     57         # during fit.
     58         if X.shape != self.input_shape_:
---> 59             raise ValueError('Shape of input is different from what was seen '
     60                              'in `fit`')
     61

ValueError: Shape of input is different from what was seen in `fit`
For a full breakdown of the problem, I have written
a Jupyter notebook showing exactly how the error
occurs (this also contains all .py files necessary
to run the notebook). Can anybody see how to work
through this?
Many thanks,
Sam Barnett
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn