Hi Sam.
You need to define these functions in a reachable namespace, for example
at module level (possibly as private functions), so that they can be pickled.
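
To illustrate the pickling point, here is a minimal sketch using only the
standard library. The helper names (_squared_distance, make_nested_kernel,
can_pickle) are made up for this example and are not taken from Sam's code.
It shows that a function defined at module level can be pickled, while a
function defined inside another function (as would happen inside fit())
typically cannot.

```python
import pickle


def _squared_distance(a, b):
    """Module-level (private) helper: pickle can find it by its qualified name."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def make_nested_kernel():
    """Return a function defined inside another function (a 'local' object).

    This mimics defining a kernel helper inside a fit() method.
    """
    def kernel(a, b):
        return _squared_distance(a, b)
    return kernel


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, TypeError, pickle.PicklingError):
        return False


# The module-level function is pickled by reference to its name.
module_level_ok = can_pickle(_squared_distance)

# The nested function has no importable name, so pickling it fails.
nested_ok = can_pickle(make_nested_kernel())
```

Moving the helper out of fit() and passing any extra parameters explicitly
(for example with functools.partial) is one way to keep the estimator
picklable.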
Please stay on the sklearn mailing list, I might not have time to reply.
Andy
On 08/03/2017 01:24 PM, Sam Barnett wrote:
Hi Andy,
I've since tried a different solution: instead of a pipeline, I've
simply created a classifier that is for the most part like svm.SVC,
though it takes a few extra inputs for the sequentialisation step.
I've used a Python function that can compute the Gram matrix between
two datasets of any shape to pass into SVC(), though I'm now having
trouble with pickling on the check_estimator test. It appears that
SeqSVC.fit() cannot have functions defined inside it. Can you
see how to pass this test? (The .ipynb file shows the error.)
Best,
Sam
On Wed, Aug 2, 2017 at 9:44 PM, Sam Barnett <sambarnet...@gmail.com> wrote:
You're right: it does fail without GridSearchCV when I change the
size of seq_test. I will look at the transform tomorrow to see if
I can work this out. Thank you for your help so far!
On Wed, Aug 2, 2017 at 9:20 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Change the size of seq_test in your notebook and you'll see
the failure without GridSearchCV.
I haven't looked at your code in detail, but transform is
supposed to work on arbitrary new data with the same number of
features.
Your code requires the test data to have the same shape as the
training data.
Cross-validation will lead to training data and test data
having different sizes. But I feel like something is already
wrong if your test data size depends on your training data size.
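
A minimal sketch of what this means for a kernel transformer, assuming a
plain RBF kernel and no dependencies beyond the standard library. The class
name GramTransformer and the helper rbf are hypothetical and only stand in
for the sequentialised kernel. fit() stores the training samples, and
transform() computes the kernel between any new samples and the stored ones,
so only the number of features has to match.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel value between two samples given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


class GramTransformer:
    """Hypothetical transformer: maps samples to kernel values against X_fit_."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma

    def fit(self, X, y=None):
        # Remember the training samples; the kernel is always taken against them.
        self.X_fit_ = [list(row) for row in X]
        self.n_features_ = len(self.X_fit_[0])
        return self

    def transform(self, X):
        # Only the number of features must match, not the number of samples.
        for row in X:
            if len(row) != self.n_features_:
                raise ValueError("Number of features differs from what was seen in fit")
        return [[rbf(row, ref, self.gamma) for ref in self.X_fit_] for row in X]


X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
X_test = [[0.5, 0.5], [2.0, 2.0]]

tf = GramTransformer(gamma=0.5).fit(X_train)
K_train = tf.transform(X_train)  # 3 rows, 3 columns
K_test = tf.transform(X_test)    # 2 rows, 3 columns
```

The test matrix has one row per test sample and one column per training
sample, which is the shape a classifier with a precomputed kernel expects at
prediction time.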
On 08/02/2017 03:08 PM, Sam Barnett wrote:
Hi Andy,
The purpose of the transformer is to take an ordinary kernel
(in this case I have taken 'rbf' as a default) and return a
'sequentialised' kernel using a few extra parameters. Hence,
the transformer takes an ordinary data-target pair X, y as
its input, and the fit_transform(X, y) method will output the
Gram matrix for X that is associated with this sequentialised
kernel. In the pipeline, this Gram matrix is passed into an
SVC classifier with the kernel parameter set to 'precomputed'.
Therefore, I do not think your hacky solution would be
possible. However, I am still unsure how to implement your
first solution: won't the Gram matrix from the transformer
contain all the necessary kernel values? Could you elaborate
further?
Best,
Sam
On Wed, Aug 2, 2017 at 5:05 PM, Andreas Mueller <t3k...@gmail.com> wrote:
Hi Sam.
GridSearchCV will do cross-validation, which requires the test
data to be "transformed".
The shape of the test-data will be different from the
shape of the training data.
You need to have the ability to compute the kernel
between the training data and new test data.
A more hacky solution would be to compute the full kernel
matrix in advance and pass that to GridSearchCV.
You probably don't need it here, but you should also
check out what the _pairwise attribute does in
cross-validation, because that is likely to come up
when playing with kernels.
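
A rough sketch of the "compute the full kernel matrix in advance" idea,
using only the standard library. The helpers full_gram and slice_gram and
the fold indices are invented for illustration. The assumption here is that
for a precomputed kernel each cross-validation fold needs the train-by-train
block for fitting and the test-by-train block for prediction, which, as far
as I understand, is the kind of slicing the _pairwise attribute triggers.

```python
import math


def rbf(a, b, gamma=1.0):
    """RBF kernel value between two samples given as sequences of floats."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def full_gram(X, gamma=1.0):
    """Square Gram matrix over all samples, computed once up front."""
    return [[rbf(a, b, gamma) for b in X] for a in X]


def slice_gram(K, rows, cols):
    """Pick the sub-matrix of K with the given row and column indices."""
    return [[K[i][j] for j in cols] for i in rows]


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
K = full_gram(X, gamma=0.5)

# One hypothetical cross-validation fold.
train_idx = [0, 1, 2]
test_idx = [3, 4]

K_train = slice_gram(K, train_idx, train_idx)  # used for fitting: 3 x 3
K_test = slice_gram(K, test_idx, train_idx)    # used for predicting: 2 x 3
```

Note that the test block is indexed by test rows and training columns, so
slicing only the rows of K would give the wrong shape.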
Hth,
Andy
On 08/02/2017 08:38 AM, Sam Barnett wrote:
Dear all,
I have created a 2-step pipeline with a custom
transformer followed by a simple SVC classifier, and I
wish to run a grid-search over it. I am able to
successfully create the transformer and the pipeline,
and each of these elements works fine. However, when I
try to use the fit() method on my GridSearchCV object, I
get the following error:
     57         # during fit.
     58         if X.shape != self.input_shape_:
---> 59             raise ValueError('Shape of input is different from what was seen '
     60                              'in `fit`')
     61

ValueError: Shape of input is different from what was seen in `fit`
For a full breakdown of the problem, I have written a
Jupyter notebook showing exactly how the error occurs
(this also contains all .py files necessary to run the
notebook). Can anybody see how to work through this?
Many thanks,
Sam Barnett
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org <mailto:scikit-learn@python.org>
https://mail.python.org/mailman/listinfo/scikit-learn
<https://mail.python.org/mailman/listinfo/scikit-learn>