Hi,
I have a precomputed kernel of size NxN. I am using GridSearchCV to tune the C
parameter of an SVM with kernel='precomputed', as follows:
    C_range = 10. ** np.arange(-2, 9)
    param_grid = dict(C=C_range)
    grid = GridSearchCV(SVC(kernel='precomputed'), param_grid=param_grid,
                        cv=StratifiedKFold(y=data_label, n_folds=10))
    grid.fit(kernel, data_label)
    print grid.best_score_
This works fine. However, since I predict on the same full data that I used
for fitting (with grid.predict(kernel)), it overfits: I get precision/recall =
1.0 most of the time.
So I would like to first split my data into 10 chunks (9 for training, 1 for
testing) with cross-validation and, in each fold, run GridSearch to tune the
C value on the training set and then test on the testing set.
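For concreteness, the scheme I have in mind looks roughly like the sketch
below. It is written against the newer sklearn.model_selection API (an
assumption; my snippet above uses the older StratifiedKFold(y=..., n_folds=...)
signature). The toy data, the fold counts, the shortened C range, and the names
K, outer and inner are placeholders. The part I am least sure of is how the test
block has to be sliced: the sketch takes rows from the test indices and columns
from the training indices, which is an assumption on my part.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Toy data standing in for the real features and labels (placeholders).
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
K = np.dot(X, X.T)  # full NxN linear kernel

param_grid = dict(C=10.0 ** np.arange(-2, 3))  # shortened range to keep the toy run fast
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
inner = StratifiedKFold(n_splits=3)

scores = []
for train, test in outer.split(K, y):
    # Training block: training samples against training samples.
    K_train = K[np.ix_(train, train)]
    # Test block (assumed layout): test samples against training samples.
    K_test = K[np.ix_(test, train)]

    # Inner grid search sees the training block only.
    grid = GridSearchCV(SVC(kernel='precomputed'), param_grid=param_grid, cv=inner)
    grid.fit(K_train, y[train])

    # Accuracy of the refit best model on the held-out fold.
    scores.append(grid.score(K_test, y[test]))

print(np.mean(scores))
```

The mean of scores would then be the estimate I report, instead of
grid.best_score_ from a single search over all the data.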
To do this, I sliced the kernel matrix into a 100x100 and a 50x50 submatrix;
I run grid.fit() on the first and grid.predict() on the second.
But I get the following error:
    ValueError: X.shape[1] = 50 should be equal to 100, the number of
    features at training time
I guess the testing kernel should have the same number of columns as the
training kernel, but I don't understand why: I simply compute np.dot(X, X.T)
separately for the 100 training samples and for the 50 testing samples, so the
two resulting kernels have different dimensions.
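A minimal sketch that reproduces the problem, assuming random toy data in place
of my real features (X, y, the fixed C and the 100/50 split are placeholders).
For comparison it also builds the rectangular block
np.dot(X_test, X_train.T), which has 100 columns and which predict() does
seem to accept:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data standing in for the real features and labels (placeholders).
rng = np.random.RandomState(0)
X = rng.randn(150, 5)
y = (X[:, 0] > 0).astype(int)
X_train, X_test = X[:100], X[100:]
y_train = y[:100]

K_train = np.dot(X_train, X_train.T)   # 100x100, train vs. train
K_square = np.dot(X_test, X_test.T)    # 50x50, test vs. test (what I did)
K_cross = np.dot(X_test, X_train.T)    # 50x100, test vs. train

clf = SVC(kernel='precomputed', C=1.0)
clf.fit(K_train, y_train)

# The 50x50 block is rejected because its column count differs from training.
raised = False
try:
    clf.predict(K_square)
except ValueError as exc:
    raised = True
    print(exc)

# The 50x100 block has one column per training sample and goes through.
pred = clf.predict(K_cross)
print(pred.shape)
```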
Thanks,
Ev