Dear John,
Hello,
I am having difficulty with a cross validation problem, and any help
would be much appreciated.
I have a large number of research subjects from 15 different data
collection sites. I want to assess whether "site" has any influence on
the data.
The simplest way to do this is to start with classical statistics: why
not simply test the impact of the site on the data using a classical
analysis of variance?
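To make the suggestion concrete, here is a minimal sketch of a per-measurement one-way ANOVA using scipy.stats.f_oneway. The data are synthetic stand-ins (random values with the shapes described in the question), and the variable names are only illustrative, so treat this as a starting point rather than a finished analysis:

```python
import numpy as np
from scipy.stats import f_oneway

# Synthetic stand-in for the data in the thread (an assumption):
# 1000 subjects, 3 measurements each, site label between 1 and 15.
rng = np.random.RandomState(0)
n_subjects, n_features, n_sites = 1000, 3, 15
X = rng.normal(size=(n_subjects, n_features))
y = rng.randint(1, n_sites + 1, size=n_subjects)

# One one-way ANOVA per measurement: group the values by site and
# test whether the site means differ.
sites = np.unique(y)
anova_results = []
for j in range(n_features):
    groups = [X[y == site, j] for site in sites]
    f_stat, p_value = f_oneway(*groups)
    anova_results.append((f_stat, p_value))
    print("measurement %d: F = %.3f, p = %.3f" % (j, f_stat, p_value))
```

Because this runs three separate tests, the p-values may need a multiple-comparison correction before drawing conclusions.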
It occurred to me that one way to do this would be to perform a
cross-validation, via stratified k folds (stratified, because some
sites have a larger number of subjects than others). Unless I am
mistaken, the results of this analysis should reveal whether "site"
has an influence on the data. However, I am running into a problem
because my training set is a different shape than the test data, which
causes the analysis to fail.
My data structure is pretty simple.
X is a 3 by 1000 matrix of datapoints (that is, 3 datapoints per subject)
y is a 1 by 1000 matrix indicating the site (expressed as an integer
ranging between 1 and 15).
Here is the code that I use, and below it is the error that is produced.
from sklearn import cross_validation
from sklearn import svm
skf = cross_validation.StratifiedKFold(y, 15)
for train_index, test_index in skf:
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    clf = svm.SVC(kernel='rbf', C=1.0)
    clf.fit(X_train, X_test)
clf.fit should take (X_train, y_train), which means that you are learning
a model to predict y from X. Is this really what you want?
clf.score(X_test, y_test) would then quantify the performance of the
learned model on the test data.
HTH,
Bertrand
Traceback (most recent call last):
  File "cross_val.py", line 132, in <module>
    clf.fit(X_train, X_test)
  File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/scikit_learn-0.13.1-py2.7-macosx-10.5-x86_64.egg/sklearn/svm/base.py", line 166, in fit
    (X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 966 samples, but y has 210.
Thanks for any help you can offer!
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general