On 11/01/2012 03:43 PM, paul.czodrow...@merckgroup.com wrote:
Dear RDKitters,
> > However, I found it strange that "X_train.shape" gives (373, 177) -
> > shouldn't the second entry be the number of classes, i.e. 2?
>
> [snip]
>
> > 177 corresponds, BTW, to the number of features.
>
> And that's exactly what this is supposed to represent. The number of
> classes is len(np.unique(y)).
>
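To make the distinction concrete, here is a minimal sketch with small made-up arrays (hypothetical stand-ins, not the poster's data). The columns of X count features, while the number of classes comes from the distinct labels in y.

```python
import numpy as np

# Hypothetical toy data: 4 samples with 3 features each, binary labels.
X_train = np.array([[1.0, 2.0, 0.0],
                    [3.0, 4.0, 0.0],
                    [5.0, 6.0, 1.0],
                    [7.0, 8.0, 0.0]])
y_train = np.array([0, 0, 1, 0])

n_samples, n_features = X_train.shape   # rows are samples, columns are features
n_classes = len(np.unique(y_train))     # number of distinct labels in y

print(n_samples, n_features, n_classes)  # 4 3 2
```

By the same logic, the poster's (373, 177) means 373 training samples with 177 features each.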
Got it, thanks!
X_train looks like
"[[ 313.371 294.219 0. ..., 0. 0. 0. ]
[ 234.343 212.167 0. ..., 0. 0. 0. ] ...
"
y_train looks like
"
[0 0 ..
"
X_test and y_test look similar, and therefore OK to my naive eyes.
However, SVC gives horrible results:
"
from sklearn import metrics
from sklearn.svm import SVC

svm = SVC()  # RBF kernel with default C and gamma
svm.fit(X_train, y_train)
y_predict = svm.predict(X_test)
print(metrics.confusion_matrix(y_test, y_predict))
"
=>
[[182 0]
[ 67 0]]
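scikit-learn's confusion matrix puts true classes in rows and predicted classes in columns. Read that way, the matrix shows every test sample was assigned to class 0, so the accuracy is exactly what always predicting the majority class would give. Here is a quick check of that arithmetic in plain Python, with the counts copied from the output above:

```python
# Confusion matrix reported above: rows = true class, columns = predicted class.
cm = [[182, 0],
      [67, 0]]

total = sum(sum(row) for row in cm)                # 249 test samples
accuracy = (cm[0][0] + cm[1][1]) / total           # correct predictions lie on the diagonal
predicted_counts = [sum(row[j] for row in cm) for j in range(2)]  # column sums
majority_baseline = max(sum(row) for row in cm) / total  # always predict the larger class

print(predicted_counts)               # [249, 0]
print(round(accuracy, 3))             # 0.731
print(accuracy == majority_baseline)  # True
```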
1) First, start with a linear classifier.
2) For SVC, you have to tune gamma and C by cross-validation.
3) Your classes are imbalanced. Consider reweighting the classes or
subsampling the majority class.
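Here is a minimal sketch of points 1 to 3, written against a recent scikit-learn release. Older versions used different module paths; for example, GridSearchCV used to live in sklearn.grid_search. The synthetic data from make_classification and the grid values for C and gamma are illustrative assumptions, not tuned recommendations. Reweighting is done with class_weight="balanced". Subsampling is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC, LinearSVC

# Synthetic, imbalanced stand-in for the real data (about 73% class 0).
X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           weights=[0.73], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# 1) + 3) Linear baseline; class_weight="balanced" weights each class by
#    n_samples / (n_classes * count_of_that_class).
linear = LinearSVC(class_weight="balanced", max_iter=10000)
linear.fit(X_train, y_train)
print(confusion_matrix(y_test, linear.predict(X_test)))

# 2) + 3) RBF SVC: pick C and gamma by 5-fold cross-validation on the
#    training set, scoring with balanced accuracy so the minority class counts.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"),
                      param_grid, cv=5, scoring="balanced_accuracy")
search.fit(X_train, y_train)
print(search.best_params_)
print(confusion_matrix(y_test, search.predict(X_test)))
```

The grid search refits the best (C, gamma) pair on the full training set, so search.predict can be used directly on X_test.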
kNN stops with an error message:
"
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train,y_train)
y_predict = knn.predict(X_test)
"
=>
"
NeighborsWarning: kneighbors: neighbor k+1 and neighbor k have the
same distance: results will be dependent on data order.
neigh_dist, neigh_ind = self.kneighbors(X)
"
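As you can see from the message, this is a warning, not an error.
It means that two training points are at exactly the same distance
from a query point, so which of them counts as the k-th neighbor
depends on the order of the data. This probably happens because
either your data is discretized or you have duplicate input points.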
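The tie situation can be reproduced without scikit-learn. The sketch below uses hypothetical 0/1 feature vectors. One row is an exact duplicate, and all four points sit at the same Euclidean distance from the query. The Euclidean distance is what KNeighborsClassifier uses by default. Because of these ties, the order of the data decides which points are picked as nearest neighbors. Counting duplicate rows like this (duplicate_rows is a hypothetical helper, not a library function) is one cheap way to check your own X_train.

```python
import math
from collections import Counter


def duplicate_rows(X):
    """Return how many rows of X exactly repeat an earlier row."""
    counts = Counter(tuple(row) for row in X)
    return sum(c - 1 for c in counts.values())


# Hypothetical discretized (0/1) training points; the last row duplicates the first.
X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
query = [0, 0, 0]

print(duplicate_rows(X_train))  # 1
distances = sorted(math.dist(query, x) for x in X_train)
print(distances)                # [1.0, 1.0, 1.0, 1.0]
```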
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general