Hi Paul. Sorry, I don't completely follow the code. So you say dataActs_array has entries zero and one, and there are 1505 ones and 774 zeros? An easier way to check would be np.unique(dataActs_array) and np.bincount(dataActs_array). Could it be that the entries of dataActs_array are strings? That is what the keys in your output look like. Can you give us the dtype of y_test and y_predict and their contents (np.unique)?
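Something along these lines should show what is going on. This is only a sketch: `labels` is a made-up stand-in for dataActs_array, built from the counts in your output and assuming the entries really are the strings '0' and '1'. Note that np.bincount only accepts non-negative integers, so string labels have to be converted first.

```python
import numpy as np

# Hypothetical stand-in for dataActs_array, assuming string labels
# with the counts reported in the original message.
labels = np.array(['1'] * 1505 + ['0'] * 774)

# A string dtype here (e.g. <U1 on Python 3) confirms the suspicion.
print(labels.dtype)

# np.unique works on strings and can also report how often each value occurs.
values, counts = np.unique(labels, return_counts=True)
print(values, counts)

# np.bincount needs non-negative integers, so convert the labels first.
labels_int = labels.astype(int)
print(np.bincount(labels_int))
```

If the dtype does turn out to be a string type, passing the converted integer array to train_test_split instead of the original one would be a simple thing to try.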
I think there was an issue with the confusion matrix not handling string labels, but I am not sure whether it has been fixed.

Best,
Andy

On 09.11.2012 16:48, [email protected] wrote:
> Dear SciKitters,
>
> given a dataset (2200 samples, 90 features), I want to train a RF but run
> into an interesting issue.
>
> My array containing the labels (dataActs_array) only shows 2 classes:
> "
> from collections import defaultdict
> d = defaultdict(int)
> for elt in dataActs_array:
>     d[elt] += 1
> print d
> defaultdict(<type 'int'>, {'1': 1505, '0': 774})
> "
>
> However, after splitting into test and train:
> "
> from sklearn.cross_validation import train_test_split
> X_train,X_test,y_train,y_test = train_test_split(dataDescrs_array,dataActs_array,test_size=.4)
> "
>
> the confusion matrix outputs 4 classes:
> "
> from sklearn.ensemble import RandomForestClassifier
> from sklearn import metrics
> clf_RF = RandomForestClassifier()
> clf_RF = clf_RF.fit(X_train,y_train)
> y_predict = clf_RF.predict(X_test)
> print metrics.confusion_matrix(y_test,y_predict)
> [[0 0 0 0]
> [0 0 0 0]
> [0 0 0 0]
> [0 0 0 0]]
> "
>
> Where have all my classes gone?
> Why do I end up in a 4*4 array?
>
>
> Cheers & Thanks,
> Paul
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
