2012/9/25 Doug Coleman <[email protected]>:
> label. So to merge predictions from the trees, now I have to do
> bookkeeping to remember which trees had which labels in them, and it's
> a mess.

You did discover the classes_ attribute, didn't you? That keeps track of
the classes found in y by fit and solves part of the bookkeeping
problem.

> Someone suggested I use sklearn.feature_extraction.DictVectorizer, but
> that seems to be to track the X matrix instead of y. What I might end
> up doing is unique/sorting the y labels for each tree, calling
> predict_proba on each, adding column vectors of zeros to the
> predictions, and then merging the results.

No, that's not what DictVectorizer is for. I guess it *could* be used
for tracking labels and probabilities, if you fit it on the trivial
"dataset" [dict((str(label),0) for label in [-2, -1, 0, 1, 2])], but
even then you would have to convert between integers and strings all
the time.

> What I would prefer to do is call fit with a set of possible labels,
> like so: clf.fit(X, y, labels=[-2,1,0,1,2]) so scikit could do the
> bookkeeping for me. Obviously some of the trees in my ensemble would
> be useless at predicting the -2 or 2 labels, but that's expected.

That would be nice. I think we actually put that argument on __init__
where appropriate (SGDClassifier) and call it classes, not labels.
Would you perhaps be willing to implement this for decision trees and
submit a pull request?

> Maybe people don't usually use the library in this way so it doesn't come up?

It only comes up in advanced use cases such as online learning, so some
estimators have this, but others don't.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
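
P.S. In case it helps, here is a minimal sketch of the zero-padding
approach described above, using classes_ to decide which columns each
tree's predict_proba output belongs in. The helper name
predict_proba_aligned, the toy data, and the plain averaging at the end
are assumptions made for illustration; none of them are part of
scikit-learn.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def predict_proba_aligned(clf, X, all_classes):
    """Hypothetical helper: predict_proba with one column per label in all_classes.

    Columns for labels the estimator never saw during fit are left at zero.
    """
    all_classes = np.asarray(all_classes).tolist()
    # Map each label to its column position in the full label set.
    column_of = {label: i for i, label in enumerate(all_classes)}
    proba = clf.predict_proba(X)
    aligned = np.zeros((proba.shape[0], len(all_classes)))
    # clf.classes_ lists the labels in the column order of predict_proba.
    cols = [column_of[label] for label in clf.classes_.tolist()]
    aligned[:, cols] = proba
    return aligned


# Toy data (assumed): two trees, each trained on a different subset of labels.
all_classes = [-2, -1, 0, 1, 2]
rng = np.random.RandomState(0)
X = rng.rand(60, 3)
y_a = np.tile([-1, 0, 1], 20)
y_b = np.tile([-2, 0, 2], 20)

tree_a = DecisionTreeClassifier(random_state=0).fit(X, y_a)
tree_b = DecisionTreeClassifier(random_state=0).fit(X, y_b)

# Merge by averaging the aligned probability matrices.
aligned_a = predict_proba_aligned(tree_a, X, all_classes)
aligned_b = predict_proba_aligned(tree_b, X, all_classes)
merged = np.mean([aligned_a, aligned_b], axis=0)
```

Averaging is only one way to merge; weighting the trees differently
would work the same way once the columns line up.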
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
