2012/9/25 Doug Coleman <[email protected]>:
> label. So to merge predictions from the trees, now I have to do
> bookkeeping to remember which trees had which labels in them, and it's
> a mess.

You did discover the classes_ attribute, didn't you? That keeps track of
the classes found in y by fit and solves part of the bookkeeping
problem.
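
A rough sketch of how classes_ could drive the merge, assuming a known,
sorted global label set. The helper name aligned_proba and the toy data are
made up for illustration; only predict_proba and classes_ come from
scikit-learn:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Assumed global label set; must be sorted for searchsorted to work.
ALL_LABELS = np.array([-2, -1, 0, 1, 2])


def aligned_proba(clf, X, all_labels=ALL_LABELS):
    """Expand clf.predict_proba(X) to one column per entry of all_labels.

    Hypothetical helper: assumes every label in clf.classes_ also
    appears in all_labels.
    """
    proba = clf.predict_proba(X)
    out = np.zeros((proba.shape[0], len(all_labels)))
    # classes_ is sorted, so searchsorted maps each class the tree saw
    # to its column in the global layout; unseen labels stay at zero.
    cols = np.searchsorted(all_labels, clf.classes_)
    out[:, cols] = proba
    return out


# Toy data: each tree sees only a subset of the labels.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
tree_a = DecisionTreeClassifier(random_state=0).fit(X, [-1, -1, 0, 0])
tree_b = DecisionTreeClassifier(random_state=0).fit(X, [0, 1, 1, 2])

# Average the aligned predictions of both trees.
merged = (aligned_proba(tree_a, X) + aligned_proba(tree_b, X)) / 2.0
```

Each row of merged has five columns, one per global label, regardless of
which labels an individual tree was trained on.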

> Someone suggested I use sklearn.feature_extraction.DictVectorizer, but
> that seems to be to track the X matrix instead of y. What I might end
> up doing is unique/sorting the y labels for each tree, calling
> predict_proba on each, adding column vectors of zeros to the
> predictions, and then merging the results.

No, that's not what DictVectorizer is for. I guess it *could* be used
for tracking labels and probabilities, if you fit it on the trivial
"dataset"

[dict((str(label),0) for label in [-2, -1, 0, 1, 2])]

but then still, you have to convert from integers to strings all the time.
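
For what it's worth, a minimal sketch of that trick. The probability values
below are invented for illustration. Note that the columns follow
DictVectorizer's own ordering of the string names, which need not match
numeric order, so look columns up through vocabulary_ rather than assuming
a position:

```python
from sklearn.feature_extraction import DictVectorizer

labels = [-2, -1, 0, 1, 2]

# Fit on the trivial one-sample "dataset" so every label gets a column.
vec = DictVectorizer()
vec.fit([dict((str(label), 0) for label in labels)])

# Invented probabilities from a tree that only saw the labels -1 and 0.
# Keys must be strings, hence the int -> str conversion each time.
tree_proba = {-1: 0.25, 0: 0.75}
row = vec.transform(
    [dict((str(label), p) for label, p in tree_proba.items())]
).toarray()

# Map a label back to its column via the fitted vocabulary.
col_of_zero = vec.vocabulary_[str(0)]
```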

> What I would prefer to do is call fit with a set of possible labels,
> like so: clf.fit(X, y, labels=[-2,-1,0,1,2]) so scikit could do the
> bookkeeping for me. Obviously some of the trees in my ensemble would
> be useless at predicting the -2 or 2 labels, but that's expected.

That would be nice. I think we actually put that argument on __init__
where appropriate (SGDClassifier) and call it classes, not labels.
Would you perhaps be willing to implement this for decision trees and
submit a pull request?

> Maybe people don't usually use the library in this way so it doesn't come up?

It only comes up in advanced use cases such as online learning, so
some estimators have this, but others don't.
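
As an illustration of the online-learning case, here is a small sketch with
SGDClassifier. It assumes a scikit-learn version where the full label set is
passed as the classes argument of partial_fit; the toy batches are made up:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Full label set, declared up front even though no batch contains all of it.
all_classes = np.array([-2, -1, 0, 1, 2])

clf = SGDClassifier(random_state=0)

# First batch only contains the labels -1 and 0.
X1 = np.array([[0.0], [1.0], [2.0], [3.0]])
y1 = np.array([-1, -1, 0, 0])
clf.partial_fit(X1, y1, classes=all_classes)

# Later batches may bring in other labels; classes can be omitted now.
X2 = np.array([[4.0], [5.0], [6.0], [7.0]])
y2 = np.array([1, 1, 2, 2])
clf.partial_fit(X2, y2)

# One score column per declared class, in the order given by classes_.
scores = clf.decision_function(X1)
```

The estimator does the bookkeeping itself here: classes_ holds all five
labels after the first call, and the score matrix always has five columns.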

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
