For my research, I'm working with multi-output decision trees. In the current
sklearn implementation, a tree can predict either several numerical or several
categorical targets simultaneously, but not a mixture of those. However,
predicting various targets jointly is often beneficial both in terms of speed
and accuracy. Because of that, I would like to add this functionality.
It seems that the only thing to be done is to implement a new node-splitting
criterion that handles a mixture of nominal and numerical targets, and then
define a new class of models (such as DecisionTreeRegressor or
DecisionTreeClassifier, but for mixed output). However, since I'm not an
experienced sklearn contributor, I am looking for any hints on how to implement
this in an effective way, re-using as much of the already available
functionality as possible. Your advice is very welcome.
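
To make the idea of a mixed splitting criterion concrete, here is a rough
NumPy sketch, not sklearn code. It assumes one particular design: variance
for each numerical target (divided by that target's variance at the root so
the scales are comparable), Gini impurity for each nominal target, and a plain
average over all targets. The names gini, mixed_impurity, split_gain and
num_scale are hypothetical and do not exist in sklearn, whose own criteria are
written in Cython.

```python
import numpy as np


def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def mixed_impurity(y_num, y_cat, num_scale):
    """Average impurity over all targets for the samples in one node.

    y_num     : (n_samples, n_num) float array of numerical targets
    y_cat     : (n_samples, n_cat) array of nominal targets
    num_scale : (n_num,) root-node variance of each numerical target,
                assumed non-zero (hypothetical normalisation choice)
    """
    # Normalised variance per numerical target.
    num_part = np.var(y_num, axis=0) / num_scale
    # Gini impurity per nominal target.
    cat_part = np.array([gini(y_cat[:, j]) for j in range(y_cat.shape[1])])
    return np.concatenate([num_part, cat_part]).mean()


def split_gain(y_num, y_cat, left_mask, num_scale):
    """Impurity decrease for a candidate split given as a boolean mask."""
    n = len(left_mask)
    n_left = int(left_mask.sum())
    n_right = n - n_left
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split, nothing gained
    parent = mixed_impurity(y_num, y_cat, num_scale)
    left = mixed_impurity(y_num[left_mask], y_cat[left_mask], num_scale)
    right = mixed_impurity(y_num[~left_mask], y_cat[~left_mask], num_scale)
    # Children are weighted by their share of the parent's samples.
    return parent - (n_left * left + n_right * right) / n
```

How the numerical and nominal parts should be weighted against each other is
an open design choice; the equal-weight average above is only one option.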
scikit-learn mailing list