Hmm, I see. So those values aren't available from the DecisionTreeClassifier class, is that right?
Let me make clearer what I'm trying to do. Maybe you guys have had this problem in the past and can suggest better solutions.

I need to embed a classifier in external code, which is a proof of concept. There are a few constraints on how much of that code I have the freedom to change, so the most productive approach seems to be the following:

1) Train, optimize hyperparameters and cross-validate the model with scikit-learn until I have a decent initial model.
2) Implement at the target (probably Java, but it could be Python) only the part of the code that does the prediction, with hard-coded parameters copied from the scikit-learn model.

So, for instance, I can train a RandomForestClassifier in scikit-learn and then implement a simple decision function in the Java code with all the trees hard-coded: basically just a list of thresholds, features, left and right children and the final class decision for each leaf node, plus a method that runs the decisions and reports the same result that predict_proba would.

I can already retrieve most of the needed parameters from the DecisionTreeClassifier (namely the thresholds, the left and right children, and the feature index for each node). Is the example count for each class at each node externally available? If not, I can just do a "manual" count, but having it would help.

The main problem is that I can't just serialize the final trained model and load it every time. That would involve more changes to the final code than I'm allowed to make (reading the serialized model every time would be a huge overhead, and to avoid it I'd have to change code that is beyond the scope of what we're willing to change in the short term).

Another problem is that the platform runs on a JVM language, so I'll probably implement the hard-coded predictor in that language. I could get away with Python if the dev team decides to use Apache Thrift for communication, but that is currently not a 100% sure thing.
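To make the plan above concrete, here is a minimal sketch of such a hard-coded predictor, written in Python for brevity (a JVM version would follow the same shape). The arrays are made-up toy values, not taken from any real model; I'm assuming they are laid out like the children_left, children_right, feature, threshold and value arrays of a fitted tree's tree_ attribute, that a leaf is marked by a left child of -1, and that a sample goes left when its feature value is <= the threshold. The function names are my own.

```python
# Toy arrays in the layout of clf.tree_ (the numbers are invented for illustration).
# With a fitted DecisionTreeClassifier `clf` they would be copied from:
#   clf.tree_.children_left, clf.tree_.children_right,
#   clf.tree_.feature, clf.tree_.threshold, clf.tree_.value[:, 0, :]
CHILDREN_LEFT = [1, -1, 3, -1, -1]
CHILDREN_RIGHT = [2, -1, 4, -1, -1]
FEATURE = [0, -2, 1, -2, -2]               # -2 marks "no split" at a leaf
THRESHOLD = [2.5, -2.0, 1.0, -2.0, -2.0]
VALUE = [                                   # per-node class counts, 3 classes
    [10.0, 10.0, 10.0],
    [10.0, 0.0, 0.0],
    [0.0, 10.0, 10.0],
    [0.0, 9.0, 1.0],
    [0.0, 1.0, 9.0],
]

TREE_LEAF = -1  # assumed leaf marker in the children arrays


def find_leaf(x):
    """Walk from the root (node 0) to the leaf that sample x falls into."""
    node = 0
    while CHILDREN_LEFT[node] != TREE_LEAF:
        if x[FEATURE[node]] <= THRESHOLD[node]:
            node = CHILDREN_LEFT[node]
        else:
            node = CHILDREN_RIGHT[node]
    return node


def predict_proba(x):
    """Normalize the leaf's class counts so they sum to 1."""
    counts = VALUE[find_leaf(x)]
    total = sum(counts)
    return [c / total for c in counts]


def predict(x):
    """Return the index of the class with the largest count (argmax)."""
    proba = predict_proba(x)
    return max(range(len(proba)), key=proba.__getitem__)
```

For a forest, the same functions would be run once per tree and the per-tree probabilities averaged. One caveat I'm not sure applies to every setup: scikit-learn converts inputs to 32-bit floats before comparing them to thresholds, so a port that compares in double precision could disagree on samples that sit very close to a threshold.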
If you guys have had this kind of problem in the past and found better solutions, I'd be thankful to hear about them.

Thanks.

On Wed, Aug 12, 2015 at 04:58, Jacob Schreiber <jmschreibe...@gmail.com> wrote:

> Hi Rafael
>
> When the tree needs to make a prediction, it usually goes through the
> predict method, then the apply method, then the _apply_dense method (this
> helps partition between dense and sparse data).
>
> Take a look at lines 3463 to 3503, the _apply_dense method. This ends up
> returning an array of offsets to the predict method, where each offset is
> the leaf node a point falls under. The predict method then indexes the
> value array (where node prediction values are stored) by this offset array,
> assigning a prediction value to each point.
>
> A small source of confusion is that for regression trees, the value array
> is one value per output per node, which makes sense. However, for
> classification trees, the value array stores the number of training points
> for each class for each output for each node. For example, a regression
> tree may have 2.5 as the prediction value in a leaf, but a classification
> tree may have [3, 40, 5] as the value in a leaf if there are three classes.
> The final prediction uses argmax to select class 1.
>
> Let me know if you have any other questions!
>
> Jacob
>
> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>
>> Hi there all,
>>
>> I'm taking a look at the code for decision trees and trying to understand
>> how it actually decides the class, and I'm having some trouble with the
>> final step.
>>
>> The heart of the algorithm seems to be on lines 3249 to 3260 of
>> the sklearn/tree/_tree.pyx file.
>>
>> Lines 3249 to 3258 are fine, they are just the standard walking through
>> the branches of the decision trees. What I failed to understand is how the
>> tree actually decides which class to assign to the sample being classified
>> after it reaches a leaf node. Aren't the final classes assigned to each
>> final branch stored anywhere?
>>
>> Thanks,
>> Rafael Calsaverini
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general