For C you should definitely check out this: https://github.com/ajtulloch/sklearn-compiledtrees/
It's linked on the related projects page, btw ;) http://scikit-learn.org/dev/related_projects.html
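As a rough sketch of how that package is used (API names taken from its README; the CompiledRegressionPredictor entry point and its regression-only focus should be verified against the version you install):

    # Sketch of sklearn-compiledtrees usage, following its README.
    # compiledtrees.CompiledRegressionPredictor is the entry point named
    # there; verify the exact API against the version you install.
    import numpy as np
    import compiledtrees
    from sklearn.ensemble import GradientBoostingRegressor

    rng = np.random.RandomState(0)
    X, y = rng.randn(1000, 10), rng.randn(1000)

    clf = GradientBoostingRegressor(n_estimators=50).fit(X, y)
    compiled = compiledtrees.CompiledRegressionPredictor(clf)  # emits and compiles C code
    assert np.allclose(compiled.predict(X), clf.predict(X))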
On 08/13/2015 01:04 PM, Simon Burton wrote:
> Surprisingly, I am working on a similar code generation project,
> with the target language being C. One of the reasons I chose to
> use decision trees (& ensembles thereof) was that it should be
> easy to code-gen these things & deploy.
>
>
> On Wed, 12 Aug 2015 11:46:18 +0000
> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>
>> Hmm, I see. So those values aren't available from the
>> DecisionTreeClassifier class, is that right?
>>
>> Let me make clearer what I'm trying to do; maybe you have run into
>> this problem in the past and found better solutions. I need to embed
>> a classifier in external code as a proof of concept. There are a few
>> constraints on how much of that code I have the freedom to change,
>> so the most productive approach seems to be the following:
>>
>> 1) Train, optimize hyperparameters, and cross-validate the model
>> with scikit-learn until I have a decent initial model.
>> 2) Implement on the target platform (probably Java, but possibly
>> Python) only the part of the code that does the prediction, with
>> hard-coded parameters copied from the scikit-learn model.
>>
>> So, for instance, I can train a RandomForestClassifier in
>> scikit-learn and then implement a simple decision function in the
>> Java code, with all the trees hard-coded (basically just a list of
>> thresholds, features, left and right children, and the final class
>> decision for each leaf node, plus a method to run the decisions and
>> report the same result that predict_proba would).
>>
>> I can already retrieve most of the needed parameters from the
>> DecisionTreeClassifier (namely: thresholds, left and right children,
>> and the feature index for each node). But the example count for each
>> class at each node doesn't seem to be externally available, is that
>> right? If not, I can just do a "manual" count, but having it would
>> help.
>>
>> The main problem is: I can't just serialize the final trained model
>> and load it every time. That would involve more change in the final
>> code than I'm allowed to make (reading the serialized model every
>> time would be a huge overhead, and avoiding it would require
>> changing code that is beyond the scope of what we're willing to
>> touch in the short term). Another issue is that the platform runs on
>> a JVM language, so I'll probably implement the hard-coded predictor
>> in that language. I could get away with Python if the dev team
>> decides to use Apache Thrift for communication, but that is
>> currently not a 100% sure thing.
>>
>> If you have had this kind of problem in the past and found better
>> solutions, I'd be thankful to hear about it.
>>
>> Thanks.
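The per-node arrays described above, including the per-class example counts being asked about, are exposed on the fitted estimator's public tree_ attribute. A minimal sketch of dumping them (attribute names per scikit-learn's Tree object):

    # Sketch: dumping the per-node arrays of a fitted DecisionTreeClassifier.
    # feature, threshold, children_left/right and value are public attributes
    # of clf.tree_; a child index of -1 marks a leaf.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(iris.data, iris.target)
    t = clf.tree_

    for node in range(t.node_count):
        if t.children_left[node] == -1:  # leaf
            # value[node][0] holds per-class training-sample counts at this
            # node (recent scikit-learn versions store normalized fractions)
            print(node, "leaf, class counts:", t.value[node][0])
        else:
            print(node, "if x[%d] <= %.3f -> %d else -> %d"
                  % (t.feature[node], t.threshold[node],
                     t.children_left[node], t.children_right[node]))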
>> On Wed, 12 Aug 2015 at 04:58, Jacob Schreiber <jmschreibe...@gmail.com>
>> wrote:
>>
>>> Hi Rafael,
>>>
>>> When the tree needs to make a prediction, it usually goes through
>>> the predict method, then the apply method, then the _apply_dense
>>> method (which dispatches between dense and sparse data).
>>>
>>> Take a look at lines 3463 to 3503, the _apply_dense method. This
>>> ends up returning an array of offsets to the predict method, where
>>> each offset is the leaf node a point falls under. The predict
>>> method then indexes the value array (where node prediction values
>>> are stored) by this offset array, assigning a prediction value to
>>> each point.
>>>
>>> A small source of confusion is that for regression trees, the value
>>> array stores one value per output per node, which makes sense.
>>> However, for classification trees, the value array stores the
>>> number of training points for each class, for each output, for each
>>> node. For example, a regression tree may have 2.5 as the prediction
>>> value in a leaf, but a classification tree may have [3, 40, 5] as
>>> the value in a leaf if there are three classes. The final
>>> prediction uses argmax to select class 1.
>>>
>>> Let me know if you have any other questions!
>>>
>>> Jacob
>>>
>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini
>>> <rafael.calsaver...@gmail.com> wrote:
>>>
>>>> Hi there all,
>>>>
>>>> I'm taking a look at the code for decision trees, trying to
>>>> understand how a tree actually decides the class, and I'm having
>>>> some trouble with the final step.
>>>>
>>>> The heart of the algorithm seems to be on lines 3249 to 3260 of
>>>> the sklearn/tree/_tree.pyx file.
>>>>
>>>> Lines 3249 to 3258 are fine; they are just the standard walk
>>>> through the branches of the decision tree. What I fail to
>>>> understand is how the tree actually decides which class to assign
>>>> to the sample being classified after it reaches a leaf node.
>>>> Aren't the final classes assigned to each leaf stored anywhere?
>>>>
>>>> Thanks,
>>>> Rafael Calsaverini
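Putting Jacob's explanation together, a hand-rolled predict for a single tree might look like the sketch below. It uses only the public tree_ arrays; the `<=` split convention matches scikit-learn's, and the argmax over a leaf's value row reproduces the class choice whether that row holds raw counts (as in the 2015-era code discussed here) or the normalized fractions newer versions store:

    # Sketch: reproducing predict()/predict_proba() for one tree by hand.
    # Walk from the root to a leaf (scikit-learn sends x[feature] <= threshold
    # left), then take the argmax over the leaf's row of the value array.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    def manual_predict(clf, x):
        t = clf.tree_
        node = 0
        while t.children_left[node] != -1:  # -1 marks a leaf
            if x[t.feature[node]] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
        row = t.value[node][0]          # per-class counts (or fractions)
        proba = row / row.sum()         # what predict_proba returns
        return clf.classes_[np.argmax(row)], proba

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
    label, proba = manual_predict(clf, iris.data[0])
    assert label == clf.predict(iris.data[:1])[0]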