sklearn-compiledtrees is not usable on Windows without some work. I didn't have time to get it to work.

Dale Smith, Ph.D.
Data Scientist
d. 404.495.7220 x 4008  f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305

-----Original Message-----
From: Andreas Mueller [mailto:t3k...@gmail.com]
Sent: Thursday, August 13, 2015 1:28 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees

For C you should definitely check out this:
https://github.com/ajtulloch/sklearn-compiledtrees/

It's linked here btw ;)
http://scikit-learn.org/dev/related_projects.html

On 08/13/2015 01:04 PM, Simon Burton wrote:
> Surprisingly, I am working on a similar code generation project, with
> the target language being C. One of the reasons I chose to use
> decision trees (& ensembles thereof) was that it should be easy to
> code gen these things & deploy.
>
> On Wed, 12 Aug 2015 11:46:18 +0000
> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>
>> Hum, I see. So, those values aren't available from the
>> DecisionTreeClassifier class, is that right?
>>
>> Let me make clearer what I'm trying to do; maybe you guys have had
>> this problem in the past and can devise better solutions. I need to
>> embed a classifier in external code, which is a proof-of-concept.
>> There are a few constraints on how much of that code I have freedom to
>> change, so what seems to be the most productive approach is to do the
>> following:
>>
>> 1) Train/optimize hyperparameters/cross-validate the model with
>> scikit-learn until I have a decent initial model.
>> 2) Implement on the target (probably Java, but could be Python) only
>> the part of the code that does the prediction, with hard-coded
>> parameters copied from the scikit-learn model.
>>
>> So, for instance, I can train a RandomForestClassifier in
>> scikit-learn and then just implement a simple decision function in
>> the Java code, with all the trees hard-coded (basically just a list
>> of thresholds, features, left and right children and the final class
>> decision for each leaf node, and a method to run the decisions and
>> report the same result that predict_proba would).
>>
>> I can already retrieve most of the needed parameters from the
>> DecisionTreeClassifier (namely: thresholds, left and right children,
>> and the feature index for each node). The example count for each
>> class for each node doesn't seem to be externally available, is that
>> right? If it isn't, I can just do a "manual" count, but having it
>> would help.
>>
>> The main problem is: I can't just serialize the final trained model
>> and load it every time. It would involve more changes to the final
>> code than I'm allowed to make (reading the serialized model every time
>> would be a huge overhead, and to avoid it I'd have to change code that
>> is beyond the scope of what we're willing to change in the short
>> term). Another problem is that the platform is running in a JVM
>> language, so I'll probably implement that hard-coded predictor in
>> that language. I could get away with Python if the dev team decides to
>> use Apache Thrift for communication, but that is currently not a 100%
>> sure thing.
>>
>> If you guys had this kind of problem in the past and found better
>> solutions, I'd be thankful to hear about it.
>>
>> Thanks.
>>
>> On Wed, 12 Aug 2015 at 04:58, Jacob Schreiber
>> <jmschreibe...@gmail.com> wrote:
>>
>>> Hi Rafael
>>>
>>> When the tree needs to make a prediction, it usually goes through
>>> the predict method, then the apply method, then the _apply_dense
>>> method (this helps partition between dense and sparse data).
>>>
>>> Take a look at lines 3463 to 3503, the _apply_dense method.
>>> This ends up returning an array of offsets to the predict method,
>>> where each offset is the leaf node a point falls under. The predict
>>> method then indexes the value array (where node prediction values
>>> are stored) by this offset array, assigning a prediction value to
>>> each point.
>>>
>>> A small source of confusion is that for regression trees, the value
>>> array is one value per output per node, which makes sense. However,
>>> for classification trees, the value array stores the number of
>>> training points for each class for each output for each node. For
>>> example, a regression tree may have 2.5 as the prediction value in a
>>> leaf, but a classification tree may have [3, 40, 5] as the value in
>>> a leaf if there are three classes.
>>> The final prediction uses argmax to select class 1.
>>>
>>> Let me know if you have any other questions!
>>>
>>> Jacob
>>>
>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini <
>>> rafael.calsaver...@gmail.com> wrote:
>>>
>>>> Hi there all,
>>>>
>>>> I'm taking a look at the code for decision trees and trying to
>>>> understand how it actually decides the class, and I'm having some
>>>> trouble with the final step.
>>>>
>>>> The heart of the algorithm seems to be on lines 3249 to 3260 of the
>>>> sklearn/tree/_tree.pyx file.
>>>>
>>>> Lines 3249 to 3258 are fine; they are just the standard walk
>>>> through the branches of the decision tree. What I failed to
>>>> understand is how the tree actually decides which class to assign
>>>> to the sample being classified after it reaches a leaf node.
>>>> Aren't the final classes assigned to each final branch stored
>>>> anywhere?
>>>>
>>>> Thanks,
>>>> Rafael Calsaverini
>>>>
>>>> ------------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
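
For anyone skimming the thread: the hard-coded predictor Rafael describes, combined with Jacob's explanation of the value array, can be sketched roughly as below. This is only an illustration under stated assumptions, not the scikit-learn implementation. The arrays are made up by hand; on a fitted estimator the corresponding arrays are exposed as clf.tree_.children_left, clf.tree_.children_right, clf.tree_.feature, clf.tree_.threshold and clf.tree_.value. The sketch assumes a single output, and that value holds per-class training counts as described in the thread. The function names find_leaf, predict_proba and predict are hypothetical helpers, not library calls.

```python
# Hand-written, made-up arrays standing in for the ones copied out of a
# fitted scikit-learn tree (tree_.children_left, tree_.children_right,
# tree_.feature, tree_.threshold, tree_.value). Single-output case only.
LEAF = -1  # a node whose child index is -1 is a leaf

children_left = [1, LEAF, 3, LEAF, LEAF]
children_right = [2, LEAF, 4, LEAF, LEAF]
feature = [0, -2, 1, -2, -2]           # feature index tested at each node
threshold = [0.5, 0.0, 2.0, 0.0, 0.0]  # unused at leaves

# Per-node class counts (three classes), as described in the thread.
value = [
    [48, 45, 7],  # node 0 (root)
    [40, 3, 2],   # node 1 (leaf)
    [8, 42, 5],   # node 2
    [3, 40, 5],   # node 3 (leaf)
    [5, 2, 0],    # node 4 (leaf)
]


def find_leaf(sample):
    """Walk from the root to a leaf and return the leaf's node index."""
    node = 0
    while children_left[node] != LEAF:
        # Samples with feature value <= threshold go to the left child.
        if sample[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return node


def predict_proba(sample):
    """Normalize the leaf's class counts into class fractions."""
    counts = value[find_leaf(sample)]
    total = float(sum(counts))
    return [c / total for c in counts]


def predict(sample):
    """Return the index of the class with the largest count (argmax)."""
    counts = value[find_leaf(sample)]
    return max(range(len(counts)), key=counts.__getitem__)


if __name__ == "__main__":
    sample = [0.9, 1.5]
    print(find_leaf(sample), predict(sample), predict_proba(sample))
```

With the sample [0.9, 1.5] the walk ends in the leaf holding [3, 40, 5], so the argmax picks class 1, matching Jacob's example. For a forest, my understanding is that averaging the per-tree predict_proba outputs and taking the argmax of the average would mirror what RandomForestClassifier does, but that is worth checking against the version you trained with. The same arrays port directly to Java as int[] and double[] constants.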