You don't have time to make it compile on Windows, but you have time to write it from scratch? I'm sure they'd appreciate a patch for Windows compatibility.
On 08/13/2015 01:30 PM, Dale Smith wrote:
> sklearn-compiledtrees is not usable on Windows without some work. I didn't
> have time to get it to work.
>
> Dale Smith, Ph.D.
> Data Scientist
>
> d. 404.495.7220 x 4008 f. 404.795.7221
> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA
> 30305
>
> -----Original Message-----
> From: Andreas Mueller [mailto:t3k...@gmail.com]
> Sent: Thursday, August 13, 2015 1:28 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees
>
> For C you should definitely check out this:
> https://github.com/ajtulloch/sklearn-compiledtrees/
>
> It's linked here btw ;)
> http://scikit-learn.org/dev/related_projects.html
>
> On 08/13/2015 01:04 PM, Simon Burton wrote:
>> Surprisingly, I am working on a similar code generation project, with
>> the target language being C. One of the reasons I chose to use
>> decision trees (& ensembles thereof) was that it should be easy to
>> code gen these things & deploy.
>>
>> On Wed, 12 Aug 2015 11:46:18 +0000
>> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>>
>>> Hmm, I see. So, those values aren't available from the
>>> DecisionTreeClassifier class, is that right?
>>>
>>> Let me make clearer what I'm trying to do; maybe you guys have had
>>> this problem in the past and can devise better solutions. I need to
>>> embed a classifier in external code, which is a proof of concept.
>>> There are a few constraints on how much of that code I have freedom to
>>> change, so what seems to be the most productive approach is to do the
>>> following:
>>>
>>> 1) Train/optimize hyperparameters/cross-validate the model with
>>> scikit-learn until I have a decent initial model.
>>> 2) Implement at the target (probably Java, but could be Python) only
>>> the part of the code that does the prediction with hard-coded
>>> parameters copied from the scikit-learn model.
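A minimal sketch of step 2 above, assuming a single DecisionTreeClassifier fitted on the iris data (the toy dataset is my choice, not something from the thread). It copies the arrays exposed by clf.tree_ into plain lists and walks them by hand; find_leaf and the upper-case names are made up for illustration. A Java port would mirror the same loop.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Step 1 (stand-in): fit some model. Dataset and settings are illustrative only.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Step 2: copy the fitted structure into plain lists that could be
# pasted into the target code as constants.
t = clf.tree_
LEFT = t.children_left.tolist()    # left child of each node, -1 at a leaf
RIGHT = t.children_right.tolist()  # right child of each node, -1 at a leaf
FEATURE = t.feature.tolist()       # feature index tested at each node
THRESHOLD = t.threshold.tolist()   # split threshold at each node


def find_leaf(x):
    """Walk from the root (node 0) to a leaf and return the leaf's node id."""
    node = 0
    while LEFT[node] != -1:
        # scikit-learn sends a sample left when feature value <= threshold.
        if x[FEATURE[node]] <= THRESHOLD[node]:
            node = LEFT[node]
        else:
            node = RIGHT[node]
    return node


# scikit-learn casts inputs to float32 internally, so do the same before comparing.
rows = X.astype(np.float32).tolist()
leaves = [find_leaf(row) for row in rows]

# Sanity check against the library's own traversal.
matches_sklearn = leaves == clf.apply(X).tolist()
print(matches_sklearn)
```

For a forest, the same export would be repeated for every tree in clf.estimators_.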
>>>
>>> So, for instance, I can train a RandomForestClassifier in
>>> scikit-learn and then just implement a simple decision function in
>>> the Java code, with all the trees hard-coded (basically just a list
>>> of thresholds, features, left and right children and the final class
>>> decision for each leaf node, and a method to run the decisions and
>>> report the same result that predict_proba would).
>>>
>>> I can already retrieve most of the needed parameters from the
>>> DecisionTreeClassifier (namely: thresholds, left and right children,
>>> and the feature index for each node). Is the example count for each
>>> class at each node externally available? It doesn't seem to be. If not,
>>> I can just do a "manual" count, but it would help.
>>>
>>> The main problem is: I can't just serialize the final trained model
>>> and load it every time. It would involve more change in the final
>>> code than I'm allowed to make (reading the serialized model every time
>>> would be a huge overhead and to avoid it I'd have to change code that
>>> is beyond the scope of what we're willing to change in the short
>>> term). Another problem is that the platform is running in a JVM
>>> language, so I'll probably implement that hard-coded predictor in
>>> that language. I could get away with Python if the dev team decides to
>>> use Apache Thrift for communication, but that is currently not a 100% sure
>>> thing.
>>>
>>> If you guys had this kind of problem in the past and found better
>>> solutions, I'd be thankful to hear about it.
>>>
>>> Thanks.
>>>
>>> On Wed, Aug 12, 2015 at 04:58, Jacob Schreiber
>>> <jmschreibe...@gmail.com> wrote:
>>>
>>>> Hi Rafael
>>>>
>>>> When the tree needs to make a prediction, it usually goes through
>>>> the predict method, then the apply method, then the _apply_dense
>>>> method (this helps partition between dense and sparse data).
>>>>
>>>> Take a look at lines 3463 to 3503, the _apply_dense method.
>>>> This ends up returning an array of offsets to the predict method, where
>>>> each offset is the leaf node a point falls under. The predict method
>>>> then indexes the value array (where node prediction values are
>>>> stored) by this offset array, assigning a prediction value to each point.
>>>>
>>>> A small source of confusion is that for regression trees, the value
>>>> array is one value per output per node, which makes sense. However,
>>>> for classification trees, the value array stores the number of
>>>> training points for each class for each output for each node. For
>>>> example, a regression tree may have 2.5 as the prediction value in a
>>>> leaf, but a classification tree may have [3, 40, 5] as the value in a leaf
>>>> if there are three classes.
>>>> The final prediction uses argmax to select class 1.
>>>>
>>>> Let me know if you have any other questions!
>>>>
>>>> Jacob
>>>>
>>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini <
>>>> rafael.calsaver...@gmail.com> wrote:
>>>>
>>>>> Hi there all,
>>>>>
>>>>> I'm taking a look at the code for decision trees and trying to
>>>>> understand how it actually decides the class, and I'm having some
>>>>> trouble with the final step.
>>>>>
>>>>> The heart of the algorithm seems to be on lines 3249 to 3260 of the
>>>>> sklearn/tree/_tree.pyx file.
>>>>>
>>>>> Lines 3249 to 3258 are fine, they are just the standard walking
>>>>> through the branches of the decision tree. What I failed to
>>>>> understand is how the tree actually decides which class to assign
>>>>> to the sample being classified after it reaches a leaf node.
>>>>> Aren't the final classes assigned to each final branch stored anywhere?
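To make the value-array lookup and argmax step described in this thread concrete, here is a rough sketch, assuming a single-output classifier fitted on the iris data (the dataset and the manual_* names are mine, for illustration). One caveat I'm fairly sure of: depending on the scikit-learn version, the per-class numbers in tree_.value may be raw training counts or fractions, so the sketch normalises them rather than relying on either.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Toy model; dataset and settings are illustrative only.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# tree_.value has shape (n_nodes, n_outputs, n_classes).
# This is a single-output problem, so keep output 0 only.
value = clf.tree_.value[:, 0, :]

leaf_ids = clf.apply(X)        # node id of the leaf each sample ends up in
leaf_values = value[leaf_ids]  # per-class numbers stored at those leaves

# Class decision: position of the largest entry, mapped back to a class label.
manual_pred = clf.classes_[np.argmax(leaf_values, axis=1)]

# Probabilities: rescale each row so that it sums to 1.
manual_proba = leaf_values / leaf_values.sum(axis=1, keepdims=True)

print(np.array_equal(manual_pred, clf.predict(X)))
print(np.allclose(manual_proba, clf.predict_proba(X)))
```

So nothing extra needs to be stored per leaf: the class label falls out of the argmax over that leaf's row, and the same row, normalised, gives what predict_proba reports.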
>>>>>
>>>>> Thanks,
>>>>> Rafael Calsaverini
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general