sklearn-compiledtrees is not usable on Windows without some extra work, and I didn't
have time to get it running.


Dale Smith, Ph.D.
Data Scientist


d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305

-----Original Message-----
From: Andreas Mueller [mailto:t3k...@gmail.com] 
Sent: Thursday, August 13, 2015 1:28 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees

For C, you should definitely check this out:
https://github.com/ajtulloch/sklearn-compiledtrees/

It's linked here btw ;)
http://scikit-learn.org/dev/related_projects.html

On 08/13/2015 01:04 PM, Simon Burton wrote:
> Surprisingly, I am working on a similar code generation project, with 
> C as the target language. One of the reasons I chose to use 
> decision trees (and ensembles thereof) was that they should be easy to 
> generate code for and deploy.
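Generating code for a tree can be as simple as a recursive walk that turns every split into an if/else. The sketch below emits a C function from a fitted scikit-learn DecisionTreeClassifier; the function names `tree_to_c` and `predict_tree`, the use of the iris dataset and `max_depth=3` are illustrative assumptions, not part of sklearn-compiledtrees or scikit-learn. It relies on the public `tree_` arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`), where a leaf has both children equal to -1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def tree_to_c(clf, func_name="predict_tree"):
    """Hypothetical helper: emit C source for a fitted single-output tree.

    The generated function returns an index into clf.classes_.
    """
    t = clf.tree_
    lines = [f"int {func_name}(const float *x) {{"]

    def emit(node, depth):
        pad = "    " * depth
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # leaf: both children are -1
            cls = int(np.argmax(t.value[node, 0]))
            lines.append(f"{pad}return {cls};")
            return
        # repr() round-trips the float64 threshold; C promotes the float
        # feature to double before comparing.
        thr = repr(float(t.threshold[node]))
        lines.append(f"{pad}if (x[{int(t.feature[node])}] <= {thr}) {{")
        emit(left, depth + 1)
        lines.append(f"{pad}}} else {{")
        emit(right, depth + 1)
        lines.append(f"{pad}}}")

    emit(0, 1)
    lines.append("}")
    return "\n".join(lines)


X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree_to_c(clf))
```

Taking the input as `const float *` and comparing against a double literal is meant to mirror, as far as I know, scikit-learn's own handling of float32 inputs against float64 thresholds. The recursion is fine for shallow trees; very deep trees would need an explicit stack.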
>
> On Wed, 12 Aug 2015 11:46:18 +0000
> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>
>> Hmm, I see. So those values aren't available from the 
>> DecisionTreeClassifier class, is that right?
>>
>> Let me make it clearer what I'm trying to do; maybe you guys have had 
>> this problem in the past and can devise better solutions. I need to 
>> embed a classifier in an external codebase, which is a proof of concept. 
>> There are a few constraints on how much of that code I'm free to 
>> change, so the most productive approach seems to be the 
>> following:
>>
>> 1) Train the model, optimize hyperparameters and cross-validate with 
>> scikit-learn until I have a decent initial model.
>> 2) Implement on the target platform (probably Java, but it could be Python) 
>> only the part of the code that does the prediction, with hard-coded 
>> parameters copied from the scikit-learn model.
>>
>> So, for instance, I can train a RandomForestClassifier in 
>> scikit-learn and then just implement a simple decision function in 
>> the Java code, with all the trees hard-coded (basically just a list 
>> of thresholds, features, left and right children and the final class 
>> decision for each leaf node, and a method to run the decisions and 
>> report the same result that predict_proba would).
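A minimal sketch of that plan, written in Python as a reference to port to Java: copy each tree's arrays into plain lists, then walk every tree and average the normalised leaf class distributions, which is how RandomForestClassifier.predict_proba combines its trees. The helper names `export_tree` and `predict_proba_one`, the iris data and `n_estimators=10` are assumptions for illustration. Normalising `tree_.value` makes the sketch work whether it stores raw counts (older releases) or class fractions (recent releases, 1.4 onward if I recall correctly).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def export_tree(est):
    """Hypothetical helper: copy what a standalone predictor needs into lists."""
    t = est.tree_
    value = t.value[:, 0, :]  # single output: (node_count, n_classes)
    totals = value.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # guard against empty nodes
    return {
        "left": t.children_left.tolist(),
        "right": t.children_right.tolist(),
        "feature": t.feature.tolist(),
        "threshold": t.threshold.tolist(),
        "proba": (value / totals).tolist(),
    }


def predict_proba_one(trees, x):
    """Hypothetical helper: average leaf class distributions over all trees."""
    # scikit-learn compares float32 features with float64 thresholds,
    # so round the input through float32 first.
    x = [float(v) for v in np.asarray(x, dtype=np.float32)]
    total = None
    for tr in trees:
        node = 0
        while tr["left"][node] != -1:  # -1 marks a leaf
            if x[tr["feature"][node]] <= tr["threshold"][node]:
                node = tr["left"][node]
            else:
                node = tr["right"][node]
        leaf = tr["proba"][node]
        total = leaf if total is None else [a + b for a, b in zip(total, leaf)]
    return [v / len(trees) for v in total]


X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
trees = [export_tree(est) for est in rf.estimators_]
print(predict_proba_one(trees, X[0]))
```

The class order of the returned list follows `rf.classes_`. In the Java port, storing thresholds as `double` and widening each feature from `float` should reproduce the same comparisons.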
>>
>> I can already retrieve most of the needed parameters from the 
>> DecisionTreeClassifier (namely: thresholds, left and right children, 
>> and the feature index for each node). The per-class sample count for 
>> each node, however, doesn't seem to be externally available; is that 
>> right? If it isn't, I can just do a "manual" count, but having it would help.
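In current scikit-learn releases the per-node class information is exposed as `tree_.value`, with shape `(node_count, n_outputs, n_classes)`. If a "manual" count is still wanted, a sketch like the one below derives it from `decision_path`, which returns a sparse `(n_samples, node_count)` indicator of the nodes each sample visits. The iris data and `max_depth=3` are assumptions. For a RandomForestClassifier with bootstrapping, such counts on the full training set will differ from each tree's `value`, because each tree was fit on a bootstrap sample.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = clf.tree_

left = t.children_left  # -1 marks a leaf

# Per-node class statistics stored by the tree. Depending on the version
# these are raw counts or class fractions.
value = t.value[:, 0, :]

# "Manual" per-node class counts on the training data.
paths = clf.decision_path(X)  # sparse (n_samples, node_count)
onehot = np.eye(len(clf.classes_))[np.searchsorted(clf.classes_, y)]
counts = np.asarray(paths.T @ onehot)  # (node_count, n_classes)

print(counts[left == -1])  # class counts at each leaf
```

Either array, once normalised per row, gives the class distribution that a hard-coded predictor needs at each leaf.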
>>
>> The main problem is that I can't just serialize the final trained model 
>> and load it every time. That would require more changes to the final 
>> code than I'm allowed to make (reading the serialized model every time 
>> would be a huge overhead, and avoiding that would mean changing code 
>> that is beyond the scope of what we're willing to change in the short 
>> term). Another problem is that the platform runs on a JVM 
>> language, so I'll probably implement the hard-coded predictor in 
>> that language. I could get away with Python if the dev team decides to 
>> use Apache Thrift for communication, but that is not certain yet.
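One way to hard-code a tree without serialization is to generate Java source containing the tree's arrays as constants, which the JVM code then walks with the same loop as scikit-learn. The sketch below is an assumption-laden illustration: the helpers `java_int_array`, `java_double_array` and `tree_to_java_arrays`, the `TREE0_` naming and the iris data are all hypothetical.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def java_int_array(name, values):
    body = ", ".join(str(int(v)) for v in values)
    return f"static final int[] {name} = {{{body}}};"


def java_double_array(name, values):
    # repr() round-trips, so javac parses back the same double value.
    body = ", ".join(repr(float(v)) for v in values)
    return f"static final double[] {name} = {{{body}}};"


def tree_to_java_arrays(clf, prefix="TREE0_"):
    """Hypothetical helper: dump one fitted tree as Java array constants."""
    t = clf.tree_
    # Argmax of each node's class distribution; only meaningful at leaves.
    leaf_class = np.argmax(t.value[:, 0, :], axis=1)
    return "\n".join([
        java_int_array(prefix + "LEFT", t.children_left),
        java_int_array(prefix + "RIGHT", t.children_right),
        java_int_array(prefix + "FEATURE", t.feature),
        java_double_array(prefix + "THRESHOLD", t.threshold),
        java_int_array(prefix + "LEAF_CLASS", leaf_class),
    ])


X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree_to_java_arrays(clf))
```

For predict_proba rather than a class label, a `double[][]` of normalised leaf distributions would be emitted instead. Very large forests can run into the JVM's 64 KB limit on a method's bytecode (static initializers included), so they may need to be split across several classes.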
>>
>> If you guys had this kind of problem in the past and found better 
>> solutions, I'd be thankful to hear about it.
>>
>> Thanks.
>>
>> On Wed, Aug 12, 2015 at 04:58, Jacob Schreiber 
>> <jmschreibe...@gmail.com>
>> wrote:
>>
>>> Hi Rafael
>>>
>>> When the tree needs to make a prediction, it usually goes through 
>>> the predict method, then the apply method, then the _apply_dense 
>>> method (which separates the dense and sparse code paths).
>>>
>>> Take a look at lines 3463 to 3503 of sklearn/tree/_tree.pyx, the 
>>> _apply_dense method. It returns an array of offsets to the predict 
>>> method, where each offset is the leaf node a point falls into. The 
>>> predict method then indexes the value array (where the node prediction 
>>> values are stored) with this offset array, assigning a prediction value to each point.
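The same two steps can be reproduced from Python with public attributes: `apply` returns the leaf index ("offset") for each sample, and indexing `tree_.value` with it gives each sample's leaf statistics. A minimal sketch, assuming the iris data; the normalisation step is there because recent releases store fractions instead of counts in `tree_.value`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Leaf index for every sample: the offset array described above.
leaves = clf.apply(X)

# Index the per-node value array with those offsets (single output).
leaf_values = clf.tree_.value[leaves, 0, :]

# Normalising each row gives predict_proba; argmax gives the class.
proba = leaf_values / leaf_values.sum(axis=1, keepdims=True)
pred = clf.classes_[np.argmax(leaf_values, axis=1)]
```

Because argmax is unaffected by the normalisation, `pred` matches `clf.predict(X)` whether `value` holds counts or fractions.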
>>>
>>> A small source of confusion is that for regression trees the value 
>>> array holds one value per output per node, which makes sense. For 
>>> classification trees, however, the value array stores the number of 
>>> training points of each class, for each output, for each node. For 
>>> example, a regression tree may have 2.5 as the prediction value in a 
>>> leaf, whereas a classification tree may have [3, 40, 5] as the value in 
>>> a leaf if there are three classes.
>>> The final prediction uses argmax to select class 1 (the class with 40 training points).
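The difference in layout is visible in the shape of `tree_.value`: `(node_count, n_outputs, 1)` for a regressor versus `(node_count, n_outputs, n_classes)` for a classifier. A small sketch, assuming the iris data (three classes) and `max_depth=2`, followed by the worked leaf example from the paragraph above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Regression: one value per output per node.
print(reg.tree_.value.shape)  # (node_count, 1, 1)
# Classification: one entry per class per output per node.
print(clf.tree_.value.shape)  # (node_count, 1, 3)

# The leaf from the example: 3, 40 and 5 training points per class.
leaf = np.array([3.0, 40.0, 5.0])
print(np.argmax(leaf))    # 1
print(leaf / leaf.sum())  # class probabilities: 3/48, 40/48, 5/48
```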
>>>
>>> Let me know if you have any other questions!
>>>
>>> Jacob
>>>
>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini
>>> <rafael.calsaver...@gmail.com> wrote:
>>>
>>>> Hi there all,
>>>>
>>>> I'm taking a look at the code for decision trees and trying to 
>>>> understand how it actually decides the class, and I'm having some 
>>>> trouble with the final step.
>>>>
>>>> The heart of the algorithm seems to be in lines 3249 to 3260 of the 
>>>> sklearn/tree/_tree.pyx file.
>>>>
>>>> Lines 3249 to 3258 are fine; they are just the standard walk down 
>>>> the branches of the decision tree. What I fail to 
>>>> understand is how the tree actually decides which class to assign 
>>>> to the sample being classified once it reaches a leaf node. 
>>>> Aren't the final classes assigned to each leaf stored anywhere?
>>>>
>>>> Thanks,
>>>> Rafael Calsaverini
>>>>
>>>> -------------------------------------------------------------------
>>>> -----------
>>>>
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general