Andreas,

I tried to compile the package on Windows and didn't succeed. I gave up since I 
could not get the dependencies to compile.


Dale Smith, Ph.D.
Data Scientist

d. 404.495.7220 x 4008   f. 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305

-----Original Message-----
From: Andreas Mueller [mailto:t3k...@gmail.com] 
Sent: Thursday, August 13, 2015 1:35 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees

You don't have time to make it compile on Windows, but you have time to write
it from scratch?
I'm sure they'd appreciate a patch for Windows compatibility.


On 08/13/2015 01:30 PM, Dale Smith wrote:
> sklearn-compiledtrees is not usable on Windows without some work. I didn't 
> have time to get it to work.
>
> -----Original Message-----
> From: Andreas Mueller [mailto:t3k...@gmail.com]
> Sent: Thursday, August 13, 2015 1:28 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees
>
> For C you should definitely check out this:
> https://github.com/ajtulloch/sklearn-compiledtrees/
>
> It's linked here btw ;)
> http://scikit-learn.org/dev/related_projects.html
>
> On 08/13/2015 01:04 PM, Simon Burton wrote:
>> Coincidentally, I am working on a similar code generation project, with
>> the target language being C. One of the reasons I chose to use
>> decision trees (and ensembles thereof) was that it should be easy to
>> generate code for them and deploy it.
>>
>>
>>
>> On Wed, 12 Aug 2015 11:46:18 +0000
>> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>>
>>> Hmm, I see. So, those values aren't available from the
>>> DecisionTreeClassifier class, is that right?
>>>
>>> Let me make clearer what I'm trying to do; maybe you guys have had
>>> this problem in the past and can devise better solutions. I need to
>>> embed a classifier in external code, which is a proof of concept.
>>> There are a few constraints on how much of that code I am free to
>>> change, so what seems to be the most productive approach is the
>>> following:
>>>
>>> 1) Train/optimize hyperparameters/cross-validate the model with
>>> scikit-learn until I have a decent initial model.
>>> 2) Implement in the target language (probably Java, but could be Python)
>>> only the part of the code that does the prediction, with hard-coded
>>> parameters copied from the scikit-learn model.
>>>
>>> So, for instance, I can train a RandomForestClassifier in
>>> scikit-learn and then just implement a simple decision function in
>>> the Java code, with all the trees hard-coded (basically just a list
>>> of thresholds, features, left and right children and the final class
>>> decision for each leaf node, and a method to run the decisions and
>>> report the same result that predict_proba would).
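
The plan above can be sketched in a few lines. This is only an illustration under stated assumptions: the two trees in FOREST and all their numbers are made up, the dictionary keys are hypothetical names, and a leaf is marked with -1 as its child, which is how scikit-learn marks leaves. Each tree returns the normalized class counts of the leaf it reaches, and the forest averages those per-tree probabilities, which is the averaging scheme RandomForestClassifier.predict_proba is documented to use.

```python
# Hypothetical hard-coded forest: every number below is invented for
# illustration, not copied from a real fitted model.
LEAF = -1  # a node whose left child is -1 is a leaf

FOREST = [
    {"left": [1, LEAF, LEAF], "right": [2, LEAF, LEAF],
     "feature": [0, -2, -2], "threshold": [0.5, -2.0, -2.0],
     "value": [[4, 4], [3, 1], [1, 3]]},   # per-node class counts
    {"left": [1, LEAF, LEAF], "right": [2, LEAF, LEAF],
     "feature": [1, -2, -2], "threshold": [2.0, -2.0, -2.0],
     "value": [[4, 4], [4, 0], [0, 4]]},
]


def tree_proba(tree, x):
    """Walk one tree to a leaf and return its normalized class counts."""
    node = 0
    while tree["left"][node] != LEAF:
        # Samples with feature value <= threshold go to the left child.
        go_left = x[tree["feature"][node]] <= tree["threshold"][node]
        node = tree["left"][node] if go_left else tree["right"][node]
    counts = tree["value"][node]
    total = sum(counts)
    return [c / total for c in counts]


def forest_proba(forest, x):
    """Average the per-tree class probabilities."""
    per_tree = [tree_proba(tree, x) for tree in forest]
    n_classes = len(per_tree[0])
    return [sum(p[k] for p in per_tree) / len(per_tree)
            for k in range(n_classes)]


print(forest_proba(FOREST, [0.2, 3.0]))  # [0.375, 0.625]
```

The same loop translates almost line for line to Java, with the lists becoming static final arrays.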
>>>
>>> I can already retrieve most of the needed parameters from the
>>> DecisionTreeClassifier (namely: thresholds, left and right children,
>>> and the feature index for each node). The example count for each
>>> class at each node doesn't seem to be externally available, is that
>>> right? If it isn't, I can just do a "manual" count, but having it
>>> would help.
>>>
>>> The main problem is: I can't just serialize the final trained model
>>> and load it every time. It would involve more changes to the final
>>> code than I'm allowed to make (reading the serialized model every time
>>> would be a huge overhead, and to avoid it I'd have to change code that
>>> is beyond the scope of what we're willing to change in the short
>>> term). Another problem is that the platform runs on a JVM
>>> language, so I'll probably implement the hard-coded predictor in
>>> that language. I could get away with Python if the dev team decides to
>>> use Apache Thrift for communication, but that is currently not a sure
>>> thing.
>>>
>>> If you guys have had this kind of problem in the past and found better
>>> solutions, I'd be thankful to hear about them.
>>>
>>> Thanks.
>>>
>>> On Wed, Aug 12, 2015 at 04:58, Jacob Schreiber
>>> <jmschreibe...@gmail.com>
>>> wrote:
>>>
>>>> Hi Rafael
>>>>
>>>> When the tree needs to make a prediction, it usually goes through
>>>> the predict method, then the apply method, then the _apply_dense
>>>> method (this helps partition between dense and sparse data).
>>>>
>>>> Take a look at lines 3463 to 3503, the _apply_dense method. This
>>>> ends up returning an array of offsets to the predict method, where
>>>> each offset is the leaf node a point falls under. The predict method
>>>> then indexes the value array (where node prediction values are
>>>> stored) by this offset array, assigning a prediction value to each point.
>>>>
>>>> A small source of confusion is that for regression trees, the value
>>>> array is one value per output per node, which makes sense. However,
>>>> for classification trees, the value array stores the number of
>>>> training points for each class for each output for each node. For
>>>> example, a regression tree may have 2.5 as the prediction value in a
>>>> leaf, but a classification tree may have [3, 40, 5] as the value in a
>>>> leaf if there are three classes. The final prediction uses argmax to
>>>> select class 1, the class with the largest count.
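
A minimal sketch of the lookup described above, in plain Python rather than the Cython of _tree.pyx. The array names mirror the attributes a fitted estimator exposes on tree_ (children_left, children_right, feature, threshold, value), but the tree itself is made up, and the function names apply and predict are reused here only for illustration. It assumes the value array holds per-class training counts, as described in this thread.

```python
# Hypothetical three-node tree in parallel-array form; the numbers are
# invented for illustration.
TREE_LEAF = -1  # marks "no child", i.e. the node is a leaf

children_left = [1, TREE_LEAF, TREE_LEAF]
children_right = [2, TREE_LEAF, TREE_LEAF]
feature = [0, -2, -2]            # feature index tested at each node
threshold = [0.5, -2.0, -2.0]    # split threshold at each node
value = [[48, 45, 7],            # class counts at the root
         [3, 40, 5],             # class counts at leaf 1
         [45, 5, 2]]             # class counts at leaf 2


def apply(x):
    """Return the index of the leaf that sample x falls into."""
    node = 0
    while children_left[node] != TREE_LEAF:
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return node


def predict(x):
    """Index the value array by the leaf and take the argmax of the counts."""
    counts = value[apply(x)]
    return max(range(len(counts)), key=counts.__getitem__)


print(apply([0.2]), predict([0.2]))  # 1 1
print(apply([0.9]), predict([0.9]))  # 2 0
```

With the leaf counts [3, 40, 5] the argmax is class 1, matching the example in the message above.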
>>>>
>>>> Let me know if you have any other questions!
>>>>
>>>> Jacob
>>>>
>>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini <
>>>> rafael.calsaver...@gmail.com> wrote:
>>>>
>>>>> Hi there all,
>>>>>
>>>>> I'm taking a look at the code for decision trees and trying to
>>>>> understand how it actually decides the class, and I'm having some
>>>>> trouble with the final step.
>>>>>
>>>>> The heart of the algorithm seems to be on lines 3249 to 3260 of the
>>>>> sklearn/tree/_tree.pyx file.
>>>>>
>>>>> Lines 3249 to 3258 are fine; they are just the standard walk
>>>>> through the branches of the decision tree. What I failed to
>>>>> understand is how the tree actually decides which class to assign
>>>>> to the sample being classified after it reaches a leaf node.
>>>>> Aren't the final classes assigned to each final branch stored anywhere?
>>>>>
>>>>> Thanks,
>>>>> Rafael Calsaverini
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

