You don't have time to make it compile on Windows, but you have time
to write it from scratch?
I'm sure they'd appreciate a patch for Windows compatibility.


On 08/13/2015 01:30 PM, Dale Smith wrote:
> sklearn-compiledtrees is not usable on Windows without some work. I didn't 
> have time to get it to work.
>
>
> Dale Smith, Ph.D.
> Data Scientist
>
> -----Original Message-----
> From: Andreas Mueller [mailto:t3k...@gmail.com]
> Sent: Thursday, August 13, 2015 1:28 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] Question on the code for Decision Trees
>
> For C you should definitely check out this:
> https://github.com/ajtulloch/sklearn-compiledtrees/
>
> It's linked here btw ;)
> http://scikit-learn.org/dev/related_projects.html
>
> On 08/13/2015 01:04 PM, Simon Burton wrote:
>> Surprisingly, I am working on a similar code generation project, with
>> the target language being C. One of the reasons I chose to use
>> decision trees (and ensembles thereof) was that it should be easy to
>> generate code for them and deploy it.
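As a rough illustration of the code-generation idea, here is a minimal sketch that turns a tree stored as parallel arrays into a nested if/else C function. The array layout mirrors the children_left, children_right, feature and threshold arrays of a fitted tree's tree_ attribute; the tiny tree itself, the leaf_class list and the helper names emit_c and generate_predict_function are made up for the example. One caveat to keep in mind: scikit-learn converts inputs to 32-bit floats before comparing them to thresholds, so generated code that compares doubles may differ on borderline values.

```python
# Hypothetical tree exported as parallel per-node lists.
LEAF = -1  # scikit-learn marks "no child" with -1

children_left = [1, LEAF, LEAF]
children_right = [2, LEAF, LEAF]
feature = [0, -2, -2]            # feature index tested at each split node
threshold = [0.5, -2.0, -2.0]    # samples with x[feature] <= threshold go left
leaf_class = [None, 0, 1]        # class per leaf, precomputed with argmax


def emit_c(node=0, depth=1):
    """Return the C statements for the subtree rooted at `node`."""
    pad = "    " * depth
    if children_left[node] == LEAF:
        return f"{pad}return {leaf_class[node]};\n"
    return (
        f"{pad}if (x[{feature[node]}] <= {threshold[node]!r}) {{\n"
        + emit_c(children_left[node], depth + 1)
        + f"{pad}}} else {{\n"
        + emit_c(children_right[node], depth + 1)
        + f"{pad}}}\n"
    )


def generate_predict_function(name="predict"):
    """Wrap the generated statements in a C function definition."""
    return f"int {name}(const double *x) {{\n" + emit_c() + "}\n"


c_source = generate_predict_function()
print(c_source)
```

For an ensemble, one such function per tree plus a small voting or averaging routine would be one way to go.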
>>
>>
>>
>> On Wed, 12 Aug 2015 11:46:18 +0000
>> Rafael Calsaverini <rafael.calsaver...@gmail.com> wrote:
>>
>>> Hum, I see. So, those values aren't available from the
>>> DecisionTreeClassifier class, is that right?
>>>
>>> Let me make clearer what I'm trying to do; maybe you guys have had
>>> this problem in the past and can devise better solutions. I need to
>>> embed a classifier in external code, which is a proof of concept.
>>> There are a few constraints on how much of that code I have freedom to
>>> change, so what seems to be the most productive approach is to do the
>>> following:
>>>
>>> 1) Train/optimize hyperparameters/cross-validate the model with
>>> scikit-learn until I have a decent initial model.
>>> 2) Implement at the target (probably Java, but could be Python) only
>>> the part of the code that does the prediction with hard-coded
>>> parameters copied from the scikit-learn model.
>>>
>>> So, for instance, I can train a RandomForestClassifier in
>>> scikit-learn and then just implement a simple decision function in
>>> the Java code, with all the trees hard-coded (basically just a list
>>> of thresholds, features, left and right children and the final class
>>> decision for each leaf node, and a method to run the decisions and
>>> report the same result that predict_proba would).
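The hard-coded forest described above could look roughly like the following sketch. It is written in Python for brevity, but the same loops port directly to Java. The two trees, their numbers, and the names FOREST, tree_proba and forest_proba are invented for illustration. It assumes each leaf stores per-class counts, which are normalized into probabilities and then averaged over the trees, which is how RandomForestClassifier.predict_proba is documented to combine its trees.

```python
LEAF = -1  # scikit-learn marks "no child" with -1

# Two tiny hypothetical trees; each is a dict of parallel per-node lists.
FOREST = [
    {
        "left": [1, LEAF, LEAF],
        "right": [2, LEAF, LEAF],
        "feature": [0, -2, -2],
        "threshold": [0.5, -2.0, -2.0],
        "value": [[5, 5], [4, 1], [1, 4]],  # class counts per node
    },
    {
        "left": [1, LEAF, LEAF],
        "right": [2, LEAF, LEAF],
        "feature": [1, -2, -2],
        "threshold": [2.0, -2.0, -2.0],
        "value": [[5, 5], [3, 0], [2, 5]],
    },
]


def tree_proba(tree, x):
    """Walk one tree to a leaf and return its normalized class counts."""
    node = 0
    while tree["left"][node] != LEAF:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["left"][node]
        else:
            node = tree["right"][node]
    counts = tree["value"][node]
    total = sum(counts)
    return [c / total for c in counts]


def forest_proba(forest, x):
    """Average the per-tree class probabilities."""
    per_tree = [tree_proba(t, x) for t in forest]
    n_classes = len(per_tree[0])
    return [sum(p[k] for p in per_tree) / len(forest) for k in range(n_classes)]


proba = forest_proba(FOREST, [0.2, 3.0])
print(proba)
```

Normalizing each leaf row before averaging also works if the stored values are already fractions rather than raw counts, which may be the case in newer scikit-learn releases.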
>>>
>>> I can already retrieve most of the needed parameters from the
>>> DecisionTreeClassifier (namely: thresholds, left and right children,
>>> and the feature index for each node). The example count for each
>>> class at each node doesn't seem to be externally available, is that
>>> right? If it isn't, I can just do a "manual" count, but having it
>>> would help.
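For reference, the "manual" count mentioned above can be sketched as follows: push every training sample down the tree and increment a per-class counter at each node it visits. The tree arrays, the toy data and the function name count_classes_per_node are hypothetical. For a forest trained on bootstrap samples, counts computed this way from the full training set would not necessarily match what each tree saw during fitting.

```python
LEAF = -1  # scikit-learn marks "no child" with -1

# Hypothetical single-split tree as parallel per-node lists.
children_left = [1, LEAF, LEAF]
children_right = [2, LEAF, LEAF]
feature = [0, -2, -2]
threshold = [0.5, -2.0, -2.0]


def count_classes_per_node(X, y, n_classes):
    """Return counts[node][cls]: training samples of class cls reaching node."""
    counts = [[0] * n_classes for _ in children_left]
    for x, label in zip(X, y):
        node = 0
        while True:
            counts[node][label] += 1
            if children_left[node] == LEAF:
                break
            if x[feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
    return counts


X = [[0.1], [0.4], [0.9], [0.7], [0.3]]
y = [0, 0, 1, 1, 1]
node_counts = count_classes_per_node(X, y, n_classes=2)
print(node_counts)  # [[2, 3], [2, 1], [0, 2]]
```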
>>>
>>> The main problem is: I can't just serialize the final trained model
>>> and load it every time. It would involve more changes to the final
>>> code than I'm allowed to make (reading the serialized model every time
>>> would be a huge overhead, and to avoid it I'd have to change code that
>>> is beyond the scope of what we're willing to change in the short
>>> term). Another problem is that the platform is running in a JVM
>>> language, so I'll probably implement that hard-coded predictor in
>>> that language. I could get away with Python if the dev team decides to
>>> use Apache Thrift for communication, but that is currently not a 100%
>>> sure thing.
>>>
>>> If you guys had this kind of problem in the past and found better
>>> solutions, I'd be thankful to hear about it.
>>>
>>> Thanks.
>>>
>>> On Wed, Aug 12, 2015 at 04:58, Jacob Schreiber
>>> <jmschreibe...@gmail.com>
>>> wrote:
>>>
>>>> Hi Rafael
>>>>
>>>> When the tree needs to make a prediction, it usually goes through
>>>> the predict method, then the apply method, then the _apply_dense
>>>> method (this helps partition between dense and sparse data).
>>>>
>>>> Take a look at lines 3463 to 3503, the _apply_dense method. This
>>>> ends up returning an array of offsets to the predict method, where
>>>> each offset is the leaf node a point falls under. The predict method
>>>> then indexes the value array (where node prediction values are
>>>> stored) by this offset array, assigning a prediction value to each
>>>> point.
>>>>
>>>> A small source of confusion is that for regression trees, the value
>>>> array is one value per output per node, which makes sense. However,
>>>> for classification trees, the value array stores the number of
>>>> training points for each class for each output for each node. For
>>>> example, a regression tree may have 2.5 as the prediction value in a
>>>> leaf, but a classification tree may have [3, 40, 5] as the value in a
>>>> leaf if there are three classes. The final prediction uses argmax to
>>>> select class 1.
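The two-step path described above (apply returns leaf offsets, predict indexes the value array with them and takes the argmax) can be mimicked in plain Python as a minimal sketch. The tree below is invented for the example and reuses the [3, 40, 5] leaf from the text; the real implementation is Cython and operates on NumPy arrays, and the argmax index is then mapped to a label through the estimator's classes_ attribute.

```python
LEAF = -1  # scikit-learn marks "no child" with -1

# Hypothetical three-class tree as parallel per-node lists.
children_left = [1, LEAF, 3, LEAF, LEAF]
children_right = [2, LEAF, 4, LEAF, LEAF]
feature = [0, -2, 1, -2, -2]
threshold = [1.0, -2.0, 0.0, -2.0, -2.0]
value = [              # training points per class at each node
    [48, 50, 10],
    [40, 5, 3],
    [8, 45, 7],
    [3, 40, 5],
    [5, 5, 2],
]


def apply(samples):
    """Return, for each sample, the index of the leaf it falls into."""
    leaves = []
    for x in samples:
        node = 0
        while children_left[node] != LEAF:
            if x[feature[node]] <= threshold[node]:
                node = children_left[node]
            else:
                node = children_right[node]
        leaves.append(node)
    return leaves


def predict(samples):
    """Index the value array by leaf and take the argmax of the counts."""
    preds = []
    for leaf in apply(samples):
        counts = value[leaf]
        # On ties, the first maximum wins (same convention as numpy.argmax).
        preds.append(max(range(len(counts)), key=counts.__getitem__))
    return preds


samples = [[0.5, 9.0], [2.0, -1.0], [2.0, 1.0]]
leaves = apply(samples)
labels = predict(samples)
print(leaves, labels)  # [1, 3, 4] [0, 1, 0]
```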
>>>>
>>>> Let me know if you have any other questions!
>>>>
>>>> Jacob
>>>>
>>>> On Tue, Aug 11, 2015 at 2:17 PM, Rafael Calsaverini <
>>>> rafael.calsaver...@gmail.com> wrote:
>>>>
>>>>> Hi there all,
>>>>>
>>>>> I'm taking a look at the code for decision trees and trying to
>>>>> understand how it actually decides the class, and I'm having some
>>>>> trouble with the final step.
>>>>>
>>>>> The heart of the algorithm seems to be in lines 3249 to 3260 of the
>>>>> sklearn/tree/_tree.pyx file.
>>>>>
>>>>> Lines 3249 to 3258 are fine; they are just the standard walk
>>>>> through the branches of the decision tree. What I failed to
>>>>> understand is how the tree actually decides which class to assign
>>>>> to the sample being classified after it reaches a leaf node.
>>>>> Aren't the final classes assigned to each leaf stored anywhere?
>>>>>
>>>>> Thanks,
>>>>> Rafael Calsaverini
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>>
>>>>> _______________________________________________
>>>>> Scikit-learn-general mailing list
>>>>> Scikit-learn-general@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>>
>>>>>
>>>>
>