> I suppose I will have to test all my variables at every node to find the
> optimum split as measured by a criterion. What is still not clear to me is
> whether there is an elegant way of choosing the right splitter depending on
> the variable, via tagging or some other solution.

Yes, we don't have any clean solution for that at the moment. I am
afraid you will have to hack it yourself for now.
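
To give an idea of what such a hack could look like, here is a minimal
pure-Python sketch. It is not scikit-learn's Splitter API (that lives in
Cython); the names `best_split`, `PARTITIONERS` and the "linear"/"circular"
tags are hypothetical, and degrees with a period of 360 are an assumption.
Each feature type proposes its own binary partitions, and every partition
is scored with the same MSE-style criterion on the output.

```python
def mse_impurity(ys):
    """Sum of squared deviations from the mean (unnormalised MSE)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def linear_partitions(values):
    """Yield (description, mask) for threshold splits x <= t."""
    uniq = sorted(set(values))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2.0
        yield ("<= %g" % t, [v <= t for v in values])


def circular_partitions(values, period=360.0):
    """Yield (description, mask) for arc splits start <= x < end."""
    uniq = sorted(set(v % period for v in values))
    cuts = []
    for i, a in enumerate(uniq):
        # Midpoint between neighbouring values, wrapping around the circle.
        b = uniq[(i + 1) % len(uniq)]
        gap = (b - a) % period
        cuts.append((a + gap / 2.0) % period)
    cuts.sort()
    for i, start in enumerate(cuts):
        for end in cuts[i + 1:]:
            mask = [start <= (v % period) < end for v in values]
            yield ("in [%g, %g)" % (start, end), mask)


# Hypothetical tag -> partition generator mapping.
PARTITIONERS = {"linear": linear_partitions, "circular": circular_partitions}


def best_split(X, y, tags):
    """Return (score, feature_index, description) of the best split, or None."""
    best = None
    for j, tag in enumerate(tags):
        column = [row[j] for row in X]
        for desc, mask in PARTITIONERS[tag](column):
            left = [yi for yi, m in zip(y, mask) if m]
            right = [yi for yi, m in zip(y, mask) if not m]
            if not left or not right:
                continue
            # Same output-based criterion, whatever produced the partition.
            score = mse_impurity(left) + mse_impurity(right)
            if best is None or score < best[0]:
                best = (score, j, desc)
    return best


# Toy data: feature 0 is linear, feature 1 is a direction in degrees.
X = [[1.0, 350.0], [2.0, 10.0], [3.0, 90.0],
     [4.0, 180.0], [5.0, 270.0], [6.0, 5.0]]
y = [10.0, 10.0, 0.0, 0.0, 0.0, 10.0]
result = best_split(X, y, ["linear", "circular"])
print(result)
```

On this toy data the directions close to north (350, 5, 10) share one
output value, so only an arc split on the circular feature separates them
cleanly; no single threshold on either column can.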

But this is really something that I would like to solve in the near
future. Currently we have the same issue with categorical variables.
We handle them as if they were numerical, which is obviously not a
sane thing to do, since it imposes an arbitrary order on the values of
the variable. You can still one-hot-encode the feature into binary
features, as Lars suggested, but then again, this is not always the
best thing to do either. (All the more so given that decision trees
were originally designed to handle categorical variables properly...)
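
As a small illustration of the ordering problem, here is a pure-Python
sketch. The helper names `one_hot` and `isolable_by_threshold` and the
colour data are made up for the example. With integer codes, a single
threshold can never isolate the middle category; with one-hot columns it
can, at the price of one binary feature per category.

```python
def one_hot(column):
    """Map a list of category labels to binary indicator rows."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in column:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def isolable_by_threshold(values, target):
    """True if some split x <= t puts exactly the target samples on one side."""
    is_target = [v == target for v in values]
    not_target = [not b for b in is_target]
    for t in sorted(set(values)):
        left = [v <= t for v in values]
        if left == is_target or left == not_target:
            return True
    return False


colors = ["red", "green", "blue", "green"]

# Integer coding: blue=0, green=1, red=2 (an arbitrary, alphabetical order).
codes = {c: i for i, c in enumerate(sorted(set(colors)))}
as_numbers = [codes[c] for c in colors]

# One-hot coding: one binary column per category.
categories, encoded = one_hot(colors)
green_column = [row[categories.index("green")] for row in encoded]

print(isolable_by_threshold(as_numbers, codes["green"]))  # integer codes
print(isolable_by_threshold(green_column, 1))             # one-hot column
```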


>
> Cheers,
> Pablo
>
>
> On 29 January 2014 20:30, Gilles Louppe <g.lou...@gmail.com> wrote:
>>
>> Hi Pablo,
>>
>> I am not sure implementing a new criterion is what you are looking
>> for. Criteria are made to evaluate the goodness of a split (i.e., a
>> binary partition of the samples in the current node) in terms of
>> impurity with regard to the output variable - not the inputs.
>>
>> What you should do instead, I think, is to write a new Splitter
>> implementation for partitioning your samples on the basis of circular
>> features. Then, once your samples are partitioned into two subsets,
>> you can evaluate the goodness of the split using one of our (existing)
>> criteria (no matter how you have found this partition).
>>
>> Hope this can help in your design.
>>
>> Gilles
>>
>> On 29 January 2014 10:21, Lars Buitinck <larsm...@gmail.com> wrote:
>> > 2014-01-29 Pablo Rozas Larraondo <p.rozas.larrao...@gmail.com>:
>> >> Suppose I want to create a regression tree accepting both continuous
>> >> linear data and circular data. If I implement a new RegressionCriterion
>> >> specific to circular data, how difficult would it be to grow a tree
>> >> combining two different Criterions (i.e., MSE and the new
>> >> CircularCriterion)?
>> >>
>> >> I suppose the main complexity will come from having to tag my variables
>> >> as [normal, circular] to be treated by the appropriate Criterion, but how
>> >> difficult might this be?
>> >
>> > We don't have any convention for tagging different types of data at
>> > the moment, except in preprocessing transformers: all data going to
>> > actual estimators is considered continuous linear for simplicity. The
>> > simple solution would be to replace the circular features with their
>> > sine and cosine after projecting them onto the unit circle.
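
For reference, the sine/cosine replacement suggested above can be sketched
in a few lines of plain Python. The function name `encode_circular` and the
choice of degrees with a period of 360 are assumptions for the example.
The point is that 350 and 10 degrees end up close together after the
projection, whereas they sit at opposite ends of the raw numeric scale.

```python
import math


def encode_circular(values, period=360.0):
    """Return (sin, cos) pairs for values measured on a circle."""
    pairs = []
    for v in values:
        angle = 2.0 * math.pi * (v % period) / period
        pairs.append((math.sin(angle), math.cos(angle)))
    return pairs


def dist(p, q):
    """Euclidean distance between two (sin, cos) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


directions = [350.0, 10.0, 180.0]
encoded = encode_circular(directions)

near = dist(encoded[0], encoded[1])  # 350 vs 10 degrees
far = dist(encoded[0], encoded[2])   # 350 vs 180 degrees
print(near < far)
```

The two resulting columns can then be fed to the tree in place of the raw
angle, at the cost of splitting on sine and cosine separately.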
>> >
>>
>
>

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
