It sounds like you might want an ensemble of classifiers (or regressors,
since VALUE is continuous), with a separate model for each category in A,
if you know a priori that you want to split on A like that. Each model
would then learn a function that maps B to VALUE, and would learn it
independently for each category in A.
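
As a rough illustration of that idea, here is a minimal sketch in plain
Python. The helper names (fit_line, fit_per_category) and the toy rows are
made up for this example, and the least-squares line is only a stand-in for
whatever model you would actually fit per category; a scikit-learn estimator
such as DecisionTreeRegressor could be substituted.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0  # constant B: fall back to the mean
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_per_category(rows, fit=fit_line):
    """rows: (A, B, VALUE) tuples. Returns one fitted model per category of A."""
    groups = defaultdict(lambda: ([], []))
    for a, b, value in rows:
        groups[a][0].append(b)
        groups[a][1].append(value)
    # Each category gets its own model, trained only on its own rows.
    return {a: fit(xs, ys) for a, (xs, ys) in groups.items()}


# Toy data (hypothetical): VALUE rises with B in "a" and falls with B in "e".
rows = [
    ("a", 1, 0.1), ("a", 2, 0.2), ("a", 3, 0.3),
    ("e", 1, 0.9), ("e", 2, 0.7), ("e", 3, 0.5),
]
models = fit_per_category(rows)
pred_a = models["a"](4)
pred_e = models["e"](4)
```

Because every model only sees its own category, the relationship between B
and VALUE is free to differ (even in sign) from one category to the next.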

On Thu, Jul 2, 2015 at 12:18 PM, Rex <dnsr...@gmail.com> wrote:

> Sebastian, my question is similar to yours but somehow different.
>
> For my case, I want to find out the combined conditions leading to some
> *clustering* pattern.
>
> For example, given four columns, [A, B, ID, VALUE], say A is a categorical
> attribute, B is some integer number, and "VALUE" is the target continuous
> value.
> A can be any of the categorical values [a, b, c, d, e] (we wish to split on
> this attribute first),
> B can be any integer value from 1 to 10^6,
> ID can be any integer from 1000 to 999,
> and "VALUE" is a real number from 0 to 1.0 - the bigger the more important
> to us.
>
> Let's say that, out of 600 samples in total, there are 200 IDs with A = "e"
> and 100 <= B <= 1000, and 70% of them have VALUE >= 0.5. We would conclude
> that this rule (A = "e" and 100 <= B <= 1000) leads to a positive signal.
>
> We want to find out such "clusters", and the combined rules leading to
> them.
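
To make the example above concrete, here is a hedged sketch in plain Python
that scores candidate rules of the form (A = category, lo <= B < hi) by their
sample count and by the fraction of samples with VALUE >= 0.5. The function
names, the fixed bin edges, and the toy data are assumptions made for
illustration; note that the bins here are half-open, unlike the inclusive
range quoted above.

```python
from collections import defaultdict


def score_rules(rows, bin_edges, threshold=0.5):
    """rows: (A, B, ID, VALUE). Returns {(category, lo, hi): (n, hit_rate)}."""
    stats = defaultdict(lambda: [0, 0])  # rule -> [n_samples, n_hits]
    for a, b, _id, value in rows:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= b < hi:
                counts = stats[(a, lo, hi)]
                counts[0] += 1
                counts[1] += 1 if value >= threshold else 0
                break
    return {rule: (n, hits / n) for rule, (n, hits) in stats.items()}


def positive_rules(scores, min_samples, min_hit_rate):
    """Keep rules that have enough samples and a high enough hit rate."""
    return sorted(
        rule
        for rule, (n, rate) in scores.items()
        if n >= min_samples and rate >= min_hit_rate
    )


# Toy data (hypothetical): 7 of 10 "e" rows and 2 of 10 "a" rows are hits.
rows = []
for i in range(10):
    rows.append(("e", 100 + 50 * i, 1000 + i, 0.8 if i < 7 else 0.2))
    rows.append(("a", 100 + 50 * i, 2000 + i, 0.8 if i < 2 else 0.2))

scores = score_rules(rows, bin_edges=[1, 100, 1000, 10**6 + 1])
found = positive_rules(scores, min_samples=5, min_hit_rate=0.7)
```

This only scans a fixed grid of B intervals, so the quality of the rules it
finds depends entirely on how the bin edges are chosen.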
>
> The first idea that comes to mind is a decision tree. But as Andreas, Dale,
> and Jacob mentioned, a *supervised* first-level split is not natural to a
> decision tree.
>
> What is the right algorithm to handle such a case?
>
>
>
>
> On Wed, Jul 1, 2015 at 12:16 PM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Yes, I could do sequential backward selection in combination with a
>> linear regression model; however, that would be essentially the same as the
>> decision tree approach with MSE as the objective function to be minimized at
>> each split. Thanks for the input, though; I have to brainstorm about it a
>> little bit more.
>>
>> On Jul 1, 2015, at 3:00 PM, Jacob Schreiber <jmschreibe...@gmail.com>
>> wrote:
>>
>> If you are working with entirely binary data, then features will not be
>> repeated in the tree naturally. I think you are discussing the more general
>> field of 'feature selection', though. There are a plethora of algorithms
>> which do that--try to identify which inputs are important to a correct
>> prediction. You can read more here:
>> http://scikit-learn.org/stable/modules/feature_selection.html
>>
>> On Wed, Jul 1, 2015 at 9:45 AM, Sebastian Raschka <se.rasc...@gmail.com>
>> wrote:
>>
>>> Yes, and thanks for the answers, it was just a random idea.
>>>
>>> But in all seriousness, which algorithm would you use for such a task --
>>> here, the goal is not predictive performance but rather "inference":
>>>
>>> I am collaborating with experimentalists who obtained measurements on a
>>> continuous scale 0.0 - 1.0, and each sample has ~30 binary features. They
>>> basically want to "learn" from this data, for example, which combination of
>>> features was "important" to yield a response >= 0.5 (although this
>>> threshold is not fixed).
>>> For example, using a decision tree, you could come up with something like
>>>
>>> If feature A=1 --> response > 0.5
>>>     If feature B=0 --> response > 0.6
>>>         If feature C=1 --> response > 0.7
>>> etc.
>>>
>>> Basically, association rule mining but with continuous outputs.
>>>
>>> On Jul 1, 2015, at 12:34 PM, Dale Smith <dsm...@nexidia.com> wrote:
>>>
>>> It is a crazy idea. It defeats the purpose of random forest, which is
>>> introducing randomness in specific ways in order to achieve certain goals.
>>> Your idea, while appropriate in your use case, does not fit with the
>>> algorithm you want to use. Why not investigate alternatives that better fit
>>> your use case?
>>>
>>>
>>> *Dale Smith, Ph.D.*
>>> Data Scientist
>>> ​
>>> *d.* 404.495.7220 x 4008   *f.* 404.795.7221
>>> Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 |
>>> Atlanta, GA 30305
>>>
>>> *From:* Sebastian Raschka [mailto:se.rasc...@gmail.com]
>>> *Sent:* Wednesday, July 01, 2015 12:17 PM
>>> *To:* scikit-learn-general@lists.sourceforge.net
>>> *Subject:* Re: [Scikit-learn-general] Is it possible to specify the
>>> order of spliting in decision tree with scikit-learn?
>>>
>>> Maybe a crazy idea, but what I think could be useful is to have
>>> something like a "repeat_features" parameter that can be set to `False` to
>>> not reuse features down the tree.
>>>
>>> E.g., let's say we have 1000 different drug molecules with certain
>>> chemical groups and have some sort of experimental data of whether they
>>> work or not. Using decision tree classification/regression without feature
>>> repetition could help to interpret which of the functional groups may be
>>> important -- here the focus is maybe not so much predictive performance but
>>> rather interpretability, something like "supervised" clustering.
>>>
>>>
>>>
>>> On Jul 1, 2015, at 11:08 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>>>
>>>
>>> Not really, as that kind of defeats the purpose of learning the tree.
>>> You could build a series of stumps that first get only feature a, then
>>> feature b, and then feature c.
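
One way to read the "series of stumps" suggestion is sketched below in plain
Python: each level of the tree is a stump restricted to a single feature, and
the features are visited in a fixed order. Everything here (the function
names, the squared-error criterion, the toy data) is an assumption made for
illustration and is not a scikit-learn API.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(xs, ys):
    """Stump on one feature: the midpoint threshold with the lowest SSE."""
    levels = sorted(set(xs))
    best = None  # (cost, threshold)
    for lo, hi in zip(levels, levels[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    return None if best is None else best[1]


def build(rows, ys, order):
    """Split on order[0] at the root, order[1] one level down, and so on."""
    if not order:
        return {"value": sum(ys) / len(ys)}
    feature, rest = order[0], order[1:]
    t = best_threshold([r[feature] for r in rows], ys)
    if t is None:  # feature is constant in this node, move to the next one
        return build(rows, ys, rest)
    left = [(r, y) for r, y in zip(rows, ys) if r[feature] <= t]
    right = [(r, y) for r, y in zip(rows, ys) if r[feature] > t]
    return {
        "feature": feature,
        "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], rest),
        "right": build([r for r, _ in right], [y for _, y in right], rest),
    }


def predict(node, row):
    """Follow the fixed-order splits down to a leaf and return its mean."""
    while "value" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


# Toy data (hypothetical): "c" is the strongest feature, yet "a" is split first.
rows = [{"a": a, "b": b, "c": c} for a in (0, 1) for b in (0, 1) for c in (0, 1)]
ys = [0.1 * r["a"] + 0.2 * r["b"] + 0.6 * r["c"] for r in rows]
tree = build(rows, ys, order=["a", "b", "c"])
```

Only the thresholds are learned here; the choice of feature at each depth is
imposed, which is exactly the part a regular decision tree would learn.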
>>> On 06/30/2015 11:37 PM, Rex wrote:
>>>
>>> Given three columns, ["A", "B", "C"], can we specify the order of
>>> splitting, so that the tree first splits on the categories of "A", then
>>> on "B", and then on the others?
>>>
>>> Based on the documentation page for DecisionTreeClassifier, there is no
>>> such option. Is there any way to work around this?
>>>
>>>
>>> http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> Don't Limit Your Business. Reach for the Cloud.
>>>
>>> GigeNET's Cloud Solutions provide you with the tools and support that
>>>
>>> you need to offload your IT needs and focus on growing your business.
>>>
>>> Configured For All Businesses. Start Your Cloud Today.
>>>
>>> https://www.gigenetcloud.com/
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>>
>>> Scikit-learn-general mailing list
>>>
>>> Scikit-learn-general@lists.sourceforge.net
>>>
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
