Perhaps you have become aware of this by now,
but only K-1 subset tests are needed to find the best
categorical split, not 2^(K-1)-1. This was a central
result proved in Breiman's book.
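For anyone following along, here is a minimal sketch of that trick (assuming a binary 0/1 target and Gini impurity; the function is illustrative, not scikit-learn's actual code): sort the K category values by their mean target value, and only the K-1 "prefix" splits of that ordering need to be scored.

    import numpy as np

    def best_categorical_split(x, y):
        """Sketch of Breiman's K-1 trick for a categorical feature x and a
        binary 0/1 target y (illustrative, not scikit-learn's code)."""
        cats = np.unique(x)
        # Sort categories by mean target value; the optimal subset split is a
        # prefix of this ordering, so K-1 candidates suffice, not 2^(K-1)-1.
        order = cats[np.argsort([y[x == c].mean() for c in cats])]

        def gini(labels):
            p = labels.mean()
            return 2 * p * (1 - p)

        best_score, best_subset = np.inf, None
        for k in range(1, len(order)):                    # K-1 candidate splits
            left = np.isin(x, order[:k])
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best_score, best_subset = score, set(order[:k])
        return best_subset, best_score

The same ordering trick works for regression with squared error; as the next message points out, it does not carry over directly to multiclass criteria.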
+1
Just wanted to point out that the K-1 subset proof is only true for binary
classification. Such heuristics do perform reasonably well for multiclass classification criteria, though.
On Monday, November 17, 2014, Alexander Hawk tomahawkb...@gmail.com wrote:
Perhaps you have become aware of
On Tue, Jun 4, 2013 at 8:16 PM, Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
I believe more in my results than in my expertise - and so should you :-)
+1! There are very, very few examples of theory trumping data in history...
And a bajillion of the converse.
I also think Joel put
I guess I didn't express myself clearly: I didn't mean to say that I
mistrust my results per se. I'm not that
Hi Christian,
I believe more in my results than in my expertise - and so should you :-)
I think you misunderstood me: I did not claim that one-hot encoded
categorical features give better results than ordinal encoded ones - I just
claimed that ordinal encoding works as well as one-hot encoding
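One quick way to check that claim empirically - a sketch on synthetic data, with arbitrary cardinalities and an invented target, so only the comparison pattern is meant seriously:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    rng = np.random.default_rng(0)
    X_cat = rng.integers(0, 8, size=(2000, 5)).astype(str)  # 5 categorical columns
    # Invented target depending on the first two columns
    y = (X_cat[:, 0].astype(int) % 2) ^ (X_cat[:, 1].astype(int) > 3)

    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    for name, enc in [("ordinal", OrdinalEncoder()), ("one-hot", OneHotEncoder())]:
        X = enc.fit_transform(X_cat)
        print(name, cross_val_score(forest, X, y, cv=5).mean())

(cross_val_score clones the forest internally, so reusing one estimator object across encodings is fine.)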
On 06/03/2013 05:19 AM, Joel Nothman wrote:
However, in these last two cases, the number of possible splits at a
single node is linear in the number of categories. Selecting an
arbitrary partition allows exponentially many splits with respect to
the number of categories (though there may
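For concreteness, the two counts Joel is comparing (a quick arithmetic sketch):

    # Candidate splits at one node for a categorical feature with K values:
    # thresholds along a single fixed ordering vs. arbitrary two-way partitions.
    for K in (4, 10, 20):
        ordered = K - 1              # splits along one ordering
        subsets = 2 ** (K - 1) - 1   # nonempty subsets, up to complement
        print(K, ordered, subsets)   # e.g. K=20: 19 vs 524287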
On 06/03/2013 04:41 AM, Christian Jauvin wrote:
Sklearn does not implement any special treatment for categorical variables.
You can feed any float. The question is whether it would work / what it does.
I think I'm confused about a couple of aspects (that's what happens I
guess when you play with
On 3 June 2013 08:43, Andreas Mueller amuel...@ais.uni-bonn.de wrote:
On 06/03/2013 09:15 AM, Peter Prettenhofer wrote:
Our decision tree implementation only supports numerical splits; i.e.
it tests val <= threshold.
Categorical features need to be encoded properly. I recommend one-hot
encoding for features with small cardinality (e.g. < 50) and ordinal
encoding for the rest.
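A concrete sketch of that recipe with today's scikit-learn API (ColumnTransformer, OneHotEncoder and OrdinalEncoder in this form postdate this 2013 thread, and the column names below are invented):

    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

    low_card = ["color", "weekday"]      # few distinct values -> one-hot
    high_card = ["zip_code", "user_id"]  # many distinct values -> ordinal

    encode = ColumnTransformer([
        ("onehot", OneHotEncoder(handle_unknown="ignore"), low_card),
        ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value",
                                   unknown_value=-1), high_card),
    ])
    model = Pipeline([("encode", encode), ("forest", RandomForestClassifier())])
    # model.fit(X_train, y_train)  # X_train: a DataFrame with those columns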
Many thanks to all for your help and detailed answers, I really appreciate it.
So I wanted to test the discussion's takeaway, namely, what Peter
suggested: one-hot encode the categorical features with small
cardinality, and leave the others in their ordinal form.
So from the same dataset I
I got very good results on a text century-dating task using random forests on
very few (20-ish) bag-of-words tf-idf features selected by chi2. It
depends on the problem.
Cheers,
Vlad
On Sat, Jun 1, 2013 at 9:01 PM, Andreas Mueller
amuel...@ais.uni-bonn.de wrote:
On 06/01/2013 08:30 PM, Christian
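A sketch of the kind of pipeline Vlad describes (his exact settings aren't in the thread; texts and labels below are placeholder variables):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("chi2", SelectKBest(chi2, k=20)),  # keep the ~20 most predictive terms
        ("forest", RandomForestClassifier(n_estimators=100)),
    ])
    # pipeline.fit(texts, centuries)  # texts: list of strings, centuries: labels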
Hi Andreas,
Btw, you do encode the categorical variables using one-hot, right?
The sklearn trees don't really support categorical variables.
I'm rather perplexed by this... I assumed that sklearn's RF only
required its input to be numerical, so I only used a LabelEncoder up
to now.
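A tiny demo of why a bare LabelEncoder is risky for nominal features with numerical-split trees (made-up data; sparse_output is the parameter name in recent scikit-learn versions):

    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    colors = ["red", "green", "blue", "green"]

    codes = LabelEncoder().fit_transform(colors)
    print(codes)   # [2 1 0 1]: alphabetical, so blue < green < red
    # A tree split like "code <= 1" then lumps {blue, green} vs {red} -
    # an ordering the data never asked for. One-hot avoids it:
    onehot = OneHotEncoder(sparse_output=False).fit_transform([[c] for c in colors])
    print(onehot)  # one indicator column per color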
Hi Christian,
Some time ago I had a similar problem: I wanted to use additional
features alongside my lexical features, and simple concatenation didn't
work that well for me, even though both feature sets performed pretty
well on their own.
You can follow the discussion about my problem here [1]
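For the concatenation itself, one common pattern is to hstack the sparse lexical matrix with the remaining features (a sketch with made-up data, not the setup discussed in [1]):

    import numpy as np
    from scipy.sparse import csr_matrix, hstack
    from sklearn.feature_extraction.text import TfidfVectorizer

    texts = ["the cat sat", "dogs bark loudly"]   # placeholder documents
    extra = np.array([[3.0, 1.0], [7.0, 0.0]])    # placeholder extra features

    tfidf = TfidfVectorizer().fit_transform(texts)
    X = hstack([tfidf, csr_matrix(extra)]).tocsr()
    # Note: tf-idf values live in [0, 1] while the extra features may not;
    # rescaling the dense block can matter for scale-sensitive estimators.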
On 06/01/2013 08:30 PM, Christian Jauvin wrote:
Hi,
I asked a (perhaps too vague?) question about the use of Random
Forests with a mix of categorical and lexical features on two ML
forums (stats.SE and MetaOp), but since it has received no attention,
I figured that it might work better on