Re: [Scikit-learn-general] random forest question

2012-10-28 Thread Andreas Mueller
On 10/28/2012 01:52 AM, Joseph Turian wrote:
> Actually, it turns out I was incorrect.
>
> According to the docs:
> http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees
>
> "each tree in the ensemble is built from a sample drawn with
> replacement (i.e., a bootstrap samp

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Joseph Turian
Actually, it turns out I was incorrect.

According to the docs:
http://scikit-learn.org/stable/modules/ensemble.html#forests-of-randomized-trees

"each tree in the ensemble is built from a sample drawn with replacement (i.e., a bootstrap sample) from the training set. In addition, when splitting a
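
For readers unfamiliar with the term, the "sample drawn with replacement" in the quoted docs can be sketched as follows. This is a minimal illustration using only Python's standard library, not scikit-learn's code; the helper name `bootstrap_indices` and the sample size are assumptions made for the example.

```python
import random

def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (a bootstrap sample).

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    return [rng.randrange(n_samples) for _ in range(n_samples)]

rng = random.Random(0)
n_samples = 1000
sample = bootstrap_indices(n_samples, rng)

# Because rows are drawn with replacement, some rows repeat and others
# are never drawn. In expectation about 63% of the distinct rows appear.
unique_fraction = len(set(sample)) / n_samples
print(f"fraction of distinct rows in this bootstrap sample: {unique_fraction:.3f}")
```

Each tree would get its own such sample, so different trees see different (overlapping) subsets of the training rows.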

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Richard T. Guy
That explains the confusion! Thanks, guys.

Tommy

On Sat, Oct 27, 2012 at 5:25 AM, Joseph Turian wrote:
> Gilles,
>
> I met Tommy Guy at the PyData conference today.
> If I remember correctly, Brian Eoff (I don't have his email address)
> mistakenly said that random forests partition/sample the

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Joseph Turian
Gilles,

I met Tommy Guy at the PyData conference today. If I remember correctly, Brian Eoff (I don't have his email address) mistakenly said that random forests partition/sample the features before creating each tree. I didn't want to correct him in front of the audience, and it slipped my mind to

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Gilles Louppe
Hi,

> I know the speaker at PyData today claimed that the features are
> partitioned,

Can you elaborate? If you pick your features prior to the construction of the tree and then build it on that subset only, then indeed, this is not a random forest. That algorithm is called Random Subspaces.

Best,
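
The distinction drawn in this message can be sketched in a few lines. This is only an illustration of when the feature subset is chosen, not an implementation of either algorithm and not scikit-learn's API; the function names, `k`, and `n_splits` are hypothetical.

```python
import random

def random_subspace_features(n_features, k, rng):
    """Random Subspaces: pick k features ONCE, before the tree is built."""
    return sorted(rng.sample(range(n_features), k))

def per_split_features(n_features, k, n_splits, rng):
    """Random forest: draw a fresh set of k candidate features at EVERY split."""
    return [sorted(rng.sample(range(n_features), k)) for _ in range(n_splits)]

rng = random.Random(0)
n_features, k, n_splits = 10, 3, 50

# A Random Subspaces tree can only ever use the k features chosen up front.
subspace = random_subspace_features(n_features, k, rng)
seen_by_subspace_tree = set(subspace)

# A random forest tree restricts each individual split to k candidates,
# but across many splits it can end up considering far more than k features.
candidates = per_split_features(n_features, k, n_splits, rng)
seen_by_forest_tree = set().union(*candidates)

print("Random Subspaces tree sees:", len(seen_by_subspace_tree), "features")
print("Random forest tree sees:   ", len(seen_by_forest_tree), "features")
```

In both cases the randomness limits what a single split can look at; the difference is whether the restriction is fixed for the whole tree or redrawn at each node.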

Re: [Scikit-learn-general] random forest question

2012-10-27 Thread Joseph Turian
> So the short answer is no. All features will be considered when
> building a decision tree, as they should.

Tommy, I know the speaker at PyData today claimed that the features are partitioned, but I don't believe this to be the case in how random forests were originally specified.

Best, Josep

Re: [Scikit-learn-general] random forest question

2012-10-26 Thread Fred Mailhot
On 26 October 2012 16:58, Richard T. Guy wrote:
> Hey Scikit-Learn,
>
> I've been working on some changes to the RandomForest code and I had a
> few questions.
>
> First, it looks like the function
> def _partition_features(forest, n_total_features):
> partitions features evenly across cores. Am

Re: [Scikit-learn-general] random forest question

2012-10-26 Thread Gilles Louppe
Hello Richard,

> First, it looks like the function
> def _partition_features(forest, n_total_features):
> partitions features evenly across cores. Am I reading that correctly?

Yes, this is an internal function used to assign blocks of features to cores.

> If so, does this mean that on 2 cores wi
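
As a rough picture of what "assigning blocks of features to cores" could look like, here is a minimal sketch. It is not scikit-learn's `_partition_features`: the function name `partition_features_evenly`, its signature, and the contiguous-block layout are assumptions made for illustration.

```python
def partition_features_evenly(n_total_features, n_jobs):
    """Split feature indices 0..n_total_features-1 into contiguous blocks.

    Hypothetical helper, not scikit-learn's internal function.
    Block sizes differ by at most one; no more blocks than features.
    """
    if n_total_features <= 0:
        return []
    n_jobs = max(1, min(n_jobs, n_total_features))
    base, extra = divmod(n_total_features, n_jobs)
    blocks, start = [], 0
    for job in range(n_jobs):
        # The first `extra` jobs each take one additional feature.
        size = base + (1 if job < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

blocks = partition_features_evenly(10, 3)
for job, block in enumerate(blocks):
    print(f"core {job}: features {block}")
```

A split like this only divides bookkeeping work among cores. Elsewhere in the thread the quoted reply states that all features are still considered when building a decision tree.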

[Scikit-learn-general] random forest question

2012-10-26 Thread Richard T. Guy
Hey Scikit-Learn,

I've been working on some changes to the RandomForest code and I had a few questions.

First, it looks like the function
def _partition_features(forest, n_total_features):
partitions features evenly across cores. Am I reading that correctly? If so, does this mean that on 2 cores