Webinar signup:
Gradient Boosting, Tree Ensembles and Classification Trees: A Winning
Combination
November 9, 10-11 a.m., PST.
Webinar Registration:
http://2.salford-systems.com/gradientboosting/
Understand the major shortcomings of using only decision trees and how tree
ensembles can help overcome them.
Yes, I just realized that it doesn't work out unless you divide by the std.
It seems like whether you use the population or the sample standard deviation
is not important in this case, since it's not easy to get an unbiased sample std.
I came across some other techniques for scaling described in section "Class
On Tue, Nov 6, 2012 at 4:17 PM, Doug Coleman wrote:
> Actually, from the numpy docs, the ddof=1 for np.std doesn't make it
> unbiased. There's a whole wikipedia article on calculating the unbiased
> standard deviation, and it seems to be different for the normal distribution
> than for others and
Actually, from the numpy docs, the ddof=1 for np.std doesn't make it
unbiased. There's a whole Wikipedia article on calculating the unbiased
standard deviation, and it seems to be different for the normal
distribution than for others and involves the gamma function--the advice
from the wiki is not
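(A quick numerical check of that point, as a sketch of my own; it assumes normally distributed data, which is where the exact c4 correction factor from the Wikipedia article applies.)
"
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(0)
n = 5                                               # small samples make the bias visible
samples = rng.normal(0.0, 1.0, size=(100_000, n))   # true sigma = 1

print(samples.var(axis=1, ddof=1).mean())   # ~1.00: ddof=1 gives an unbiased *variance*
print(samples.std(axis=1, ddof=1).mean())   # ~0.94: the *std* is still biased low

# For normal data the unbiased estimate is s / c4(n), with
# c4(n) = sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
c4 = np.sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
print(samples.std(axis=1, ddof=1).mean() / c4)   # ~1.00 after the correction
"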
> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.
Dear Gilles,
I have set up a grid search:
"
tuned_parameters = [{'min_samples_split': [1,2,3,4,5,6,7,8,9]}]
scores = [('precision', precision_sc
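(The paste above is cut off; purely for reference, a minimal sketch of what such a grid search over min_samples_split could look like with current scikit-learn. The stand-in data, module paths and scoring choices are my assumptions rather than the actual code from this message, and current versions require min_samples_split >= 2.)
"
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# stand-in data matching the shapes mentioned later in the thread (622 x 177);
# the features are random noise here, so the scores themselves are meaningless
rng = np.random.default_rng(0)
X = rng.random((622, 177))
y = np.array([0] * 454 + [1] * 168)

tuned_parameters = [{'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9]}]

for score in ['precision', 'recall']:
    clf = GridSearchCV(RandomForestClassifier(random_state=0),
                       tuned_parameters, scoring=score, cv=5)
    clf.fit(X, y)
    print(score, clf.best_params_, clf.best_score_)
"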
Dear Gilles,
> Hi Paul,
>
> a) Scaling has no effect on decision trees.
Thanks!
>
> b) You shouldn't set max_depth=5. Instead, build fully developed trees
> (max_depth=None) or rather tune min_samples_split using
> cross-validation.
Do fully developed trees make sense for rather small datasets?
Hi Paul,
a) Scaling has no effect on decision trees.
b) You shouldn't set max_depth=5. Instead, build fully developed trees
(max_depth=None) or rather tune min_samples_split using
cross-validation.
Hope this helps.
Gilles
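(To put b) in code: one way to compare a capped depth against fully developed trees is plain cross-validation. This is only an illustrative sketch; the dataset and module paths are current scikit-learn choices of mine, not anything from this thread.)
"
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for depth in (5, None):                      # capped vs. fully developed trees
    forest = RandomForestClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(forest, X, y, cv=5)
    print("max_depth=%s: %.3f +/- %.3f" % (depth, scores.mean(), scores.std()))
"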
On 6 November 2012 16:21, wrote:
>
> Dear SciKitters,
>
> given a rathe
Dear SciKitters,
given a rather unbalanced data set (454 samples with classification "0" and
168 samples with classification "1"), I would like to train a RandomForest.
For my data set, I have calculated 177 features per sample.
In a first step, I have preprocessed my data set:
"
dataDescrs_array
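(One note on the class imbalance, as a minimal sketch assuming a current scikit-learn version where RandomForestClassifier accepts class_weight; that option did not exist in 2012.)
"
from sklearn.ensemble import RandomForestClassifier

# 'balanced' reweights classes inversely to their frequencies, so the 168
# minority samples carry as much total weight as the 454 majority samples.
forest = RandomForestClassifier(n_estimators=200, class_weight='balanced',
                                random_state=0)
# forest.fit(X, y)   # X: (622, 177) feature matrix, y: the 0/1 labels
"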
2012/11/6 Olivier Grisel :
> None, False: no stdev
> True, "pop": population stdev
> "sample": sample stdev
>
> +1 but with "population" instead of "pop".
Alright :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
None, False: no stdev
True, "pop": population stdev
"sample": sample stdev
+1 but with "population" instead of "pop".
2012/11/6 Lars Buitinck :
> 2012/11/6 Gael Varoquaux :
>> That said, I am OK adding an additional parameter, if people think that
>> it is important. The one used in numpy, "ddof"
2012/11/6 Gael Varoquaux :
> That said, I am OK adding an additional parameter, if people think that
> it is important. The one used in numpy, "ddof", is somewhat cryptic,
> though.
How about overloading with_std to take...
None, False: no stdev
True, "pop": population stdev
"sample": sample stde
On Tue, Nov 6, 2012 at 6:48 AM, Gael Varoquaux
wrote:
> I am actually -1 on this, because the consequence would be that np.std(X,
> axis=-1) would no longer be one. I am afraid that it would confuse the
> users.
>
> I believe that the n/(n - 1) difference is completely irrelevant for
> machine learning.
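(A small check to make that concrete; I use axis=0 for a samples-by-features matrix, which is an assumption on my part.)
"
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
X_centered = X - X.mean(axis=0)

X_pop = X_centered / X.std(axis=0, ddof=0)    # scale by the population stdev
X_samp = X_centered / X.std(axis=0, ddof=1)   # scale by the sample stdev

print(X_pop.std(axis=0))    # exactly 1.0 in every column
print(X_samp.std(axis=0))   # sqrt((n - 1) / n) ~ 0.949 for n = 10
"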
2012/11/6 Mathieu Blondel :
> On Tue, Nov 6, 2012 at 9:33 AM, Abhi wrote:
>>
>> Hello,
>> I have been reading and testing examples around the sklearn documentation
>> and am not too clear on a few things and would appreciate any help
>> regarding the following questions:
>> 1) What would
On Mon, Nov 05, 2012 at 11:37:13PM +0100, Lars Buitinck wrote:
> This test seems to call np.dot on two scipy.sparse matrices (both of
> dtype=float64, so the error message is quite confusing). IIRC, np.dot
> support for sparse matrices broke in recent Numpy versions, so we
> really shouldn't be doing that.
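(For reference, a minimal illustration of the safe way to take that product; the matrices here are made up for the example, and the @ operator requires a reasonably recent Python/SciPy.)
"
import scipy.sparse as sp

A = sp.random(4, 3, density=0.5, format="csr", random_state=0)
B = sp.random(3, 2, density=0.5, format="csr", random_state=0)

C = A.dot(B)        # sparse-aware matrix product, always safe
# C = A @ B         # equivalent on recent versions
# np.dot(A, B)      # avoid: may not dispatch to the sparse implementation
"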
On Tue, Nov 06, 2012 at 04:18:25PM +0900, Mathieu Blondel wrote:
> 1) What would be the advantage of training LogisticRegression vs
> OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand
> the latter would basically train n_classes classifiers).
> They actually do t
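(The reply is cut off here. For context: the liblinear-based LogisticRegression of that era handled multiclass one-vs-rest internally, so the two approaches amounted to the same thing. A small check of my own, using today's module paths and defaults, where plain LogisticRegression fits a multinomial model by default, so predictions can differ slightly.)
"
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

direct = LogisticRegression(max_iter=1000).fit(X, y)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))                           # 3: one binary classifier per class
print((direct.predict(X) == ovr.predict(X)).mean())   # agreement rate, typically ~1.0
"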
we should probably improve the docs on the ovr. iirc the user guide was already
very explicit, maybe add something to the docstring?
abhi: did you read the user guide on the one vs rest classifier? how could we
improve it to make things more clear?
Mathieu Blondel wrote:
> On Tue, Nov 6, 20