[Scikit-learn-general] Webinar signup: Gradient Boosting and Classification Trees: A Winning Combination. November 9, 10-11 a.m., PST

2012-11-06 Thread Lisa Solomon
Webinar signup: Gradient Boosting, Tree Ensembles and Classification Trees: A Winning Combination November 9, 10-11 a.m., PST. Webinar Registration: http://2.salford-systems.com/gradientboosting/ Understand major shortcomings of using only decision trees and how tree ensembles can help o

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Doug Coleman
Yes, I just realized that it doesn't work out unless you divide by the std. It seems like using the population or sample standard deviation is not important in this case since it's not easy to get the unbiased sample std. I came across some other techniques for scaling described in section "Class

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Robert Kern
On Tue, Nov 6, 2012 at 4:17 PM, Doug Coleman wrote: > Actually, from the numpy docs, the ddof=1 for np.std doesn't make it > unbiased. There's a whole wikipedia article on calculating the unbiased > standard deviation, and it seems to be different for the normal distribution > than for others and

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Doug Coleman
Actually, from the numpy docs, the ddof=1 for np.std doesn't make it unbiased. There's a whole wikipedia article on calculating the unbiased standard deviation, and it seems to be different for the normal distribution than for others and involves the gamma function--the advice from the wiki is not
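The ddof distinction discussed in this thread can be sketched in a few lines of NumPy (the array here is an arbitrary example, not data from the thread):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Population standard deviation (ddof=0, numpy's default): divide by n.
pop_std = x.std(ddof=0)

# "Sample" standard deviation (ddof=1): divide by n - 1. This makes the
# *variance* estimator unbiased, but -- as Doug notes above -- the square
# root of an unbiased variance is still a biased estimate of the standard
# deviation, which is why ddof=1 alone does not give an unbiased std.
sample_std = x.std(ddof=1)

print(pop_std)     # 2.0
print(sample_std)  # ~2.138
```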

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
> b) You shouldn't set max_depth=5. Instead, build fully developed trees > (max_depth=None) or rather tune min_samples_split using > cross-validation. Dear Gilles, I have set up a grid search: " tuned_parameters = [{'min_samples_split': [1,2,3,4,5,6,7,8,9]}] scores = [('precision', precision_sc
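Paul's grid search can be sketched with the current scikit-learn API (the synthetic X, y merely stand in for the 622-sample, 177-feature dataset from the original post; note that current scikit-learn requires min_samples_split >= 2, so the value 1 from the quoted grid is dropped):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 622-sample, 177-feature dataset in the thread.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[454 / 622], random_state=0)

# Fully developed trees (max_depth=None), tuning min_samples_split by CV
# as Gilles suggests; precision is the score Paul mentions.
tuned_parameters = {"min_samples_split": [2, 3, 4, 5, 6, 7, 8, 9]}
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=20, max_depth=None, random_state=0),
    tuned_parameters, scoring="precision", cv=3)
grid.fit(X, y)
print(grid.best_params_)
```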

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
Dear Gilles, > Hi Paul, > > a) Scaling has no effect on decision trees. Thanks! > > b) You shouldn't set max_depth=5. Instead, build fully developed trees > (max_depth=None) or rather tune min_samples_split using > cross-validation. Do fully developed trees make sense for rather small datasets?

Re: [Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Gilles Louppe
Hi Paul, a) Scaling has no effect on decision trees. b) You shouldn't set max_depth=5. Instead, build fully developed trees (max_depth=None) or rather tune min_samples_split using cross-validation. Hope this helps. Gilles On 6 November 2012 16:21, wrote: > > Dear SciKitters, > > given a rathe

[Scikit-learn-general] RF optimisation - class weights etc.

2012-11-06 Thread Paul . Czodrowski
Dear SciKitters, given a rather unbalanced data set (454 samples with classification "0" and 168 samples with classification "1"), I would like to train a RandomForest. For my data set, I have calculated 177 features per sample. In a first step, I have preprocessed my data set: " dataDescrs_array
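One common answer to the class-weight question, sketched with the current scikit-learn API (class_weight support for forests postdates this thread; the synthetic data only mimics the 454-vs-168 split):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in mimicking the 454 "0" vs 168 "1" imbalance.
X, y = make_classification(n_samples=622, n_features=177,
                           weights=[454 / 622], random_state=0)

# class_weight="balanced" reweights samples inversely to class frequency,
# so errors on the rare "1" class cost more during tree growing.
clf = RandomForestClassifier(n_estimators=50, class_weight="balanced",
                             random_state=0)
clf.fit(X, y)
print(clf.predict(X).shape)
```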

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Lars Buitinck
2012/11/6 Olivier Grisel : > None, False: no stdev > True, "pop": population stdev > "sample": sample stdev > > +1 but with "population" instead of "pop". Alright :) -- Lars Buitinck Scientific programmer, ILPS University of Amsterdam

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Olivier Grisel
None, False: no stdev True, "pop": population stdev "sample": sample stdev +1 but with "population" instead of "pop". 2012/11/6 Lars Buitinck : > 2012/11/6 Gael Varoquaux : >> That said, I am OK adding an additional parameter, if people think that >> it is important. The one used in numpy, "ddof"

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Lars Buitinck
2012/11/6 Gael Varoquaux : > That said, I am OK adding an additional parameter, if people think that > it is important. The one used in numpy, "ddof", is somewhat cryptic, > though. How about overloading with_std to take... None, False: no stdev True, "pop": population stdev "sample": sample stde
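For reference, the overloaded with_std values floated here did not ship in this form; what the scaler (StandardScaler in current scikit-learn) exposes is a boolean with_std, and it divides by the population standard deviation, matching np.std's default ddof=0:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])

# with_std=True (the default) divides by the population std (ddof=0),
# so scale_ matches np.std with its default ddof=0.
scaler = StandardScaler(with_std=True).fit(X)
print(scaler.scale_, X.std(axis=0, ddof=0))

# with_std=False centers only, leaving the variance untouched.
centered = StandardScaler(with_std=False).fit_transform(X)
print(centered.ravel())
```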

Re: [Scikit-learn-general] preprocessing.scaler uses population standard deviation

2012-11-06 Thread Robert Kern
On Tue, Nov 6, 2012 at 6:48 AM, Gael Varoquaux wrote: > I am actually -1 on this, because the consequence would be that np.std(X, > axis=-1) would no longer be one. I am afraid that it would confuse the > users. > > I believe that the n/(n - 1) difference is completely irrelevant for > machine lea

Re: [Scikit-learn-general] OvR, Logistic Regression and SGD

2012-11-06 Thread Olivier Grisel
2012/11/6 Mathieu Blondel : > On Tue, Nov 6, 2012 at 9:33 AM, Abhi wrote: >> >> Hello, >>I have been reading and testing examples around the sklearn >> documentation and >> am not too clear on few things and would appreciate any help regarding >> the >> following questions: >> 1) What would

Re: [Scikit-learn-general] Current HEAD test failure

2012-11-06 Thread Gael Varoquaux
On Mon, Nov 05, 2012 at 11:37:13PM +0100, Lars Buitinck wrote: > This test seems to call np.dot on two scipy.sparse matrices (both of > dtype=float64, so the error message is quite confusing). IIRC, np.dot > support for sparse matrices broke in recent Numpy versions, so we > really shouldn't be doi
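A sketch of the fix being discussed: rather than np.dot, multiply sparse matrices with their own operators (scikit-learn wraps this pattern in its safe_sparse_dot helper):

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.eye(3))
B = sp.csr_matrix(np.arange(9.0).reshape(3, 3))

# np.dot does not reliably dispatch to scipy.sparse; use the sparse
# matrices' own multiplication instead.
C = A @ B  # equivalently A.dot(B)
print(C.toarray())
```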

Re: [Scikit-learn-general] OvR, Logistic Regression and SGD

2012-11-06 Thread Gael Varoquaux
On Tue, Nov 06, 2012 at 04:18:25PM +0900, Mathieu Blondel wrote: > 1) What would be the advantage of training LogisticRegression vs > OneVsRestClassifier(LogisticRegression()) for multiclass. (I understand > the latter would basically train n_classes classifiers). > They actually do t
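A sketch of the comparison in question 1 (at the time of this thread, liblinear-backed LogisticRegression was itself one-vs-rest, hence "they actually do the same thing"; in recent scikit-learn the default solver fits a multinomial model, so the two can differ):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# Explicit one-vs-rest: trains n_classes independent binary classifiers.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# LogisticRegression handling multiclass itself.
direct = LogisticRegression(max_iter=1000).fit(X, y)

print(len(ovr.estimators_))  # one binary estimator per class
print(direct.coef_.shape)    # a single joint (n_classes, n_features) model
```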

Re: [Scikit-learn-general] OvR, Logistic Regression and SGD

2012-11-06 Thread amueller
we should probably improve the docs on the ovr. iirc the user guide was already very explicit, maybe add something to the docstring? abhi: did you read the user guide on the one vs rest classifier? how could we improve it to make things more clear? Mathieu Blondel schrieb: >On Tue, Nov 6, 20