Re: [Scikit-learn-general] ValueError: numpy.dtype has the wrong size, try recompiling

2016-02-25 Thread josef.pktd
On Thu, Feb 25, 2016 at 12:34 PM, Laura Fava wrote: > I installed all the packages using pip install. I already had numpy and > scipy installed, but when installing scikit-learn didn't work, I > uninstalled scikit-learn, numpy and scipy, then reinstalled scipy, which >

Re: [Scikit-learn-general] Critical Difference Diagram

2015-11-01 Thread josef.pktd
Just specific to Nemenyi and Dunn's tests; I didn't check the other parts of this discussion. They were discussed here: https://github.com/statsmodels/statsmodels/issues/852 (starting after a few comments), with code available in gists but not yet in a PR for statsmodels. Josef On Sat, Oct 31,

Re: [Scikit-learn-general] MICE Imputation for SciKit Learn

2015-10-23 Thread josef.pktd
On Fri, Oct 23, 2015 at 9:44 AM, Andy wrote: > Hi Ouwen. > I think this looks interesting, and it would be good to have more > non-trivial imputation methods. > > Is anyone familiar with the method? I don't have time to go into the > details of the paper at the moment. >

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread josef.pktd
On Mon, Oct 5, 2015 at 6:15 PM, Sturla Molden wrote: > On 04/10/15 05:07, George Bezerra wrote: > > > I am trying to follow this paper: > > > http://research.microsoft.com/en-us/um/people/mattri/papers/www2007/predictingclicks.pdf > > (check out section 6.2). They use

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-05 Thread josef.pktd
On Mon, Oct 5, 2015 at 10:05 PM, Sturla Molden wrote: > On 06/10/15 00:35, josef.p...@gmail.com wrote: > > > rate in the sense of proportion is between zero and 1. > > Rate usually refers to "events per unit of time or exposure", so we can > either count events in

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread josef.pktd
On Sat, Oct 3, 2015 at 11:54 PM, George Bezerra wrote: > Thanks a lot Josef. I guess it is possible to do what I wanted, though > maybe not in scikit. Does the statsmodels version allow l1 or l2 > regularization? I'm planning to use a lot of features and let the model >

Re: [Scikit-learn-general] Using logistic regression on a continuous target variable

2015-10-03 Thread josef.pktd
Just to come in here as an econometrician and statsmodels maintainer: statsmodels intentionally doesn't enforce binary data for Logit or similar models; any data between 0 and 1 is fine. Logistic Regression/Logit or similar Binomial/Bernoulli models can consistently estimate the expected value
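To illustrate the point about non-binary targets, here is a minimal pure-Python sketch that fits a logit mean to targets lying anywhere in [0, 1] using Newton-Raphson on the Bernoulli score equations. It is not the statsmodels implementation; the function name `fit_fractional_logit`, the single-regressor setup, and the fixed iteration count are assumptions made for the example.

```python
import math


def fit_fractional_logit(x, y, n_iter=25):
    """Fit mean = 1 / (1 + exp(-(b0 + b1 * x))) to targets y in [0, 1].

    Hypothetical helper: plain Newton-Raphson on the Bernoulli
    quasi-log-likelihood, with one regressor plus an intercept.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0              # gradient (score) components
        h00 = h01 = h11 = 0.0      # entries of X'WX (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - p
            weight = p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^{-1} * score, via the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Fractional targets generated from a known logit mean (no binary values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 / (1.0 + math.exp(-(0.5 + 1.5 * xi))) for xi in xs]
intercept, slope = fit_fractional_logit(xs, ys)
```

Nothing in the score equations requires y to be 0 or 1; only the mean function has to be specified correctly.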

Re: [Scikit-learn-general] scikit-learn Truck Factor

2015-08-12 Thread josef.pktd
On Wed, Aug 12, 2015 at 9:45 AM, Joel Nothman joel.noth...@gmail.com wrote: I find that list somewhat obscure, and reading your section on Code Authorship gives me some sense of why. All of those people have been very important contributors to the project, and I'd think the absence of Gaël,

Re: [Scikit-learn-general] scikit-learn Truck Factor

2015-08-12 Thread josef.pktd
On Wed, Aug 12, 2015 at 9:00 AM, Matthew Brett matthew.br...@gmail.com wrote: Hi, On Wed, Aug 12, 2015 at 1:57 PM, Guilherme Avelino gavel...@gmail.com wrote: As part of my PhD research on code authorship, we calculated the Truck Factor (TF) of some popular GitHub repositories. As you

Re: [Scikit-learn-general] Possible code contribution (Poisson loss)

2015-07-28 Thread josef.pktd
Just a comment from the statistics sidelines: taking the log of the target and fitting a linear or other model doesn't make it into a Poisson model. But maybe Poisson loss in machine learning is unrelated to the Poisson distribution or a Poisson model with E(y | x) = exp(x beta)? Josef On Tue, Jul
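The distinction can be written out explicitly. The block below is an illustrative summary using standard definitions, not something quoted from the thread; the notation x^T beta stands in for the "x beta" above. Least squares on log y models the mean of the log, while a Poisson model puts the log on the mean, and the two differ by Jensen's inequality. Note also that log y is undefined for zero counts, which the Poisson loss handles without trouble.

```latex
\begin{align*}
\text{Least squares on } \log y:\quad
  & \mathbb{E}[\log y \mid x] = x^\top \beta \\
\text{Poisson model}:\quad
  & \log \mathbb{E}[y \mid x] = x^\top \beta
    \;\Longleftrightarrow\;
    \mathbb{E}[y \mid x] = \exp(x^\top \beta) \\
\text{Jensen's inequality}:\quad
  & \mathbb{E}[\log y \mid x] \le \log \mathbb{E}[y \mid x] \\
\text{Poisson negative log-likelihood}:\quad
  & -\ell(\beta) = \sum_{i=1}^{n}
    \left[ \exp(x_i^\top \beta) - y_i \, x_i^\top \beta + \log(y_i!) \right]
\end{align*}
```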

Re: [Scikit-learn-general] Cohen's Kappa

2015-07-14 Thread josef.pktd
On Tue, Jul 14, 2015 at 8:30 AM, Herbert Schulz hrbrt@gmail.com wrote: > Hey, is there a function in scikit-learn to get the Cohen's kappa? There is in statsmodels: http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.inter_rater.cohens_kappa.html Josef > best, Herb
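For reference, unweighted Cohen's kappa for two raters is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's label frequencies. The following is a minimal pure-Python sketch of that formula, not the statsmodels function linked above; the name `cohens_kappa` and its two-list signature are assumptions made for the example.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Hypothetical helper written for illustration only.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters need the same non-zero number of labels")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return math.nan  # both raters used one identical label; kappa is undefined
    return (observed - expected) / (1.0 - expected)


labels_a = ["yes", "yes", "no", "no"]
labels_b = ["yes", "no", "no", "no"]
kappa = cohens_kappa(labels_a, labels_b)  # observed 0.75, expected 0.5
```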

Re: [Scikit-learn-general] Dramatic improvement by standardizing data?

2015-04-29 Thread josef.pktd
On Wed, Apr 29, 2015 at 11:13 AM, Fabrizio Fasano han...@gmail.com wrote: Dear experts, I’m experiencing a dramatic improvement in cross-validation when data are standardised. I mean accuracy increased from 48% to 100% when I shift from X to X_scaled = preprocessing.scale(X). Does it make
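For context on what the standardisation step does: each feature column is shifted to zero mean and divided by its standard deviation, so that all features end up on a comparable scale. The sketch below reproduces that idea in pure Python; it is not scikit-learn's code, and the name `scale_columns` is hypothetical. It assumes the population standard deviation (dividing by n) and leaves constant columns unscaled.

```python
import math


def scale_columns(rows):
    """Standardise each column of a list-of-lists to zero mean, unit variance.

    Hypothetical helper for illustration; uses the population standard
    deviation and leaves zero-variance columns at zero instead of dividing.
    """
    n_rows = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in rows) / n_rows
        std = math.sqrt(variance)
        stds.append(std if std > 0.0 else 1.0)
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in rows]


# Two features on very different scales end up directly comparable.
X = [[1.0, 10.0], [3.0, 30.0]]
X_scaled = scale_columns(X)
```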

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 2:38 PM, Luca Puggini lucapug...@gmail.com wrote: Totally true, Josef, but I guess that shoesize should not contain more information than age. I was hoping not to classify it as relevant when age is in the model. Semi-OT for the random forest question I thought about

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 9:26 AM, Alan G Isaac alan.is...@gmail.com wrote: It seems unlikely that the choice of which features to provide should turn entirely on controversial philosophical positions. Hopefully a feature can be declared in or out of scope for the project on technical grounds.

Re: [Scikit-learn-general] random forest importance and correlated variables.

2015-04-19 Thread josef.pktd
On Sun, Apr 19, 2015 at 10:05 AM, Gilles Louppe g.lou...@gmail.com wrote: Hi Luca, If you want to find all relevant features, I would recommend using ExtraTreesClassifier with max_features=1 and limited depth in order to avoid this kind of bias due to estimation errors. E.g., try with

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 9:25 PM, Sturla Molden sturla.mol...@gmail.com wrote: josef.p...@gmail.com wrote: Re. We should therefore never compute p-values: I assume that you meant that within the narrow context of regression, and not, e.g., in the context of tests of distribution. Sturla

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 6:40 PM, Phillip Feldman phillip.m.feld...@gmail.com wrote: This is a very nice explanation. Thanks!! Re. We should therefore never compute p-values: I assume that you meant that within the narrow context of regression, and not, e.g., in the context of tests of

Re: [Scikit-learn-general] logistic regression: need p-values

2015-04-18 Thread josef.pktd
On Sat, Apr 18, 2015 at 9:45 PM, Sturla Molden sturla.mol...@gmail.com wrote: josef.p...@gmail.com wrote: (I just went through some articles to see how we can produce p-values after feature selection with penalized least squares or maximum penalized likelihood. :) If you have used penalized

Re: [Scikit-learn-general] bootstrap deprecation warning

2014-08-18 Thread josef.pktd
On Mon, Aug 18, 2014 at 12:15 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Le 18 août 2014 16:16, Sebastian Raschka se.rasc...@gmail.com a écrit : On Aug 18, 2014, at 3:46 AM, Olivier Grisel olivier.gri...@ensta.org wrote: But the sklearn.cross_validation.Bootstrap currently

Re: [Scikit-learn-general] bootstrap deprecation warning

2014-08-18 Thread josef.pktd
On Mon, Aug 18, 2014 at 12:43 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2014-08-18 18:28 GMT+02:00 josef.p...@gmail.com: On Mon, Aug 18, 2014 at 12:15 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Le 18 août 2014 16:16, Sebastian Raschka se.rasc...@gmail.com a