Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
Sorry... I mean penalized likelihood, not large weight penalization. Here's the reference I was thinking of http://m.statisticalhorizons.com/?task=get&pageid=1424858329 On Thu, Dec 15, 2016 at 9:12 PM wrote: > just some generic comments, I don't have any experience with penalized > estimation n

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread josef . pktd
just some generic comments, I don't have any experience with penalized estimation nor did I go through the math. In unregularized Logistic Regression or Logit and in several other models the estimator satisfies some aggregation properties so that in sample or training set proportions match between
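The aggregation property mentioned above can be checked numerically. The following is a minimal sketch, not scikit-learn's implementation: it fits an unpenalized logistic regression with an intercept and one dummy covariate using a hand-written Newton iteration, then compares the sum of fitted probabilities on the training data with the observed number of successes. The data and all names (`fit_logit`, `xs`, `ys`) are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(xs, ys, n_iter=50):
    """Unpenalized logistic regression (intercept b0, slope b1) via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        ps = [sigmoid(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix (negative Hessian).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical imbalanced data: one dummy covariate, rare successes.
xs = [0] * 100 + [1] * 100
ys = [1] * 3 + [0] * 97 + [1] * 8 + [0] * 92

b0, b1 = fit_logit(xs, ys)
predicted_total = sum(sigmoid(b0 + b1 * x) for x in xs)
observed_total = sum(ys)
print(predicted_total, observed_total)
```

At the maximum-likelihood solution the gradient with respect to the intercept is zero, which is exactly the statement that predicted and observed totals agree in sample. Once a penalty is applied to the coefficients, and in particular to the intercept, this equality is no longer guaranteed.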

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
Here's a discussion http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression See the Zheng and King reference. It would be nice to have these methods in scikit. On Thu, Dec 15, 2016 at 7:05 PM Rachel Melamed wrote: > St

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Rachel Melamed
Stuart, Yes the data is quite imbalanced (this is what I meant by p(success) < .05 ) To be clear, I calculate \sum_i \hat{y_i} = logregN.predict_proba(design)[:,1]*(success_fail.sum(axis=1)) and compare that number to the observed number of success. I find the predicted number to always be higher

Re: [scikit-learn] Model checksums

2016-12-15 Thread Stuart Reynolds
I don't mean that scikit-learn's modeling is non-deterministic -- I mean the pickle library. Same input different serialized bytes output. It was my recollection that dictionaries were inconsistently ordered when serialized, or that the object ID was included in the serialization -- anyhow I don't
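One way to sidestep the concern about serialized bytes is to hash a canonical text form of the fitted parameters instead of the pickle output. The sketch below is an assumption about how that could look, not something proposed in the thread: `params_checksum` and the example dictionaries are hypothetical, and the parameters are assumed to be plain JSON-serializable values (for example, coefficients already converted to lists).

```python
import hashlib
import json

def params_checksum(params):
    """SHA-256 of a canonical JSON rendering of a parameter dictionary."""
    # sort_keys makes the output independent of dictionary insertion order;
    # fixed separators keep the whitespace stable.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two hypothetical models with identical parameters, inserted in different order.
model_a = {"coef": [0.5, -1.25], "intercept": 0.1, "C": 1.0}
model_b = {"C": 1.0, "intercept": 0.1, "coef": [0.5, -1.25]}
# Same keys, one coefficient changed.
model_c = {"coef": [0.5, -1.26], "intercept": 0.1, "C": 1.0}

print(params_checksum(model_a) == params_checksum(model_b))
print(params_checksum(model_a) == params_checksum(model_c))
```

The checksum then depends only on the parameter values, not on how a particular serializer happens to lay out the object.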

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sean Violante
Sorry just saw you are not using the liblinear solver, agree with Sebastian, you should subtract mean not median On 15 Dec 2016 11:02 pm, "Sean Violante" wrote: > The problem is the (stupid!) liblinear solver that also penalises the > intercept (in regularisation). Use a different solver or

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sean Violante
The problem is the (stupid!) liblinear solver that also penalises the intercept (in regularisation). Use a different solver or change the intercept_scaling parameter On 15 Dec 2016 10:44 pm, "Sebastian Raschka" wrote: > Subtracting the median wouldn’t result in normalizing in the usual sense, >
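To see why a penalised intercept matters for rare events, here is a simplified sketch. It is not liblinear's actual objective (liblinear appends a constant feature whose value is `intercept_scaling` and regularises its weight like any other coefficient); it only fits an intercept-only logistic model with an optional L2 penalty on the intercept. The function name, the counts, and the penalty strength are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_intercept(n_success, n_total, penalty=0.0, n_iter=100):
    """Intercept-only logistic fit minimizing NLL + 0.5 * penalty * b**2 (Newton)."""
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(b)
        # First and second derivatives of the penalized negative log-likelihood.
        grad = n_total * p - n_success + penalty * b
        hess = n_total * p * (1.0 - p) + penalty
        b -= grad / hess
    return b

# Hypothetical rare-event data: 30 successes in 1000 trials (rate 0.03).
n_success, n_total = 30, 1000

p_plain = sigmoid(fit_intercept(n_success, n_total))
p_penalized = sigmoid(fit_intercept(n_success, n_total, penalty=50.0))

print(p_plain, p_penalized)
```

The penalty pulls the intercept toward zero, that is, toward a predicted probability of 0.5. When the true success rate is well below 0.5, the fitted probability ends up above the observed rate, so the predicted number of successes exceeds the observed number. With no penalty the fitted probability equals the observed rate.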

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sebastian Raschka
Subtracting the median wouldn’t result in normalizing in the usual sense, since subtracting a constant just shifts the values by a constant. Instead, for logistic regression & most optimizers, I would recommend subtracting the mean to center the features at mean zero and divide by the standard devi
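The difference between the two operations can be shown on a single column. This is a minimal stdlib sketch with a made-up dummy feature; it uses the population standard deviation, and the helper names are hypothetical.

```python
import statistics

def standardize(values):
    """Center at mean zero and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def subtract_median(values):
    """Shift by the median only; the spread is left untouched."""
    med = statistics.median(values)
    return [v - med for v in values]

# Hypothetical sparse dummy column: mostly 0, occasionally 1.
feature = [0, 0, 0, 1, 0, 0, 1, 0]

z = standardize(feature)
shifted = subtract_median(feature)

print(statistics.fmean(z), statistics.pstdev(z))
print(shifted)
```

For a sparse dummy column the median is often 0, so subtracting it leaves the column unchanged, while standardizing changes both its location and its scale.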

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Stuart Reynolds
LR is biased with imbalanced datasets. Is your dataset unbalanced? (e.g. is there one class that has a much smaller prevalence in the data than the other)? On Thu, Dec 15, 2016 at 1:02 PM, Rachel Melamed wrote: > I just tried it and it did not appear to change the results at all? > I ran it as f

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Rachel Melamed
I just tried it and it did not appear to change the results at all? I ran it as follows: 1) Normalize dummy variables (by subtracting median) to make a matrix of about 1 x 5 2) For each of the 1000 output variables: a. Each output variable uses the same dummy variables, but not all settings o

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Alexey Dral
Could you try normalizing the dataset after dummy-encoding the features and see whether the behavior is reproducible? 2016-12-15 22:03 GMT+03:00 Rachel Melamed: > Thanks for the reply. The covariates ("X") are all dummy/categorical > variables. So I guess no, nothing is normalized. > > On Dec 15, 2016, at 1

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Rachel Melamed
Thanks for the reply. The covariates ("X") are all dummy/categorical variables. So I guess no, nothing is normalized. On Dec 15, 2016, at 1:54 PM, Alexey Dral wrote: Hi Rachel, Do you have your data normalized? 2016-12-15 20:21 GMT+03:00 Rachel Melamed mailto:mela

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Alexey Dral
Hi Rachel, Do you have your data normalized? 2016-12-15 20:21 GMT+03:00 Rachel Melamed: > Hi all, > Does anyone have any suggestions for this problem: > http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results > > I am running around 1000 similar logistic

[scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Rachel Melamed
Hi all, Does anyone have any suggestions for this problem: http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results I am running around 1000 similar logistic regressions, with the same covariates but slightly different data and response variables. All of my re