Sorry... I mean penalized likelihood, not large weight penalization. Here's the reference I was thinking of http://m.statisticalhorizons.com/?task=get&pageid=1424858329
On Thu, Dec 15, 2016 at 9:12 PM <josef.p...@gmail.com> wrote:

Just some generic comments; I don't have any experience with penalized estimation, nor did I go through the math.

In unregularized logistic regression (Logit), and in several other models, the estimator satisfies certain aggregation properties, so that in the training sample the average predicted proportion matches the observed proportion.

Regularized estimation does not require unbiased estimation of the parameters, because it maximizes a different objective function (as with mean squared error in the linear model): we are trading off bias against variance. I think this bias will propagate to the prediction, but I'm not sure whether an unpenalized intercept can be made to compensate for the bias in the average prediction.

For Logit this would mean that although we have a bias, we have less variance/variation in the prediction, so under the chosen penalization measure we are doing better overall than with unregularized prediction. I assume that because the regularization biases the coefficients towards zero, it also biases the prediction towards 0.5, unless this is compensated for by the intercept.

I didn't read the King and Zeng (2001) article, but based on a brief search it doesn't mention penalization or regularization, so it doesn't seem to address the regularization bias. (As an aside: from the literature, I think many people use a different model than logistic regression for rare-events data, either Poisson with an exponential link or Binomial/Bernoulli with an asymmetric link function.)

I think demeaning could help, because it reduces the dependence between the intercept and the other, penalized variables; but because the model is nonlinear it will not make them orthogonal.

The question is whether it's possible to improve the estimator by additionally adjusting the mean, or the threshold for 0-1 predictions. It might depend on the criterion used to choose the penalization.
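The question of whether an unpenalized intercept compensates for the shrinkage bias in the average prediction can be probed with a small synthetic sketch (the data, coefficients, and penalty strength below are made up for illustration, not taken from the thread):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n = 20000
X = rng.normal(size=(n, 3))
# rare outcome: the true intercept is strongly negative
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + X.dot(np.array([1.0, -0.5, 0.5])))))
y = (rng.uniform(size=n) < p_true).astype(int)

# Same strong penalty in both fits; the only difference is whether the
# intercept is penalized. With lbfgs, scikit-learn leaves the
# fit_intercept=True intercept unpenalized; adding an explicit ones
# column with fit_intercept=False makes it subject to the L2 penalty.
unpen = LogisticRegression(C=0.001, solver='lbfgs', max_iter=5000).fit(X, y)
Xc = np.hstack([np.ones((n, 1)), X])
pen = LogisticRegression(C=0.001, solver='lbfgs', fit_intercept=False,
                         max_iter=5000).fit(Xc, y)

obs_rate = y.mean()
mean_unpen = unpen.predict_proba(X)[:, 1].mean()
mean_pen = pen.predict_proba(Xc)[:, 1].mean()
print(obs_rate, mean_unpen, mean_pen)
```

In this sketch the model with a free intercept reproduces the observed rate almost exactly (the intercept's first-order condition forces the mean predicted probability to equal the observed frequency), while penalizing an explicit ones column pulls the prediction toward 0.5 and substantially overstates the rare class.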
I don't know, and have no idea what scikit-learn does.

Josef

On Thu, Dec 15, 2016 at 11:30 PM, Stuart Reynolds <stu...@stuartreynolds.net> wrote:

Here's a discussion:
http://stats.stackexchange.com/questions/6067/does-an-unbalanced-sample-matter-when-doing-logistic-regression
See the King and Zeng reference. It would be nice to have these methods in scikit.

On Thu, Dec 15, 2016 at 7:05 PM Rachel Melamed <mela...@uchicago.edu> wrote:

Stuart,

Yes, the data is quite imbalanced (this is what I meant by p(success) < .05).

To be clear, I calculate the predicted number of successes,

    \sum_i \hat{y}_i = (logregN.predict_proba(design)[:,1] * success_fail.sum(axis=1)).sum()

and compare that number to the observed number of successes. I find that the predicted number is always higher (I think because of the intercept).

I was not aware of a bias for imbalanced data. Can you tell me more? Why does it not appear when the regularization is relaxed? Also, with the same data, the statsmodels logistic regression, which has no regularization, does not seem to have this problem. Any suggestions for how I could fix this are welcome.

Thank you

On Dec 15, 2016, at 4:41 PM, Stuart Reynolds <stu...@stuartreynolds.net> wrote:

LR is biased with imbalanced datasets. Is your dataset unbalanced? (E.g., is there one class that has a much smaller prevalence in the data than the other?)

On Thu, Dec 15, 2016 at 1:02 PM, Rachel Melamed <mela...@uchicago.edu> wrote:

I just tried it, and it did not appear to change the results at all. I ran it as follows:

1) Normalize the dummy variables (by subtracting the median) to make a matrix of about 10000 x 5.

2) For each of the 1000 output variables:

a.
Each output variable uses the same dummy variables, but not all settings of the covariates are observed for all output variables. So I create the design matrix per output variable using patsy, including pairwise interactions. That gives an around 10000 x 350 design matrix, and a matrix I call "success_fail" that holds, for each setting, the number of successes and the number of failures, so it is of size 10000 x 2.

b. Run the regression using:

    skdesign = np.vstack((design, design))
    sklabel = np.hstack((np.ones(success_fail.shape[0]),
                         np.zeros(success_fail.shape[0])))
    skweight = np.hstack((success_fail['success'], success_fail['fail']))

    logregN = linear_model.LogisticRegression(C=1, solver='lbfgs',
                                              fit_intercept=False)
    logregN.fit(skdesign, sklabel, sample_weight=skweight)

On Dec 15, 2016, at 2:16 PM, Alexey Dral <aad...@gmail.com> wrote:

Could you try normalizing the dataset after dummy encoding the features, and see whether the behavior is reproducible?

2016-12-15 22:03 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:

Thanks for the reply. The covariates ("X") are all dummy/categorical variables. So I guess no, nothing is normalized.

On Dec 15, 2016, at 1:54 PM, Alexey Dral <aad...@gmail.com> wrote:

Hi Rachel,

Do you have your data normalized?

2016-12-15 20:21 GMT+03:00 Rachel Melamed <mela...@uchicago.edu>:

Hi all,

Does anyone have any suggestions for this problem:
http://stackoverflow.com/questions/41125342/sklearn-logistic-regression-gives-biased-results

I am running around 1000 similar logistic regressions, with the same covariates but slightly different data and response variables. All of my response variables have sparse successes (p(success) < .05 usually).
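The stacking trick in step (b), together with the predicted-vs-observed comparison from earlier in the thread, can be reproduced end-to-end on synthetic data (the shapes, coefficients, and C values below are made up; the real design is ~10000 x 350 and comes from patsy):

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in for the thread's grouped data.
rng = np.random.RandomState(0)
n_settings = 500
design = np.hstack([np.ones((n_settings, 1)),        # explicit intercept column
                    rng.binomial(1, 0.5, size=(n_settings, 4))])
true_beta = np.array([-3.0, 1.0, -0.5, 0.5, 0.0])    # rare successes
p = 1.0 / (1.0 + np.exp(-design.dot(true_beta)))
trials = rng.randint(20, 40, size=n_settings)
success_fail = {'success': rng.binomial(trials, p)}
success_fail['fail'] = trials - success_fail['success']

# The stacking trick: each setting appears once labeled 1 and once
# labeled 0, weighted by its success and fail counts respectively.
skdesign = np.vstack((design, design))
sklabel = np.hstack((np.ones(n_settings), np.zeros(n_settings)))
skweight = np.hstack((success_fail['success'], success_fail['fail']))

def predicted_successes(C):
    logregN = linear_model.LogisticRegression(C=C, solver='lbfgs',
                                              fit_intercept=False,
                                              max_iter=2000)
    logregN.fit(skdesign, sklabel, sample_weight=skweight)
    # the calibration check from the thread: sum over settings of n_i * p_hat_i
    return (logregN.predict_proba(design)[:, 1] * trials).sum()

observed = success_fail['success'].sum()
print(observed, predicted_successes(1.0), predicted_successes(0.01))
```

Because fit_intercept=False makes the ones column penalized like any other coefficient, the stationarity condition of scikit-learn's objective gives, at the optimum, predicted - observed = -w0 / C (with w0 the weight on the ones column), so the overprediction of successes grows as C shrinks: in this sketch the two counts roughly agree at the weaker penalty, while the stronger penalty predicts noticeably more successes than were observed.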
I noticed that with the regularized regression, the results are consistently biased towards predicting more "successes" than are observed in the training data. When I relax the regularization, this bias goes away. The bias observed is unacceptable for my use case, but the more-regularized model does seem a bit better.

Below, I plot the results of the 1000 different regressions for 2 different values of C:
[image: results for the different regressions for 2 different values of C] <https://i.stack.imgur.com/1cbrC.png>

I also looked at the parameter estimates for one of these regressions; below, each point is one parameter. It seems like the intercept (the point on the bottom left) is too high for the C=1 model.
[image: parameter estimates for one regression] <https://i.stack.imgur.com/NTFOY.png>

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

--
Yours sincerely,
Alexey A. Dral