Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread josef . pktd
Just some generic comments; I don't have any experience with penalized estimation, nor did I go through the math. In unregularized logistic regression (Logit) and in several other models, the estimator satisfies some aggregation properties, so that in-sample or training-set proportions match between
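One such aggregation property can be checked numerically. The following is a minimal sketch, assuming an unpenalized logit with an intercept and a single feature, fitted by a hand-written Newton iteration; the helper `fit_logit` and the toy data are hypothetical and not from the thread. At the maximum-likelihood estimate the score equation for the intercept forces the residuals to sum to zero, so the mean predicted probability equals the observed training proportion.

```python
import math

def fit_logit(x, y, n_iter=25):
    """Unpenalized logistic regression (intercept + one slope) via Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical, non-separable toy data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

b0, b1 = fit_logit(x, y)
pred = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

mean_obs = sum(y) / len(y)
mean_pred = sum(pred) / len(pred)
print(mean_obs, mean_pred)  # the two agree up to numerical tolerance
```

The equality relies on the intercept being unpenalized; once a penalty also shrinks the intercept, this score equation no longer holds exactly.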

Re: [scikit-learn] biased predictions in logistic regression

2016-12-18 Thread josef . pktd
On Sat, Dec 17, 2016 at 10:25 PM, Rachel Melamed wrote:
> Hi Sean, Sebastian, Alexey (and Josef),
> I’m not sure I fully understand what normalizing a dummy should consist
> of, so please let me know if I am interpreting your suggestion right. I
> believe I can’t use the StandardScaler since I

Re: [scikit-learn] Ordinary Least Square Regression Under-determined system.

2017-04-25 Thread josef . pktd
scipy.linalg.lstsq uses an SVD solver and drops singular components, where "singular" depends on the condition number threshold. So it's equivalent to PCR with a tiny threshold for dropping components (rcond < 1e-15, if it's similar to numpy). The SVD/rcond threshold is applied to the original, not to the standardized variables
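The idea of dropping singular components can be sketched by hand. This is only an illustration under stated assumptions, not the library's implementation: it handles a two-column design matrix, obtains the singular values from the eigenvalues of X'X (a real SVD solver works on X directly, which is numerically safer), and the helper name `truncated_lstsq_2col` and the data are hypothetical. Components whose singular value falls at or below `rcond` times the largest one are discarded, which yields the minimum-norm solution for a rank-deficient system.

```python
import math

def truncated_lstsq_2col(X, y, rcond=1e-15):
    """Least squares for a two-column X, dropping near-singular components."""
    # Entries of the symmetric 2x2 matrix X'X = [[a, c], [c, d]] and of X'y.
    a = sum(r[0] * r[0] for r in X)
    c = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    b = (sum(r[0] * yi for r, yi in zip(X, y)),
         sum(r[1] * yi for r, yi in zip(X, y)))

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + c * c)
    eigvals = [half_trace + radius, half_trace - radius]
    s_max = math.sqrt(max(eigvals[0], 0.0))

    beta = [0.0, 0.0]
    kept = 0
    for i, lam in enumerate(eigvals):
        s = math.sqrt(max(lam, 0.0))  # singular value of X
        if s <= rcond * s_max:
            continue  # drop this component, as in principal component regression
        if c != 0.0:
            v = (c, lam - a)
        else:
            # Diagonal X'X: eigenvectors are the coordinate axes.
            v = (1.0, 0.0) if (i == 0) == (a >= d) else (0.0, 1.0)
        norm = math.hypot(v[0], v[1])
        v = (v[0] / norm, v[1] / norm)
        proj = (v[0] * b[0] + v[1] * b[1]) / lam
        beta[0] += v[0] * proj
        beta[1] += v[1] * proj
        kept += 1
    return beta, kept

# Hypothetical data: the second column is exactly twice the first (rank 1).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [1.0, 2.0, 4.0]

beta, n_kept = truncated_lstsq_2col(X, y)
print(beta, n_kept)
```

Only one component survives here, and the coefficient vector is proportional to (1, 2), the direction of the retained component.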

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-04 Thread josef . pktd
On Wed, Oct 4, 2017 at 4:26 PM, Stuart Reynolds wrote:
> Hi Andy,
> Thanks -- I'll give statsmodels another go.
> I remember I had some fitting speed issues with it in the past, and
> also some issues related to their models keeping references to the data
> (=disaster for serialization and m

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread josef . pktd
On Thu, Oct 5, 2017 at 12:34 PM, Stuart Reynolds wrote:
> Thanks Josef. Was very useful.
>
> result.remove_data() reduces a 5 parameter Logit result object from
> megabytes to 5Kb (as compared to a minimum uncompressed size of the
> parameters of ~320 bytes). Is a big improvement. I'll experiment w

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread josef . pktd
On Thu, Oct 5, 2017 at 3:00 PM, Stuart Reynolds wrote:
> Hi Sean,
>
> I'll have a look at glmnet (looks like it's compiled from Fortran!). Does
> it offer much over statsmodels' GLM? This looks great for researchy
> stuff, although a little less performant.
> GLMNet is/wraps the original Fortran imp

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-05 Thread josef . pktd
On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds wrote:
> Turns out sm.Logit does allow setting the tolerance.
> Some quick and dirty time profiling of different methods on a 100k
> * 30 features dataset, with different solvers and losses:
>
> sklearn.LogisticRegression: l1 1.13864398003 (seco

Re: [scikit-learn] Can fit a model with a target array of probabilities?

2017-10-06 Thread josef . pktd
On Thu, Oct 5, 2017 at 3:27 PM, wrote:
>
>
> On Thu, Oct 5, 2017 at 2:52 PM, Stuart Reynolds
> wrote:
>
>> Turns out sm.Logit does allow setting the tolerance.
>> Some quick and dirty time profiling of different methods on a 100k
>> * 30 features dataset, with different solvers and losses:
>

Re: [scikit-learn] NEP: Random Number Generator Policy

2018-06-16 Thread josef . pktd
On Sat, Jun 16, 2018 at 3:59 AM, Robert Kern wrote:
> I have made a significant revision. In this version, downstream projects
> like scikit-learn should experience significantly less forced churn.
>
> https://github.com/rkern/numpy/blob/nep/rng-clarification/doc/neps/nep-0019-rng-policy.rst
>
> h

Re: [scikit-learn] NEP: Random Number Generator Policy

2018-06-16 Thread josef . pktd
On Sat, Jun 16, 2018 at 8:29 PM, Robert Kern wrote:
> On 6/16/18 05:54, josef.p...@gmail.com wrote:
>>
>> On Sat, Jun 16, 2018 at 3:59 AM, Robert Kern
>> wrote:
>>>
>>> I have made a significant revision. In this version, downstream projects
>>> like scikit-learn should experience significantly l

Re: [scikit-learn] AUCROC/MAP confidence intervals in scikit

2019-02-07 Thread josef . pktd
Just a skeptical comment from a bystander. I only skimmed parts of the article. My impression is that this does not apply (directly) to the regression setting. AFAIU, they assume that all observations have the same probability. To me it looks more like the literature on testing of or confidence i

Re: [scikit-learn] Why ridge regression can solve multicollinearity?

2020-01-08 Thread josef . pktd
On Wed, Jan 8, 2020 at 9:38 PM lampahome wrote:
>
>
> Stuart Reynolds wrote on Thu, Jan 9, 2020 at 10:33 AM:
>
>> Correlated features typically have the property that they are tending to
>> be similarly predictive of the outcome.
>>
>> L1 and L2 are both a preference for low coefficients.
>> If a coefficient

Re: [scikit-learn] Why ridge regression can solve multicollinearity?

2020-01-08 Thread josef . pktd
On Wed, Jan 8, 2020 at 9:43 PM wrote:
>
>
> On Wed, Jan 8, 2020 at 9:38 PM lampahome wrote:
>
>>
>>
>> Stuart Reynolds wrote on Thu, Jan 9, 2020 at 10:33 AM:
>>
>>> Correlated features typically have the property that they are tending to
>>> be similarly predictive of the outcome.
>>>
>>> L1 and L2 are both
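For the multicollinearity question in this thread, a small worked case may help. The sketch below assumes a no-intercept model with two exactly collinear columns and solves the ridge normal equations (X'X + alpha*I) beta = X'y in closed form; the helper `ridge_2col` and the data are hypothetical and not taken from the thread. With alpha = 0 the system is singular, while any alpha > 0 makes it invertible and spreads the weight across the correlated columns.

```python
def ridge_2col(X, y, alpha):
    """Solve (X'X + alpha*I) beta = X'y for a two-column X (no intercept)."""
    a = sum(r[0] * r[0] for r in X) + alpha
    c = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + alpha
    b0 = sum(r[0] * yi for r, yi in zip(X, y))
    b1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - c * c
    # A (near-)zero determinant means the coefficients are not identified.
    if abs(det) <= 1e-12 * max(a * d, 1.0):
        raise ValueError("normal equations are singular; no unique solution")
    # Closed-form inverse of a symmetric 2x2 matrix applied to X'y.
    return [(d * b0 - c * b1) / det, (a * b1 - c * b0) / det]

# Hypothetical data: the second column is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [1.0, 2.0, 4.0]

try:
    ridge_2col(X, y, alpha=0.0)
    ols_is_singular = False
except ValueError:
    ols_is_singular = True

beta_ridge = ridge_2col(X, y, alpha=1.0)
print(ols_is_singular, beta_ridge)
```

In this toy case the ridge coefficients keep the 1:2 ratio of the two columns and are slightly smaller in magnitude than the minimum-norm least-squares solution, which is the shrinkage the penalty is meant to produce.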