Dear scikit-learn community,

This is my first message here!
I am writing because I think "model complexity" and "model prediction" are two separate properties which cannot, in principle, be compared directly, because a third variable is missing: the data. If the available data set covers the entire true range of possible data, then a more complex model will model the variable being studied with a prediction accuracy equal to or better than that of any less complex model. If the data set is not representative, then a more complex model may overfit, and there is a good chance that a simpler model will predict better on unseen data. The quality of the data is therefore critical in judging how good your model will be.
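To make this point concrete, here is a minimal sketch (the noisy sin() data, the polynomial degrees and the sample sizes are assumptions I picked for illustration, not anything taken from this thread). It compares a simple and a more complex model by cross-validation, once on a small sample and once on a larger, more representative one:

# Illustrative sketch only: the data-generating process, degrees and sample
# sizes below are assumptions chosen for this example.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def mean_cv_r2(degree, n_samples, seed=0):
    """Mean 5-fold cross-validated R^2 of a degree-`degree` polynomial fit."""
    rng = np.random.RandomState(seed)
    X = rng.uniform(-3, 3, size=(n_samples, 1))
    y = np.sin(X).ravel() + 0.3 * rng.randn(n_samples)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

for n_samples in (30, 3000):        # small vs. large (more representative) sample
    for degree in (1, 10):          # simple vs. complex model
        print(f"n={n_samples:4d}  degree={degree:2d}  "
              f"CV R^2 = {mean_cv_r2(degree, n_samples):.2f}")

With such a setup one typically sees the degree-10 model score worse than the straight line on the small sample and better on the large one, although the exact numbers depend on the noise level and the random seed.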
Hope this helps.

João

João André
Civil Engineer, M.Sc., Ph.D.
Structures Department
National Laboratory for Civil Engineering
LNEC, Av. Brasil 101, 1700-066 Lisbon, Portugal
Web: http://www.lnec.pt/
Skype ID: jpcgandre
Phone: (+351) 218 443 355

On Wed, 16 Oct 2019 at 15:05, Gael Varoquaux <gael.varoqu...@normalesup.org> wrote:

> On Sun, Oct 13, 2019 at 07:40:11PM +0900, Brown J.B. via scikit-learn wrote:
> > Please show respect and refinement when addressing the contributors and users of scikit-learn.
>
> I believe that Mike simply misread. It's something that happens (it happens a lot to me).
>
> No harm on my side, and thanks for clarifying my overly short reply.
>
> G
>
> > Gael's statement is perfect -- complexity does not imply better prediction.
> > The choice of estimator (and algorithm) depends on the structure of the model desired for the data presented.
> > Estimator superiority cannot be proven in a context- and/or data-agnostic fashion.
> >
> > J.B.
> >
> > On Sun, 13 Oct 2019 at 6:13, Mike Smith <javaeur...@gmail.com> wrote:
> > > "Second complexity does not imply better prediction."
> > >
> > > Complexity doesn't imply prediction? Perhaps you're having a translation error.
> > >
> > > On Sat, Oct 12, 2019 at 2:04 PM, <scikit-learn-requ...@python.org> wrote (scikit-learn Digest, Vol 43, Issue 25):
> > >
> > > > From: Mike Smith <javaeur...@gmail.com>
> > > > Date: Sat, 12 Oct 2019 14:04:12 -0700
> > > > Subject: Re: [scikit-learn] scikit-learn Digest, Vol 43, Issue 24
> > > >
> > > > "... If I should expect good results on a pc, scikit says that needing gpu power is obsolete, since certain scikit models perform better (than ml designed for gpu) that are not designed for gpu, for that reason. Is this true?"
> > > >
> > > > "Where do you see this written? I think that you are looking for overly simple stories that are not true."
> > > >
> > > > Gael, see the below from the scikit-learn FAQ. You can also find this yourself at the main FAQ:
> > > >
> > > > [image: 2019-10-12 14_00_05-Frequently Asked Questions - scikit-learn 0.21.3 documentation.png]
> > > >
> > > > On Sat, Oct 12, 2019 at 9:03 AM, <scikit-learn-requ...@python.org> wrote (scikit-learn Digest, Vol 43, Issue 24):
> > > >
> > > > From: Gael Varoquaux <gael.varoqu...@normalesup.org>
> > > > Date: Fri, 11 Oct 2019 13:34:33 -0400
> > > > Subject: Re: [scikit-learn] Is scikit-learn implying neural nets are the best regressor?
> > > >
> > > > On Fri, Oct 11, 2019 at 10:10:32AM -0700, Mike Smith wrote:
> > > > > In other words, according to that arrangement, is scikit-learn implying that section 1.17 is the best regressor out of the listed, 1.1 to 1.17?
> > > >
> > > > No.
> > > >
> > > > First, they are not ordered by complexity (Naive Bayes is arguably simpler than Gaussian Processes). Second, complexity does not imply better prediction.
> > > >
> > > > > If I should expect good results on a pc, scikit says that needing gpu power is obsolete, since certain scikit models perform better (than ml designed for gpu) that are not designed for gpu, for that reason. Is this true?
> > > >
> > > > Where do you see this written? I think that you are looking for overly simple stories that are not true.
> > > >
> > > > > How much hardware is a practical expectation for running the best scikit models and getting the best results?
> > > >
> > > > This is too vague a question for which there is no answer.
> > > >
> > > > Gaël
> > > >
> > > > On Fri, Oct 11, 2019 at 9:02 AM, <scikit-learn-requ...@python.org> wrote (scikit-learn Digest, Vol 43, Issue 21):
> > > > From: Andreas Mueller <t3k...@gmail.com>
> > > > Date: Fri, 11 Oct 2019 15:42:58 +0200
> > > > Subject: Re: [scikit-learn] logistic regression results are not stable between solvers
> > > >
> > > > On 10/10/19 1:14 PM, Benoît Presles wrote:
> > > > > Thanks for your answers.
> > > > >
> > > > > On my real data, I do not have so many samples. I have a bit more than 200 samples in total, and I would also like to get some results with unpenalized logistic regression. What do you suggest? Should I switch to the lbfgs solver?
> > > >
> > > > Yes.
> > > >
> > > > > Am I sure that with this solver I will not have any convergence issue and will always get the right result? Indeed, I did not get any convergence warning with saga, so I thought everything was fine. I noticed some issues only when I decided to test several solvers. Without comparing the results across solvers, how can I be sure that the optimisation goes well? Shouldn't scikit-learn warn the user somehow if it is not the case?
> > > >
> > > > We should attempt to warn in the SAGA solver if it doesn't converge. That it doesn't raise a convergence warning should probably be considered a bug. It uses the maximum weight change as a stopping criterion right now. We could probably compute the dual objective once at the end to see whether we converged, right? Or is that not possible with SAGA? If not, we might want to caution that no convergence warning will be raised.
> > > >
> > > > > At last, I was using saga because I also wanted to do some feature selection by using the l1 penalty, which is not supported by lbfgs...
> > > >
> > > > You can use liblinear then.
> > > >
> > > > > Best regards,
> > > > > Ben
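(A side note from me on the convergence-warning point above: from user code one can at least detect the case where the iteration budget runs out, by catching ConvergenceWarning and comparing n_iter_ with max_iter. This is only a minimal sketch on an assumed synthetic dataset, and it will not catch the situation described above, where SAGA stops on its weight-change criterion without any warning.)

# Minimal illustrative sketch: the synthetic data and settings are assumptions,
# not the code discussed in the thread above.
import warnings
import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = LogisticRegression(solver="saga", C=1e9, max_iter=1000)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(X, y)

warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
hit_cap = np.any(clf.n_iter_ >= clf.max_iter)   # iteration budget exhausted?
print(f"ConvergenceWarning raised: {warned}; max_iter reached: {hit_cap}")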
> > > > On 09/10/2019 at 23:39, Guillaume Lemaître <g.lemaitr...@gmail.com> wrote:
> > > > > Ups, I did not see Roman's answer. Sorry about that. It comes back to the same conclusion :)
> > > >
> > > > On Wed, 9 Oct 2019 at 23:37, Guillaume Lemaître wrote:
> > > > > Uhm, actually increasing to 10000 samples solves the convergence issue. SAGA is most probably not designed to work with such a small sample size.
> > > >
> > > > On Wed, 9 Oct 2019 at 23:36, Guillaume Lemaître wrote:
> > > > > I slightly changed the benchmark so that it uses a pipeline, and plotted the coefficients:
> > > > > https://gist.github.com/glemaitre/8fcc24bdfc7dc38ca0c09c56e26b9386
> > > > >
> > > > > I only see one of the 10 splits where SAGA is not converging; otherwise the coefficients look very close (I don't attach the figure here, but it can be plotted using the snippet). So apart from this second split, the other differences seem to be numerical instability.
> > > > >
> > > > > Where I do have some concern is the convergence rate of SAGA, but I have no intuition as to whether this is normal or not.
> > > > >
> > > > > --
> > > > > Guillaume Lemaitre
> > > > > Scikit-learn @ Inria Foundation
> > > > > https://glemaitre.github.io/
> > > >
> > > > On Wed, 9 Oct 2019 at 23:22, Roman Yurchak <rth.yurc...@gmail.com> wrote:
> > > > > Ben,
> > > > >
> > > > > I can confirm your results with penalty='none' and C=1e9. In both cases, you are running a mostly unpenalized logistic regression. Usually that's less numerically stable than with a small regularization, depending on the data collinearity.
> > > > >
> > > > > Running that same code with
> > > > >  - a larger penalty (smaller C values)
> > > > >  - or a larger number of samples
> > > > > yields for me the same coefficients (up to some tolerance).
> > > > >
> > > > > You can also see that SAGA convergence is not good from the fact that it needs 196000 epochs/iterations to converge.
> > > > >
> > > > > Actually, I have often seen convergence issues with SAG on small datasets (in unit tests), not fully sure why.
> > > > >
> > > > > --
> > > > > Roman
> > > > >
> > > > > On 09/10/2019 22:10, serafim loukas wrote:
> > > > > > The predictions across solvers are exactly the same when I run the code. I am using version 0.21.3. What is yours?
> > > > > >
> > > > > > In [13]: import sklearn
> > > > > > In [14]: sklearn.__version__
> > > > > > Out[14]: '0.21.3'
> > > > > >
> > > > > > Serafeim
> > > > > >
> > > > > > On 9 Oct 2019, at 21:44, Benoît Presles <benoit.pres...@u-bourgogne.fr> wrote:
> > > > > > > (y_pred_lbfgs==y_pred_saga).all() == False
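(Another note from me: since the diagnosis above is easy to try, here is a minimal sketch along the same lines as Guillaume's benchmark. This is not his gist; the synthetic dataset and the exact settings are assumptions for illustration. It compares the lbfgs and saga coefficients on a small, nearly unpenalized problem, and then with stronger regularization or more samples.)

# Minimal illustrative sketch, not the gist linked above: the synthetic data
# and settings are assumptions. ConvergenceWarnings may be emitted on the way.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def max_coef_gap(n_samples, C):
    """Largest absolute difference between lbfgs and saga coefficients."""
    X, y = make_classification(n_samples=n_samples, n_features=20,
                               n_informative=10, random_state=0)
    coefs = {}
    for solver in ("lbfgs", "saga"):
        clf = LogisticRegression(solver=solver, C=C, max_iter=10_000)
        clf.fit(X, y)
        coefs[solver] = clf.coef_.ravel()
    return np.max(np.abs(coefs["lbfgs"] - coefs["saga"]))

print("n=200,   C=1e9:", max_coef_gap(200, 1e9))     # nearly unpenalized, small sample
print("n=200,   C=1.0:", max_coef_gap(200, 1.0))     # stronger regularization
print("n=10000, C=1e9:", max_coef_gap(10_000, 1e9))  # more samples

If Roman's explanation applies, the first gap should be noticeably larger than the other two, although the exact numbers will vary with the data and the solver tolerances.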
> --
> Gael Varoquaux
> Research Director, INRIA          Visiting professor, McGill
> http://gael-varoquaux.info        http://twitter.com/GaelVaroquaux