This is a very common calculation; you will find it in all of these places (though only with one standard deviation):

http://scikit-learn.org/stable/auto_examples/randomized_search.html
http://nbviewer.ipython.org/github/gmonce/scikit-learn-book/blob/master/Chapter%202%20-%20Supervised%20Learning%20-%20Image%20Recognition%20with%20Support%20Vector%20Machines.ipynb
http://youtu.be/iFkRt3BCctg?t=33m25s
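
A minimal sketch of that calculation, for illustration only. The fold scores are made up (they are not taken from any of the linked examples), and the standard library is used instead of NumPy; `statistics.pstdev` is the population standard deviation, which is what NumPy's `scores.std()` computes by default (ddof=0).

```python
import statistics

# Hypothetical per-fold accuracies from a 5-fold cross validation run.
# These numbers are invented for illustration only.
scores = [0.90, 0.95, 0.92, 0.97, 0.91]

mean = statistics.fmean(scores)
std = statistics.pstdev(scores)  # population std, like NumPy's default ddof=0

# The same report line as in the examples, with one and with two
# standard deviations as the "+/-" term.
one_sd = "Accuracy: %0.2f (+/- %0.2f)" % (mean, std)
two_sd = "Accuracy: %0.2f (+/- %0.2f)" % (mean, std * 2)

print(one_sd)
print(two_sd)
```

The only difference between the two report lines is the multiplier applied to the standard deviation.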
I would presume that the standard deviation is multiplied by two because the author of the example wanted to create confidence intervals based on two standard deviations. Technically, multiplying by 1.96 would approximate the well-known 95% confidence interval more closely, but 2 standard deviations are often used for simplicity.

http://en.wikipedia.org/wiki/1.96

Best,
Jason

-----Original Message-----
From: scikit-learn-general-requ...@lists.sourceforge.net [mailto:scikit-learn-general-requ...@lists.sourceforge.net]
Sent: Thursday, February 05, 2015 4:11 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Scikit-learn-general Digest, Vol 61, Issue 8

Today's Topics:

   1. Re: Calculating standard deviation for k-fold cross validation estimate (Michael Eickenberg)
   2. Re: GSoC2015 topics (Joel Nothman)
   3. Re: Calculating standard deviation for k-fold cross validation estimate (Joel Nothman)
   4. Re: Calculating standard deviation for k-fold cross validation estimate (Kyle Kastner)

----------------------------------------------------------------------

Message: 1
Date: Thu, 5 Feb 2015 20:44:16 +0100
From: Michael Eickenberg <michael.eickenb...@gmail.com>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for k-fold cross validation estimate
To: "scikit-learn-general@lists.sourceforge.net" <scikit-learn-general@lists.sourceforge.net>
Message-ID: <cadxjn660qzvxs+ui+cskezdzwskqh_9gagtl-opqwivha-g...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

this is most probably due to the fact that 2 = sqrt(5 - 1), a correction to variance reduction incurred by the overlapping nature of the folds. the bootstrap book contains more info on how to calculate these for different cases of splitting.

hth,
michael

On Thursday, February 5, 2015, Sebastian Raschka <se.rasc...@gmail.com> wrote:

> Hi,
>
> I am wondering why the standard deviation of the accuracy estimate is
> multiplied by 2 in the example on
> http://scikit-learn.org/stable/modules/cross_validation.html; it would be
> nice if someone could explain it to me.
>
> The relevant excerpt from the page linked above:
>
> >>> clf = svm.SVC(kernel='linear', C=1)
> >>> scores = cross_validation.cross_val_score(
> ...     clf, iris.data, iris.target, cv=5)
> ...
> >>> scores
> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>
> The mean score and the standard deviation of the score estimate are hence
> given by:
>
> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
> Accuracy: 0.98 (+/- 0.03)
>
> Best,
> Sebastian
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

------------------------------

Message: 2
Date: Fri, 6 Feb 2015 08:52:31 +1100
From: Joel Nothman <joel.noth...@gmail.com>
Subject: Re: [Scikit-learn-general] GSoC2015 topics
To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
Message-ID: <caakaflw8xun0yp_wgwn-x8wgwbtybqvny38-pqeej8e2hd-...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

> I think adding partial_fit functions in general to as many algorithms as possible would be nice

Which could be a project in itself, for someone open to breadth rather than depth.

On 6 February 2015 at 06:43, Kyle Kastner <kastnerk...@gmail.com> wrote:

> IncrementalPCA is done (have to add randomized SVD solver but that should
> be simple), but I am sure there are other low rank methods which need a
> partial_fit. I think adding partial_fit functions in general to as many
> algorithms as possible would be nice
>
> Kyle
>
> On Thu, Feb 5, 2015 at 2:12 PM, Akshay Narasimha <akshaynukal...@gmail.com> wrote:
>
>> Is Online low rank factorisation still a valid idea for this year? As it
>> was in the last year's idea list.
>>
>> On Thu, Feb 5, 2015 at 9:49 PM, Alexandre Gramfort <alexandre.gramf...@telecom-paristech.fr> wrote:
>>
>>> > I just looked at the list from last year, and what seems most relevant
>>> > still is GMMs,
>>> > and possibly the coordinate descent solvers (Alex maybe you can say what
>>> > is left there or
>>> > if with the SAG we are happy now?)
>>>
>>> there is work coming in coordinate descent and SAG is almost done.
>>> I don't think it's worth investing a gsoc on this topic.
>>>
>>> Alex

------------------------------

Message: 3
Date: Fri, 6 Feb 2015 08:54:12 +1100
From: Joel Nothman <joel.noth...@gmail.com>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for k-fold cross validation estimate
To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
Message-ID: <caakaflvu_-3krs31cunfbu3rd1sowtqun33rvqnyzvirrnl...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

With cv=5, only the training sets should overlap. Is this adjustment still appropriate?

On 6 February 2015 at 06:44, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:

> this is most probably due to the fact that 2 = sqrt(5 - 1), a correction
> to variance reduction incurred by the overlapping nature of the folds. the
> bootstrap book contains more info on how to calculate these for different
> cases of splitting.
>
> hth,
> michael

------------------------------

Message: 4
Date: Thu, 5 Feb 2015 17:11:00 -0500
From: Kyle Kastner <kastnerk...@gmail.com>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for k-fold cross validation estimate
To: scikit-learn-general@lists.sourceforge.net
Message-ID: <CAGNZ19BYpHQS1zrKLAShgGEF=echmkw5erwwulxodm6pp57...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Could it also be accounting for +- ? Standard deviation is one sided right?

On Thu, Feb 5, 2015 at 4:54 PM, Joel Nothman <joel.noth...@gmail.com> wrote:

> With cv=5, only the training sets should overlap. Is this adjustment still
> appropriate?

------------------------------

End of Scikit-learn-general Digest, Vol 61, Issue 8
***************************************************
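
On the 2 versus 1.96 question and the "+-" question raised in this thread, here is a quick check with Python's standard library. It is only a sketch under a loud assumption: that the fold scores are roughly normally distributed, which is a strong assumption for five folds. The variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

# Multiplier for a two-sided band that covers 95% of a normal
# distribution: 2.5% is left out in each tail.
z_95 = std_normal.inv_cdf(0.975)

# Fraction of a normal distribution inside a symmetric band of
# +/- 2 standard deviations around the mean.
coverage_2sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)

print("multiplier for 95%%: %0.4f" % z_95)
print("coverage of +/- 2 sd: %0.4f" % coverage_2sd)
```

Under that assumption, a "+/- 2 standard deviations" band is already two-sided, and it is slightly wider than the 95% band obtained with 1.96.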