Thanks for all your answers!

Jason, I think you could be right, but the author wrote in the line above the code:

"The mean score and the standard deviation of the score estimate are hence given by:"

So I assume he literally meant the standard deviation, to show how the scores vary, rather than how confident the mean score is.

Michael's suggestion makes the most sense to me right now, but I have to dig deeper into the literature here...

>>> this is most probably due to the fact that 2 = sqrt(5 - 1), a correction
>>> to variance reduction incurred by the overlapping nature of the folds. the
>>> bootstrap book contains more info on how to calculate these for different
>>> cases of splitting.
>>>
>>> hth,
>>> michael

We have to be a little bit careful with the "overlaps" here, though, since it can be confused with sampling "with replacement" as in bootstrapping. Here, only the folds overlap across the different iterations, but the "sqrt(5 - 1)" makes sense.

Thanks for all your help!

Best,
Sebastian

> On Feb 5, 2015, at 11:32 PM, Jason Sanchez <jason.sanchez.m...@statefarm.com> wrote:
>
> This is a very common calculation, you will find it at all of these places
> (but only with one standard deviation):
> http://scikit-learn.org/stable/auto_examples/randomized_search.html
> http://nbviewer.ipython.org/github/gmonce/scikit-learn-book/blob/master/Chapter%202%20-%20Supervised%20Learning%20-%20Image%20Recognition%20with%20Support%20Vector%20Machines.ipynb
> http://youtu.be/iFkRt3BCctg?t=33m25s
>
> I would presume that standard deviation is multiplied by two because the
> author of the example wanted to create confidence intervals based on two
> standard deviations. Technically, if they multiplied it by 1.96, then they
> would approximate the famous 95% confidence interval better, but 2 standard
> deviations is often used for simplicity.
>
> http://en.wikipedia.org/wiki/1.96
>
> Best,
> Jason
>
> -----Original Message-----
> From: scikit-learn-general-requ...@lists.sourceforge.net
> [mailto:scikit-learn-general-requ...@lists.sourceforge.net]
> Sent: Thursday, February 05, 2015 4:11 PM
> To: scikit-learn-general@lists.sourceforge.net
> Subject: Scikit-learn-general Digest, Vol 61, Issue 8
>
> Today's Topics:
>
>    1. Re: Calculating standard deviation for k-fold cross
>       validation estimate (Michael Eickenberg)
>    2. Re: GSoC2015 topics (Joel Nothman)
>    3. Re: Calculating standard deviation for k-fold cross
>       validation estimate (Joel Nothman)
>    4. Re: Calculating standard deviation for k-fold cross
>       validation estimate (Kyle Kastner)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 5 Feb 2015 20:44:16 +0100
> From: Michael Eickenberg <michael.eickenb...@gmail.com>
> Subject: Re: [Scikit-learn-general] Calculating standard deviation for
> k-fold cross validation estimate
> To: "scikit-learn-general@lists.sourceforge.net"
> <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
> <cadxjn660qzvxs+ui+cskezdzwskqh_9gagtl-opqwivha-g...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> this is most probably due to the fact that 2 = sqrt(5 - 1), a correction to
> variance reduction incurred by the overlapping nature of the folds. the
> bootstrap book contains more info on how to calculate these for different
> cases of splitting.
>
> hth,
> michael
>
> On Thursday, February 5, 2015, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I am wondering why the standard deviation of the accuracy estimate is
>> multiplied by 2 in the example on
>> http://scikit-learn.org/stable/modules/cross_validation.html; it would be
>> nice if someone could explain it to me.
>>
>> The relevant excerpt from the page linked above:
>>
>> >>> clf = svm.SVC(kernel='linear', C=1)
>> >>> scores = cross_validation.cross_val_score(
>> ...     clf, iris.data, iris.target, cv=5)
>> ...
>> >>> scores
>> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>>
>> The mean score and the standard deviation of the score estimate are hence
>> given by:
>>
>> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
>> Accuracy: 0.98 (+/- 0.03)
>>
>> Best,
>> Sebastian
>>
>> ------------------------------------------------------------------------------
>> Dive into the World of Parallel Programming. The Go Parallel Website,
>> sponsored by Intel and developed in partnership with Slashdot Media, is your
>> hub for all things parallel software development, from weekly thought
>> leadership blogs to news, videos, case studies, tutorials and more. Take a
>> look and join the conversation now. http://goparallel.sourceforge.net/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
> ------------------------------
>
> Message: 2
> Date: Fri, 6 Feb 2015 08:52:31 +1100
> From: Joel Nothman <joel.noth...@gmail.com>
> Subject: Re: [Scikit-learn-general] GSoC2015 topics
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
> <caakaflw8xun0yp_wgwn-x8wgwbtybqvny38-pqeej8e2hd-...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
>> I think adding partial_fit functions in general to as many algorithms as
>> possible would be nice
>
> Which could be a project in itself, for someone open to breadth rather than
> depth.
>
> On 6 February 2015 at 06:43, Kyle Kastner <kastnerk...@gmail.com> wrote:
>
>> IncrementalPCA is done (have to add randomized SVD solver but that should
>> be simple), but I am sure there are other low rank methods which need a
>> partial_fit. I think adding partial_fit functions in general to as many
>> algorithms as possible would be nice
>>
>> Kyle
>>
>> On Thu, Feb 5, 2015 at 2:12 PM, Akshay Narasimha
>> <akshaynukal...@gmail.com> wrote:
>>
>>> Is online low rank factorisation still a valid idea for this year? It
>>> was in last year's idea list.
>>>
>>> On Thu, Feb 5, 2015 at 9:49 PM, Alexandre Gramfort
>>> <alexandre.gramf...@telecom-paristech.fr> wrote:
>>>
>>>>> I just looked at the list from last year, and what seems most relevant
>>>>> still is GMMs, and possibly the coordinate descent solvers (Alex maybe
>>>>> you can say what is left there or if with the SAG we are happy now?)
>>>>
>>>> there is work coming in coordinate descent and SAG is almost done.
>>>> I don't think it's worth investing a gsoc on this topic.
>>>>
>>>> Alex
>
> ------------------------------
>
> Message: 3
> Date: Fri, 6 Feb 2015 08:54:12 +1100
> From: Joel Nothman <joel.noth...@gmail.com>
> Subject: Re: [Scikit-learn-general] Calculating standard deviation for
> k-fold cross validation estimate
> To: scikit-learn-general <scikit-learn-general@lists.sourceforge.net>
> Message-ID:
> <caakaflvu_-3krs31cunfbu3rd1sowtqun33rvqnyzvirrnl...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> With cv=5, only the training sets should overlap. Is this adjustment still
> appropriate?
>
> On 6 February 2015 at 06:44, Michael Eickenberg
> <michael.eickenb...@gmail.com> wrote:
>
>> this is most probably due to the fact that 2 = sqrt(5 - 1), a correction
>> to variance reduction incurred by the overlapping nature of the folds. the
>> bootstrap book contains more info on how to calculate these for different
>> cases of splitting.
>>
>> hth,
>> michael
>
> ------------------------------
>
> Message: 4
> Date: Thu, 5 Feb 2015 17:11:00 -0500
> From: Kyle Kastner <kastnerk...@gmail.com>
> Subject: Re: [Scikit-learn-general] Calculating standard deviation for
> k-fold cross validation estimate
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID:
> <CAGNZ19BYpHQS1zrKLAShgGEF=echmkw5erwwulxodm6pp57...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Could it also be accounting for +- ? Standard deviation is one sided right?
>
> On Thu, Feb 5, 2015 at 4:54 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>
>> With cv=5, only the training sets should overlap. Is this adjustment still
>> appropriate?
>
> ------------------------------
>
> End of Scikit-learn-general Digest, Vol 61, Issue 8
> ***************************************************
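
To make the readings of the factor 2 discussed in this thread concrete, here is a minimal sketch that puts them side by side. It uses only the Python standard library. The per-fold scores are hypothetical values (29/30 and 1.0), assumed for illustration because the output quoted in the thread is truncated ("0.96..."); the variable names are also just illustrative.

```python
import math
from statistics import NormalDist, mean, pstdev

# Hypothetical per-fold accuracies (assumption: the thread only shows
# truncated values such as "0.96...", so these are illustrative).
scores = [29 / 30, 1.0, 29 / 30, 29 / 30, 1.0]
k = len(scores)

avg = mean(scores)
# Population standard deviation (divides by k). This matches the default
# behaviour of NumPy's ndarray.std(), which uses ddof=0.
sd = pstdev(scores)

# Reading 1 (Jason): 2 is a rounded 1.96, the normal quantile for a
# two-sided 95% interval.
z95 = NormalDist().inv_cdf(0.975)

# Reading 2 (Michael): a factor sqrt(k - 1), which equals 2 only for k = 5.
fold_factor = math.sqrt(k - 1)

# For comparison: the naive standard error of the mean. It treats the fold
# scores as independent, which they are not (the training sets overlap).
sem = sd / math.sqrt(k)

print("Accuracy: %0.2f (+/- %0.2f)" % (avg, 2 * sd))
print("z for 95%%: %0.2f, sqrt(k - 1): %0.2f" % (z95, fold_factor))
print("naive standard error: %0.3f" % sem)
```

With these assumed scores the first line comes out as "Accuracy: 0.98 (+/- 0.03)", the same as the documentation excerpt. For cv=5 the two readings give nearly the same multiplier, so the printed number alone cannot tell them apart; rerunning with a different number of folds would separate sqrt(k - 1) from the fixed factor 2.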