This is a very common calculation; you will find it in all of these places (but 
only with one standard deviation):
http://scikit-learn.org/stable/auto_examples/randomized_search.html
http://nbviewer.ipython.org/github/gmonce/scikit-learn-book/blob/master/Chapter%202%20-%20Supervised%20Learning%20-%20Image%20Recognition%20with%20Support%20Vector%20Machines.ipynb
http://youtu.be/iFkRt3BCctg?t=33m25s

I would presume that the standard deviation is multiplied by two because the 
author of the example wanted to report a confidence interval based on two 
standard deviations. Technically, multiplying by 1.96 would approximate the 
familiar 95% confidence interval more closely, but two standard deviations are 
often used for simplicity.

http://en.wikipedia.org/wiki/1.96
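
For illustration, here is a minimal sketch of both variants (an assumption on
my part: it reuses the iris/SVC setup from the documentation example quoted
further down in this digest, with the cross_validation module as in the docs
at the time):

import numpy as np
from sklearn import datasets, svm
from sklearn.cross_validation import cross_val_score

# same setup as the cross_validation.html example: linear SVC on iris, 5 folds
iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)

# +/- 2 standard deviations, as in the docs, versus 1.96 for a textbook
# 95% interval; the difference between the two is small either way
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), 2 * scores.std()))
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), 1.96 * scores.std()))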

Best,
Jason



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] 
Sent: Thursday, February 05, 2015 4:11 PM
To: [email protected]
Subject: Scikit-learn-general Digest, Vol 61, Issue 8

Send Scikit-learn-general mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Scikit-learn-general digest..."


Today's Topics:

   1. Re: Calculating standard deviation for k-fold cross
      validation estimate (Michael Eickenberg)
   2. Re: GSoC2015 topics (Joel Nothman)
   3. Re: Calculating standard deviation for k-fold cross
      validation estimate (Joel Nothman)
   4. Re: Calculating standard deviation for k-fold cross
      validation estimate (Kyle Kastner)


----------------------------------------------------------------------

Message: 1
Date: Thu, 5 Feb 2015 20:44:16 +0100
From: Michael Eickenberg <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: "[email protected]"
        <[email protected]>
Message-ID:
        <cadxjn660qzvxs+ui+cskezdzwskqh_9gagtl-opqwivha-g...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
the variance reduction incurred by the overlapping nature of the folds. The
bootstrap book contains more info on how to calculate these corrections for
different splitting schemes.
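
In code, the suggested correction would look roughly like this (a sketch of the
hypothesis only, with rounded per-fold scores hard-coded; whether the factor
really applies when only the training folds overlap is a separate question):

import numpy as np

scores = np.array([0.96, 1.0, 0.96, 0.96, 1.0])  # rounded per-fold accuracies
k = len(scores)                                  # k = 5 folds
# plain std over the k fold scores, inflated by sqrt(k - 1);
# for k = 5 the factor is sqrt(4) = 2, i.e. the 2 in the example
print("Accuracy: %0.2f (+/- %0.2f)"
      % (scores.mean(), scores.std() * np.sqrt(k - 1)))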

hth,
michael

On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
wrote:

> Hi,
>
> I am wondering why the standard deviation of the accuracy estimate is
> multiplied by 2 in the example on
> http://scikit-learn.org/stable/modules/cross_validation.html; it would be
> nice if someone could explain it to me.
>
> The relevant excerpt from the page linked above:
>
> >>> clf = svm.SVC(kernel='linear', C=1)
> >>> scores = cross_validation.cross_val_score(
> ... clf, iris.data, iris.target, cv=5)
> ...
> >>> scores
> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
> The mean score and the standard deviation of the score estimate are hence
> given by:
> >>>
> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
> 2))
> Accuracy: 0.98 (+/- 0.03)
>
>
> Best,
> Sebastian
>

------------------------------

Message: 2
Date: Fri, 6 Feb 2015 08:52:31 +1100
From: Joel Nothman <[email protected]>
Subject: Re: [Scikit-learn-general] GSoC2015 topics
To: scikit-learn-general <[email protected]>
Message-ID:
        <caakaflw8xun0yp_wgwn-x8wgwbtybqvny38-pqeej8e2hd-...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

> I think adding partial_fit functions in general to as many algorithms as
> possible would be nice

Which could be a project in itself, for someone open to breadth rather than
depth.
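
For anyone unfamiliar with the API, the partial_fit pattern looks roughly like
this (a sketch only, using the IncrementalPCA Kyle mentions below, assuming the
estimator as merged on master, and synthetic data):

import numpy as np
from sklearn.decomposition import IncrementalPCA

# feed the estimator one batch at a time instead of the whole matrix,
# so the full dataset never has to sit in memory at once
X = np.random.rand(1000, 10)
ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)
X_reduced = ipca.transform(X)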

On 6 February 2015 at 06:43, Kyle Kastner <[email protected]> wrote:

> IncrementalPCA is done (have to add randomized SVD solver but that should
> be simple), but I am sure there are other low rank methods which need a
> partial_fit . I think adding partial_fit functions in general to as many
> algorithms as possible would be nice
>
> Kyle
>
> On Thu, Feb 5, 2015 at 2:12 PM, Akshay Narasimha <[email protected]
> > wrote:
>
>> Is online low-rank factorisation still a valid idea for this year? It was on
>> last year's idea list.
>>
>> On Thu, Feb 5, 2015 at 9:49 PM, Alexandre Gramfort <
>> [email protected]> wrote:
>>
>>> > I just looked at the list from last year, and what seems most relevant
>>> > still is GMMs, and possibly the coordinate descent solvers (Alex, maybe
>>> > you can say what is left there, or if with the SAG we are happy now?)
>>>
>>> there is work coming in coordinate descent, and SAG is almost done.
>>> I don't think it's worth investing a GSoC on this topic.
>>>
>>> Alex

------------------------------

Message: 3
Date: Fri, 6 Feb 2015 08:54:12 +1100
From: Joel Nothman <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: scikit-learn-general <[email protected]>
Message-ID:
        <caakaflvu_-3krs31cunfbu3rd1sowtqun33rvqnyzvirrnl...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

With cv=5, only the training sets should overlap. Is this adjustment still
appropriate?

On 6 February 2015 at 06:44, Michael Eickenberg <
[email protected]> wrote:

> This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
> the variance reduction incurred by the overlapping nature of the folds. The
> bootstrap book contains more info on how to calculate these corrections for
> different splitting schemes.
>
> hth,
> michael
>
>
> On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
> wrote:
>
>> Hi,
>>
>> I am wondering why the standard deviation of the accuracy estimate is
>> multiplied by 2 in the example on
>> http://scikit-learn.org/stable/modules/cross_validation.html; it would
>> be nice if someone could explain it to me.
>>
>> The relevant excerpt from the page linked above:
>>
>> >>> clf = svm.SVC(kernel='linear', C=1)
>> >>> scores = cross_validation.cross_val_score(
>> ... clf, iris.data, iris.target, cv=5)
>> ...
>> >>> scores
>> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>> The mean score and the standard deviation of the score estimate are hence
>> given by:
>> >>>
>> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
>> 2))
>> Accuracy: 0.98 (+/- 0.03)
>>
>>
>> Best,
>> Sebastian
>>

------------------------------

Message: 4
Date: Thu, 5 Feb 2015 17:11:00 -0500
From: Kyle Kastner <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: [email protected]
Message-ID:
        <CAGNZ19BYpHQS1zrKLAShgGEF=echmkw5erwwulxodm6pp57...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Could it also be accounting for the +/-? Standard deviation is one-sided, right?

On Thu, Feb 5, 2015 at 4:54 PM, Joel Nothman <[email protected]> wrote:

> With cv=5, only the training sets should overlap. Is this adjustment still
> appropriate?
>
> On 6 February 2015 at 06:44, Michael Eickenberg <
> [email protected]> wrote:
>
>> This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
>> the variance reduction incurred by the overlapping nature of the folds. The
>> bootstrap book contains more info on how to calculate these corrections for
>> different splitting schemes.
>>
>> hth,
>> michael
>>
>>
>> On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am wondering why the standard deviation of the accuracy estimate is
>>> multiplied by 2 in the example on
>>> http://scikit-learn.org/stable/modules/cross_validation.html; it would
>>> be nice if someone could explain it to me.
>>>
>>> The relevant excerpt from the page linked above:
>>>
>>> >>> clf = svm.SVC(kernel='linear', C=1)
>>> >>> scores = cross_validation.cross_val_score(
>>> ... clf, iris.data, iris.target, cv=5)
>>> ...
>>> >>> scores
>>> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>>> The mean score and the standard deviation of the score estimate are
>>> hence given by:
>>> >>>
>>> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
>>> 2))
>>> Accuracy: 0.98 (+/- 0.03)
>>>
>>>
>>> Best,
>>> Sebastian
>>>

------------------------------

------------------------------

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


End of Scikit-learn-general Digest, Vol 61, Issue 8
***************************************************

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
