This is a very common calculation; you will find it in all of these places (but 
only with one standard deviation):
http://scikit-learn.org/stable/auto_examples/randomized_search.html
http://nbviewer.ipython.org/github/gmonce/scikit-learn-book/blob/master/Chapter%202%20-%20Supervised%20Learning%20-%20Image%20Recognition%20with%20Support%20Vector%20Machines.ipynb
http://youtu.be/iFkRt3BCctg?t=33m25s

I would presume that the standard deviation is multiplied by two because the 
author of the example wanted to report a confidence interval based on two 
standard deviations. Technically, multiplying by 1.96 would approximate the 
familiar 95% confidence interval more closely, but two standard deviations are 
often used for simplicity.

http://en.wikipedia.org/wiki/1.96
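
For illustration, here is a minimal sketch of both variants (an assumption on
my part: it reuses the iris/SVC setup from the documentation example quoted
further down in this digest, with the cross_validation module as in the docs
at the time):

import numpy as np
from sklearn import datasets, svm
from sklearn.cross_validation import cross_val_score

# same setup as the cross_validation.html example: linear SVC on iris, 5 folds
iris = datasets.load_iris()
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, iris.data, iris.target, cv=5)

# +/- 2 standard deviations, as in the docs, versus 1.96 for a textbook
# 95% interval; the difference between the two is small either way
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), 2 * scores.std()))
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), 1.96 * scores.std()))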

Best,
Jason



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] 
Sent: Thursday, February 05, 2015 4:11 PM
To: [email protected]
Subject: Scikit-learn-general Digest, Vol 61, Issue 8

Send Scikit-learn-general mailing list submissions to
        [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
or, via email, send a message with subject or body 'help' to
        [email protected]

You can reach the person managing the list at
        [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Scikit-learn-general digest..."


Today's Topics:

   1. Re: Calculating standard deviation for k-fold cross
      validation estimate (Michael Eickenberg)
   2. Re: GSoC2015 topics (Joel Nothman)
   3. Re: Calculating standard deviation for k-fold cross
      validation estimate (Joel Nothman)
   4. Re: Calculating standard deviation for k-fold cross
      validation estimate (Kyle Kastner)


----------------------------------------------------------------------

Message: 1
Date: Thu, 5 Feb 2015 20:44:16 +0100
From: Michael Eickenberg <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: "[email protected]"
        <[email protected]>
Message-ID:
        <cadxjn660qzvxs+ui+cskezdzwskqh_9gagtl-opqwivha-g...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
the variance reduction incurred by the overlapping nature of the folds. The
bootstrap book contains more info on how to calculate these corrections for
different splitting schemes.
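
In code, the suggested correction would look roughly like this (a sketch of the
hypothesis only, with rounded per-fold scores hard-coded; whether the factor
really applies when only the training folds overlap is a separate question):

import numpy as np

scores = np.array([0.96, 1.0, 0.96, 0.96, 1.0])  # rounded per-fold accuracies
k = len(scores)                                  # k = 5 folds
# plain std over the k fold scores, inflated by sqrt(k - 1);
# for k = 5 the factor is sqrt(4) = 2, i.e. the 2 in the example
print("Accuracy: %0.2f (+/- %0.2f)"
      % (scores.mean(), scores.std() * np.sqrt(k - 1)))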

hth,
michael

On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
wrote:

> Hi,
>
> I am wondering why the standard deviation of the accuracy estimate is
> multiplied by 2 in the example on
> http://scikit-learn.org/stable/modules/cross_validation.html; it would be
> nice if someone could explain it to me.
>
> The relevant excerpt from the page linked above:
>
> >>> clf = svm.SVC(kernel='linear', C=1)
> >>> scores = cross_validation.cross_val_score(
> ... clf, iris.data, iris.target, cv=5)
> ...
> >>> scores
> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
> The mean score and the standard deviation of the score estimate are hence
> given by:
> >>>
> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
> 2))
> Accuracy: 0.98 (+/- 0.03)
>
>
> Best,
> Sebastian
>

------------------------------

Message: 2
Date: Fri, 6 Feb 2015 08:52:31 +1100
From: Joel Nothman <[email protected]>
Subject: Re: [Scikit-learn-general] GSoC2015 topics
To: scikit-learn-general <[email protected]>
Message-ID:
        <caakaflw8xun0yp_wgwn-x8wgwbtybqvny38-pqeej8e2hd-...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

> I think adding partial_fit functions in general to as many algorithms as
> possible would be nice

Which could be a project in itself, for someone open to breadth rather than
depth.
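
For anyone unfamiliar with the API, the partial_fit pattern looks roughly like
this (a sketch only, using the IncrementalPCA Kyle mentions below, assuming the
estimator as merged on master, and synthetic data):

import numpy as np
from sklearn.decomposition import IncrementalPCA

# feed the estimator one batch at a time instead of the whole matrix,
# so the full dataset never has to sit in memory at once
X = np.random.rand(1000, 10)
ipca = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)
X_reduced = ipca.transform(X)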

On 6 February 2015 at 06:43, Kyle Kastner <[email protected]> wrote:

> IncrementalPCA is done (have to add randomized SVD solver but that should
> be simple), but I am sure there are other low rank methods which need a
> partial_fit . I think adding partial_fit functions in general to as many
> algorithms as possible would be nice
>
> Kyle
>
> On Thu, Feb 5, 2015 at 2:12 PM, Akshay Narasimha <[email protected]
> > wrote:
>
>> Is online low-rank factorisation still a valid idea for this year? It was on
>> last year's idea list.
>>
>> On Thu, Feb 5, 2015 at 9:49 PM, Alexandre Gramfort <
>> [email protected]> wrote:
>>
>>> > I just looked at the list from last year, and what seems most relevant
>>> > still is GMMs, and possibly the coordinate descent solvers (Alex, maybe
>>> > you can say what is left there, or if with the SAG we are happy now?)
>>>
>>> there is work coming in coordinate descent, and SAG is almost done.
>>> I don't think it's worth investing a GSoC on this topic.
>>>
>>> Alex

------------------------------

Message: 3
Date: Fri, 6 Feb 2015 08:54:12 +1100
From: Joel Nothman <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: scikit-learn-general <[email protected]>
Message-ID:
        <caakaflvu_-3krs31cunfbu3rd1sowtqun33rvqnyzvirrnl...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

With cv=5, only the training sets should overlap. Is this adjustment still
appropriate?

On 6 February 2015 at 06:44, Michael Eickenberg <
[email protected]> wrote:

> This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
> the variance reduction incurred by the overlapping nature of the folds. The
> bootstrap book contains more info on how to calculate these corrections for
> different splitting schemes.
>
> hth,
> michael
>
>
> On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
> wrote:
>
>> Hi,
>>
>> I am wondering why the standard deviation of the accuracy estimate is
>> multiplied by 2 in the example on
>> http://scikit-learn.org/stable/modules/cross_validation.html; it would
>> be nice if someone could explain it to me.
>>
>> The relevant excerpt from the page linked above:
>>
>> >>> clf = svm.SVC(kernel='linear', C=1)
>> >>> scores = cross_validation.cross_val_score(
>> ... clf, iris.data, iris.target, cv=5)
>> ...
>> >>> scores
>> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>> The mean score and the standard deviation of the score estimate are hence
>> given by:
>> >>>
>> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
>> 2))
>> Accuracy: 0.98 (+/- 0.03)
>>
>>
>> Best,
>> Sebastian
>>

------------------------------

Message: 4
Date: Thu, 5 Feb 2015 17:11:00 -0500
From: Kyle Kastner <[email protected]>
Subject: Re: [Scikit-learn-general] Calculating standard deviation for
        k-fold cross validation estimate
To: [email protected]
Message-ID:
        <CAGNZ19BYpHQS1zrKLAShgGEF=echmkw5erwwulxodm6pp57...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Could it also be accounting for the +/-? Standard deviation is one-sided, right?

On Thu, Feb 5, 2015 at 4:54 PM, Joel Nothman <[email protected]> wrote:

> With cv=5, only the training sets should overlap. Is this adjustment still
> appropriate?
>
> On 6 February 2015 at 06:44, Michael Eickenberg <
> [email protected]> wrote:
>
>> This is most probably due to the fact that 2 = sqrt(5 - 1), a correction for
>> the variance reduction incurred by the overlapping nature of the folds. The
>> bootstrap book contains more info on how to calculate these corrections for
>> different splitting schemes.
>>
>> hth,
>> michael
>>
>>
>> On Thursday, February 5, 2015, Sebastian Raschka <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I am wondering why the standard deviation of the accuracy estimate is
>>> multiplied by 2 in the example on
>>> http://scikit-learn.org/stable/modules/cross_validation.html; it would
>>> be nice if someone could explain it to me.
>>>
>>> The relevant excerpt from the page linked above:
>>>
>>> >>> clf = svm.SVC(kernel='linear', C=1)
>>> >>> scores = cross_validation.cross_val_score(
>>> ... clf, iris.data, iris.target, cv=5)
>>> ...
>>> >>> scores
>>> array([ 0.96..., 1. ..., 0.96..., 0.96..., 1. ])
>>> The mean score and the standard deviation of the score estimate are
>>> hence given by:
>>> >>>
>>> >>> print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() *
>>> 2))
>>> Accuracy: 0.98 (+/- 0.03)
>>>
>>>
>>> Best,
>>> Sebastian
>>>

------------------------------

------------------------------

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


End of Scikit-learn-general Digest, Vol 61, Issue 8
***************************************************

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
