Re: [Scikit-learn-general] training score in GridSearchCV?

Joel Nothman Sat, 03 May 2014 15:34:17 -0700

There is thought that additional fields will be added in the future.
Many is less likely.


I'm really against extending a namedtuple with new fields and breaking any
tuple-style iteration.
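
To make the concern concrete, here is a minimal sketch. ScoreV1 and ScoreV2 are hypothetical stand-ins for the current three-field tuple and an extended one (with the cv_train_scores field Robert suggests below); they are not actual scikit-learn code, and the scores are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the current three-field _CVScoreTuple.
ScoreV1 = namedtuple(
    "ScoreV1",
    ["parameters", "mean_validation_score", "cv_validation_scores"],
)
# Hypothetical extended version with one field appended at the end.
ScoreV2 = namedtuple("ScoreV2", ScoreV1._fields + ("cv_train_scores",))

old = ScoreV1({"C": 1.0}, 0.9, [0.88, 0.92])
new = ScoreV2({"C": 1.0}, 0.9, [0.88, 0.92], [0.97, 0.99])

# User code written against the three-field tuple unpacks it like this:
params, mean_score, fold_scores = old  # works

try:
    params, mean_score, fold_scores = new
    unpacking_broke = False
except ValueError:
    # "too many values to unpack": the appended field breaks this idiom.
    unpacking_broke = True

# Attribute access is unaffected by the extra field.
assert new.mean_validation_score == old.mean_validation_score
```

Code that only uses attribute access keeps working, but anything that unpacks or indexes by position depends on the tuple's length and order.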

Previously when discussing such things, Olivier suggested that the most
flexible solution is a dict per validation. A dict per validation, or a
dict of arrays, can also easily be converted to a Pandas DataFrame.
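
A rough sketch of the two layouts, with illustrative key names and made-up scores (none of this is an existing scikit-learn format). A dict per validation can be pivoted into a dict of arrays with plain Python, and the pandas.DataFrame constructor accepts either form.

```python
# One dict per validation (parameter setting x fold); keys are illustrative.
records = [
    {"C": 0.1, "fold": 0, "test_score": 0.80, "train_score": 0.85},
    {"C": 0.1, "fold": 1, "test_score": 0.78, "train_score": 0.86},
    {"C": 1.0, "fold": 0, "test_score": 0.90, "train_score": 0.99},
    {"C": 1.0, "fold": 1, "test_score": 0.88, "train_score": 0.98},
]


def to_columns(rows):
    """Pivot a list of dicts into a dict of equal-length lists."""
    keys = sorted({key for row in rows for key in row})
    # Missing keys become None, so adding a new field later does not
    # break rows that were produced without it.
    return {key: [row.get(key) for row in rows] for key in keys}


columns = to_columns(records)
# columns["train_score"] == [0.85, 0.86, 0.99, 0.98]
```

Adding a new key such as a training time only adds a column; existing consumers that look up keys by name are not affected.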

It is now outdated, but this is one of my attempts to move towards a more
extensible model: https://github.com/scikit-learn/scikit-learn/pull/2079

- Joel


On 4 May 2014 03:45, Robert McGibbon <rmcgi...@gmail.com> wrote:

> So I guess the options for this are:
>
>   1. Do nothing -- don't add the training score to the return values
>   2. Add the training score to the _CVScoreTuple, and possibly other
> fields like the training time (ala #1742)
>   3. Get rid of _CVScoreTuple and use a dict instead.
>
> IMHO option 2 is the best, unless there is thought that in the future,
> there are many additional fields that will be added, in which case option 3
> is the best since once it's a dict, adding new fields doesn't break
> backward compatibility.
>
> -Robert
>
>
>
> On Sat, May 3, 2014 at 8:19 AM, Andy <t3k...@gmail.com> wrote:
>
>>  Btw there was a branch by me doing exactly that:
>> https://github.com/scikit-learn/scikit-learn/pull/1742
>> I don't really remember what the reason not to merge it was (it is now
>> hopelessly out of date I think).
>>
>>
>>
>> On 05/02/2014 08:17 AM, Robert McGibbon wrote:
>>
>>  > There have been previous attempts to incorporate training score, but
>>  > there's a general open question of how best to return Grid Search
>>  > results: The current format cv_scores_ is not really extensible, which
>>  > seems to have stalled many of these issues. Input on this issue is
>>  > welcome. Otherwise, for the moment, you will have to roll your own
>>  > implementation (and I should note that _fit_and_score is a fairly
>>  > recent invention).
>>
>>  What about adding more fields to the _CVScoreTuple namedtuple
>> (GridSearchCV.grid_scores_ is a list of these namedtuples)? If things are
>> added at the end of the list, it should have a pretty small chance of
>> breaking backward compatibility. The current field names (`parameters`,
>> `mean_validation_score`, `cv_validation_scores`)  are quite specific, so
>> for example adding `cv_train_scores` could be an option.
>>
>>  I'm not too aware of the history of the project or what has been tried
>> previously on this issue, so apologies if this is obviously incorrect.
>>
>>  FWIW, I put together the code + tests for this change:
>>
>> https://github.com/rmcgibbo/scikit-learn/compare/scikit-learn:master...rmcgibbo:grid-search-train-error
>> Happy to file a PR if this is worthwhile for others.
>>
>>  -Robert
>>
>>
>> On Thu, May 1, 2014 at 10:10 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
>>
>>> There have been previous attempts to incorporate training score, but
>>> there's a general open question of how best to return Grid Search results:
>>> The current format cv_scores_ is not really extensible, which seems to have
>>> stalled many of these issues. Input on this issue is welcome. Otherwise,
>>> for the moment, you will have to roll your own implementation (and I should
>>> note that _fit_and_score is a fairly recent invention).
>>>
>>>
>>>  On 2 May 2014 13:34, Robert McGibbon <rmcgi...@gmail.com> wrote:
>>>
>>>>   Hi all,
>>>>
>>>>  Is there any way to get the score on the training data for each parameter
>>>> set (and each fold) when running GridSearchCV? While I haven't looked too
>>>> closely at the code, it appears that BaseSearchCV
>>>> <https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/grid_search.py#L378>
>>>> uses the _fit_and_score
>>>> <https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L1118>
>>>> method, which does have the ability to calculate and return scores on the
>>>> training data, but that this functionality isn't exposed in GridSearchCV.
>>>>
>>>>  The use case for this would be to compare training and test error (ala
>>>> the classic training error and test error vs. model complexity plot
>>>> <http://link.springer.com/protocol/10.1007%2F978-1-60327-429-6_15/fulltext.html#Fig3_15>).
>>>>
>>>>  -Robert
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
>>>> Instantly run your Selenium tests across 300+ browser/OS combos.  Get
>>>> unparalleled scalability from the best Selenium testing platform
>>>> available.
>>>> Simple to use. Nothing to install. Get started now for free."
>>>> http://p.sf.net/sfu/SauceLabs
>>>> _______________________________________________
>>>> Scikit-learn-general mailing list
>>>> Scikit-learn-general@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>

