Andreas Muller suggested GroupIndependentKFold.

The problem with adding a parameter (such as stratified) to the existing
LeaveKLabelOut is that it might be misleading, because:
(i) here we don't care about the number of labels left out, and
(ii) the number of labels left out might vary between folds.
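To make points (i) and (ii) concrete, here is a minimal pure-Python sketch of the splitting idea. It is not the code from the pull request. The helper name group_independent_kfold and the greedy rule (assign labels, largest first, to the fold holding the fewest samples so far) are assumptions chosen only for illustration.

```python
from collections import Counter


def group_independent_kfold(labels, n_folds=3):
    """Yield (train_idx, test_idx) index lists for K label-disjoint folds.

    Hypothetical helper, not the PR's implementation: each distinct label is
    placed, largest first, into the fold that currently holds the fewest
    samples, so folds are roughly balanced and every label lands in one fold.
    """
    counts = Counter(labels)
    if len(counts) < n_folds:
        raise ValueError("need at least as many distinct labels as folds")

    fold_sizes = [0] * n_folds
    fold_of_label = {}
    # Place big labels first; smaller ones then fill the remaining gaps.
    for label, size in sorted(counts.items(), key=lambda item: -item[1]):
        lightest = min(range(n_folds), key=lambda k: fold_sizes[k])
        fold_of_label[label] = lightest
        fold_sizes[lightest] += size

    for k in range(n_folds):
        test_idx = [i for i, lab in enumerate(labels) if fold_of_label[lab] == k]
        train_idx = [i for i, lab in enumerate(labels) if fold_of_label[lab] != k]
        yield train_idx, test_idx


# Example: 5 labels (e.g. subjects) of unequal size, split into 3 folds.
labels = ["a", "a", "a", "a", "b", "b", "c", "c", "d", "e"]
for train_idx, test_idx in group_independent_kfold(labels, n_folds=3):
    print(sorted({labels[i] for i in test_idx}), len(test_idx))
# ['a'] 4
# ['b', 'd'] 3
# ['c', 'e'] 3
```

With these labels the three test folds hold 1, 2 and 2 labels respectively, so the number of labels left out varies between folds, whereas LeaveOneLabelOut would produce one fold per label (five here).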

What do you think?

Cheers,

Jean

2015-03-24 16:26 GMT+00:00 Jean K <jean.kossa...@gmail.com>:

> Hi,
>
> Yes Michael, that's exactly what I want.
>
> Basically I don't care about the number of labels left out; I just want K
> (approximately) balanced folds where the same label never appears in both
> training and testing (so the number of labels left out might vary from
> fold to fold).
>
> Indeed, the LFW dataset is a good example. There, training and testing on
> the same label (i.e. the same person's face) might lead to overfitting.
>
> Jean
>
> 2015-03-24 16:20 GMT+00:00 Michael Eickenberg <
> michael.eickenb...@gmail.com>:
>
>> looks like the difference is that it can group several labels into one
>> fold.
>>
>> not everybody works with "subjects" - the proper name would contain the
>> word Label or Group, or it should be incorporated in a LeaveLabelsOut which
>> could have several modes, among them LeaveOneLabelOut and a "balanced"
>> mode, which is the present contribution.
>>
>> can be useful if labels are strongly imbalanced or if validation must be
>> able to handle several labels.
>>
>> the LFW dataset would be such an example.
>>
>> Jean, does what I said make sense?
>>
>> Michael
>>
>>
>> On Tuesday, March 24, 2015, Alexandre Gramfort <
>> alexandre.gramf...@telecom-paristech.fr> wrote:
>>
>>> hi jean,
>>>
>>> how different is it from
>>>
>>> http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LeaveOneLabelOut.html
>>> ?
>>>
>>> A
>>>
>>> On Tue, Mar 24, 2015 at 4:49 PM, Jean K <jean.kossa...@gmail.com> wrote:
>>> > Hi all,
>>> >
>>> > I recently needed to perform some subject-independent KFold
>>> > cross-validation. To my knowledge this feature isn't in scikit-learn
>>> > yet, so I created a pull request with a simple implementation.
>>> >
>>> > It is similar to the original KFold, except that it takes as a parameter
>>> > an array of subjects (similarly to StratifiedKFold, which takes an array
>>> > of labels as a parameter) and separates these into K approximately
>>> > balanced folds, where each subject appears in only one fold.
>>> >
>>> > Do you think this would be useful?
>>> >
>>> > Cheers,
>>> >
>>> > Jean
>>> >
>
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
