> Furthermore, a training, validation, and testing set should be used when
> setting up
> parameters.

Usually, it’s better to use a training set and a separate test set, and to do
model selection via k-fold cross-validation on the training set only. Then you
get the final estimate of the model’s performance on the test set that you
haven’t touched before. I often use the “training, validation, and testing”
approach as well, though, especially when working with large datasets and for
early stopping on neural nets.
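
For illustration, a minimal sketch of that workflow (the dataset, classifier,
and parameter grid are just placeholders):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Model selection via k-fold cross-validation on the training set only.
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# Final performance estimate on the untouched test set.
print(grid.best_params_)
print(grid.score(X_test, y_test))

With a large dataset you would additionally split a validation set off
X_train (e.g. for early stopping) and still keep X_test untouched until the
very end.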

Best,
Sebastian


> On Jan 26, 2017, at 1:19 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> 
> Thank you, Guillaume.
> 
> 1. I agree with you - that's what I have been learning and makes sense.. I 
> was a bit surprised when I read the paper today..
> 
> 2. Ah.. thank you.. I got to change my glasses :P 
> 
> Best,
> Raga
> 
> On Thu, Jan 26, 2017 at 12:05 EST, Guillaume Lemaître <g.lemaitre58 at gmail.com> wrote:
> 
> 1. You should not evaluate an estimator on the data that were used to train
> it. During training you minimize the classification error or loss on those
> data and fit them as well as possible, so the training score is
> optimistically biased. Evaluating on an unseen testing set gives you an idea
> of how well your estimator generalizes to your problem. Furthermore,
> training, validation, and testing sets should be used when tuning
> parameters: the validation set is used to select the parameters, and the
> testing set is used to evaluate your best estimator.
> 
> That is why, when using GridSearchCV, fit trains the estimator using
> training and validation splits (following a given CV strategy). Finally,
> predict is performed on another, unseen testing set.
> 
> The bottom line is that using the training data to select parameters does
> not ensure that you are selecting the best parameters for your problem.
> 
> 2. The function is called in _fit_and_score, at lines 260 and 263 for instance.
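> 
> Coming back to 1., as a quick illustration (placeholder data and classifier,
> not the actual library internals), cross_val_score returns one score per
> held-out test fold:
> 
> from sklearn.datasets import load_iris
> from sklearn.model_selection import cross_val_score
> from sklearn.tree import DecisionTreeClassifier
> 
> X, y = load_iris(return_X_y=True)
> # One accuracy value per test fold; the training folds are not scored here.
> scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
> print(scores, scores.mean())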
> 
> On 26 January 2017 at 17:02, Raga Markely <raga.markely at gmail.com> wrote:
> 
> > Hello,
> > 
> > I have 2 questions regarding cross_val_score.
> > 
> > 1. Do the scores returned by cross_val_score correspond to only the test
> > set or to the whole data set (training and test sets)?
> > I tried to look at the source code, and it looks like it returns the score
> > of only the test set (line 145: "return_train_score=False") - I am not
> > sure if I am reading the code properly, though..
> > https://github.com/scikit-learn/scikit-learn/blob/14031f6/sklearn/model_selection/_validation.py#L36
> > 
> > I came across the paper below, and the authors use the score of the whole
> > dataset when performing repeated nested loops, grid search CV, etc. -
> > e.g. see algorithm 1 (line 1c) and 2 (line 2d) on page 3.
> > https://jcheminf.springeropen.com/articles/10.1186/1758-2946-6-10
> > I wonder what the pros and cons are of using the accuracy score of the
> > whole dataset vs. just the test set.. any thoughts?
> > 
> > 2. On line 283 of the cross_val_score source code, there is a function
> > _score. However, I can't find where this function is called. Could you
> > let me know where this function is called?
> > 
> > Thank you very much!
> > Raga
> 
> -- 
> Guillaume Lemaitre
> INRIA Saclay - Ile-de-France
> Equipe PARIETAL
> 
> guillaume.lemaitre at inria.fr
> https://glemaitre.github.io/

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
