> Thus it seems to me that the function that would really need to be
> replaced would be cross_val_score, but it is fairly trivial to replace:
>
>      estimator.fit(X_train, y_train).score(X_test, y_test)
>
> A ShuffleSplit can be used inside this in combination with a GridSearch to
> do parameter selection with only one fold. Indeed, inside the train data,
> there is seldom a predefined train/test subgroup.
I'm not sure if I understood your solution.
> I am actually not sure that I have understood the usecase that we are
> discussing.
>
Maybe when I said "many datasets" I was exaggerating a bit.
At the moment, there are two datasets I have in mind,
CIFAR-10 and the Pascal VOC datasets.
These are split into "train", "val", and "test".
Sometimes results only on the "val" dataset are reported, as
the "test" set is unavailable to the users.
This is indeed overfitting and not best practice, but people
do it nevertheless.

I agree with you, I wouldn't want to break any of the beautiful
sklearn api for this use case. I just thought it would be nice
if I could do it in a simpler way.
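For what it's worth, here is a rough sketch of the single-fold idea for a
dataset with a fixed "train"/"val" split. It assumes GridSearchCV accepts an
explicit list of (train_indices, test_indices) pairs as its cv argument; the
toy data and the particular estimator/grid are just placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for a dataset that ships with a fixed
# "train"/"val" split (e.g. the first 150 samples are "train").
X, y = make_classification(n_samples=200, random_state=0)
train_idx = np.arange(150)      # the predefined "train" part
val_idx = np.arange(150, 200)   # the predefined "val" part

# A single (train, val) index pair acts as a one-fold CV iterator,
# so parameter selection uses exactly the predefined split.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=[(train_idx, val_idx)])
grid.fit(X, y)

# Final evaluation via fit(...).score(...), as suggested above.
best = grid.best_estimator_
score = best.fit(X[train_idx], y[train_idx]).score(X[val_idx], y[val_idx])
```

A PredefinedSplit-style helper that takes a per-sample fold label would
express the same thing without manual index bookkeeping.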

Cheers,
Andy

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
