GitHub user feynmanliang commented on the pull request:
https://github.com/apache/spark/pull/7337#issuecomment-120495073
Ah, I overlooked your point that when `1 - trainRatio > 1/numFolds` we
will have overlapping folds. Also, I realized that what I'm proposing (adding a
`trainRatio` for each fold) contradicts how
[Wikipedia](https://en.wikipedia.org/wiki/Cross-validation_(statistics)#k-fold_cross-validation)
defines k-fold cross-validation. Thanks for pushing back on this.
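To make the overlap condition concrete, here is a minimal sketch in plain Scala. `FoldOverlap` and `validationSetsMustOverlap` are hypothetical names for illustration only, not part of this PR or of Spark. It assumes each fold holds out a fraction `1 - trainRatio` of the data for validation, so `numFolds` pairwise disjoint validation sets can only exist when `numFolds * (1 - trainRatio) <= 1`.

```scala
// Hypothetical helper (not from the PR): checks the overlap condition
// discussed above for a given numFolds and per-fold trainRatio.
object FoldOverlap {
  def validationSetsMustOverlap(numFolds: Int, trainRatio: Double): Boolean = {
    require(numFolds >= 1, "numFolds must be positive")
    require(trainRatio > 0.0 && trainRatio < 1.0, "trainRatio must be in (0, 1)")
    // Each fold validates on a (1 - trainRatio) fraction of the data.
    // If that fraction exceeds 1/numFolds, the validation sets cannot
    // all be disjoint, so at least two of them share data points.
    (1.0 - trainRatio) > 1.0 / numFolds
  }
}
```

For example, `numFolds = 3` with `trainRatio = 0.5` forces overlap, while `numFolds = 4` with `trainRatio = 0.75` is exactly standard 4-fold cross-validation and does not.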
I agree with you that `TrainValidatorSplit` and `CrossValidator` provide
different functionality and should be separate classes. I like the idea of both
wrapping a common (perhaps more confusing) implementation that takes both
`numFolds` and `trainRatio`; it differentiates the concepts in the public API
but shares code in the implementation.
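A rough sketch of what that shared implementation could look like, written over plain index vectors rather than Spark datasets. Everything here (`SplitSketch`, `split`, `kFold`, `trainValidation`, the `seed` parameter) is an assumption made for illustration and is not the code in this PR. The private core takes both `numFolds` and `trainRatio` and rejects combinations whose validation sets would overlap; the two public entry points each expose only one of the parameters.

```scala
import scala.util.Random

// Hypothetical sketch: one private core parameterized by numFolds and
// trainRatio, with two public wrappers that each fix one of them.
object SplitSketch {
  // (training indices, validation indices)
  type Split = (Vector[Int], Vector[Int])

  private def split(n: Int, numFolds: Int, trainRatio: Double, seed: Long): Vector[Split] = {
    require(numFolds >= 1, "numFolds must be positive")
    require(trainRatio > 0.0 && trainRatio < 1.0, "trainRatio must be in (0, 1)")
    val validationFraction = 1.0 - trainRatio
    // Small tolerance so that trainRatio = 1 - 1/numFolds passes despite rounding.
    require(numFolds * validationFraction <= 1.0 + 1e-9, "validation sets would overlap")

    val shuffled = new Random(seed).shuffle((0 until n).toVector)
    // Fold f validates on shuffled[boundary(f), boundary(f + 1)).
    def boundary(fold: Int): Int =
      math.min(n, math.round(fold.toDouble * n * validationFraction).toInt)

    Vector.tabulate(numFolds) { fold =>
      val (lo, hi) = (boundary(fold), boundary(fold + 1))
      val validation = shuffled.slice(lo, hi)
      val train = shuffled.take(lo) ++ shuffled.drop(hi)
      (train, validation)
    }
  }

  // k-fold view: trainRatio is implied by numFolds.
  def kFold(n: Int, numFolds: Int, seed: Long = 42L): Vector[Split] = {
    require(numFolds >= 2, "k-fold needs at least two folds")
    split(n, numFolds, 1.0 - 1.0 / numFolds, seed)
  }

  // Single train/validation view: exactly one fold.
  def trainValidation(n: Int, trainRatio: Double, seed: Long = 42L): Split =
    split(n, 1, trainRatio, seed).head
}
```

With this shape, `SplitSketch.kFold(10, 5)` yields five splits whose validation sets partition the ten indices, and `SplitSketch.trainValidation(8, 0.75)` yields a single 6/2 split.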