In R, it's easy to split a data set into training, cross-validation, and test
sets. Is there something like this in spark.ml? I am using Python for now.
My real problem is that I want to randomly select a relatively small data set to
do some initial data exploration. It's not clear to me how to do this using Spark.
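
With DataFrames, both needs are covered by randomSplit and sample. A minimal
sketch, assuming a hypothetical data.json input; the proportions, fraction, and
seed are just illustrative:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="split-example")
    sqlContext = SQLContext(sc)

    # Hypothetical input path; any DataFrame works the same way.
    df = sqlContext.read.json("data.json")

    # randomSplit returns a list of DataFrames in roughly these proportions.
    train, cv, test = df.randomSplit([0.6, 0.2, 0.2], seed=42)

    # A small random fraction (~1%) for initial data exploration.
    explore = df.sample(withReplacement=False, fraction=0.01, seed=42)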
The RDD API has a takeSample method where you can supply a flag for sampling
with or without replacement as well as the number of elements to take; the
related sample method takes a fraction of the data instead.
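
A minimal sketch of both calls, assuming a SparkContext and a throwaway RDD
built with parallelize (the num, fraction, and seed values are illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="sampling-example")
    rdd = sc.parallelize(range(100000))  # stand-in for the real data

    # takeSample returns a local Python list of exactly `num` elements.
    local_sample = rdd.takeSample(withReplacement=False, num=1000, seed=42)

    # sample returns another RDD containing roughly `fraction` of the data.
    small_rdd = rdd.sample(withReplacement=False, fraction=0.01, seed=42)

RDDs also have a randomSplit method if you want the
train/cross-validation/test split at the RDD level.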
On Nov 14, 2015 2:51 AM, "Andy Davidson" wrote:
> In R, it's easy to split a data set into training, cross-validation, and
> test sets. Is there