does spark ML have some thing like createDataPartition() in R caret package ?

2015-11-13 Thread Andy Davidson
In R, it's easy to split a data set into training, cross-validation, and test sets. Is there something like this in spark.ml? I am using Python for now. My real problem is that I want to randomly select a relatively small data set to do some initial data exploration. It's not clear to me how using Spark I

Re: does spark ML have some thing like createDataPartition() in R caret package ?

2015-11-13 Thread Sonal Goyal
The RDD has a takeSample method where you can supply the flag for replacement or not as well as the number of elements to sample (the related sample method takes a fraction instead).

On Nov 14, 2015 2:51 AM, "Andy Davidson" wrote:
> In R, it's easy to split a data set into training, crossValidation, and
> test set. Is there
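
For anyone finding this thread later, here is a rough sketch of how the split and the small exploratory sample might look in PySpark. It relies on DataFrame.randomSplit(weights, seed) and DataFrame.sample(withReplacement, fraction, seed); on an RDD, sample(withReplacement, fraction, seed) takes a fraction, while takeSample(withReplacement, num, seed) takes a number of elements and returns them to the driver as a list. The helper names, the 60/20/20 weights, the 1% fraction, the seed, and the toy data are all assumptions made for illustration, and the demo assumes Spark 2.0 or later for SparkSession.

```python
# Sketch only: helper names, weights, fraction, and seed are illustrative assumptions.

def split_train_cv_test(df, weights=(0.6, 0.2, 0.2), seed=42):
    """Randomly split a Spark DataFrame into training, cross-validation, and test sets."""
    # randomSplit normalizes the weights if they do not sum to 1.
    # The resulting sizes are approximate, not exact proportions.
    train, cv, test = df.randomSplit(list(weights), seed=seed)
    return train, cv, test


def small_sample(df, fraction=0.01, seed=42):
    """Return roughly `fraction` of the rows, sampled without replacement."""
    # The fraction is an expected proportion, not an exact row count.
    return df.sample(withReplacement=False, fraction=fraction, seed=seed)


def demo():
    # Imported here so the helpers above do not need Spark just to be defined.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("split-demo")
             .getOrCreate())
    df = spark.range(0, 10000)  # toy data: a single `id` column

    train, cv, test = split_train_cv_test(df)
    explore = small_sample(df).collect()  # small enough to bring to the driver

    print(train.count(), cv.count(), test.count(), len(explore))
    spark.stop()
```

Calling demo() from a Python session with PySpark installed should print three counts that add up to 10000, plus the size of the exploratory sample. The counts vary with the seed and partitioning, so treat them as approximate.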