Hello, I am new to Spark and I am evaluating its suitability for my machine learning tasks. I am using Spark v. 1.2.1. I would really appreciate it if someone could provide some insight into the following two issues.
1. I'd like to try a "leave-one-out" approach for training my SVM, meaning that all but one of the data points are used for training. The example SVM classifier code on the Spark webpage has this:

    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
    JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
    training.cache();
    JavaRDD<LabeledPoint> test = data.subtract(training);

Is there a way to iterate over the data and progressively hold out each element, designating the rest of the dataset as the training set, instead of using a fixed fraction of the data for training (60% in the example above)? I have put a rough sketch of what I mean below my sign-off.

2. Is there a way to choose and vary the parameters of the SVM (kernel, cost, gamma, ...)?

Thank you!
Natalia
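
P.S. To make question 1 more concrete, here is a rough sketch of the kind of loop I have in mind, using zipWithIndex to tag each point so that single points can be filtered out one at a time (written with Java 8 lambdas for brevity; sc and path are as in the example above). I am not at all sure this is the right or an efficient way to do it in Spark, so please treat it only as an illustration of the intent:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;

    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();

    // Pair each point with a stable index so individual points can be excluded.
    JavaPairRDD<LabeledPoint, Long> indexed = data.zipWithIndex();
    indexed.cache();

    long n = indexed.count();
    for (long i = 0; i < n; i++) {
        final long heldOut = i;

        // Training set: every point except the one with index heldOut.
        JavaRDD<LabeledPoint> training = indexed
            .filter(t -> t._2() != heldOut)
            .map(t -> t._1());

        // Test "set": just the single held-out point.
        JavaRDD<LabeledPoint> test = indexed
            .filter(t -> t._2() == heldOut)
            .map(t -> t._1());

        // ... train the SVM on `training`, evaluate on `test`, accumulate results ...
    }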