Hello, I am new to Spark and I am evaluating its suitability for my machine learning tasks. I am using Spark v. 1.2.1. I would really appreciate it if someone could provide some insight into the following two issues.
1. I'd like to try a "leave-one-out" approach for training my SVM, meaning that all but one of the data points are used for training. The example SVM classifier code on the Spark webpage has this:

    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
    JavaRDD<LabeledPoint> training = data.sample(false, 0.6, 11L);
    training.cache();
    JavaRDD<LabeledPoint> test = data.subtract(training);

Is there a way to iterate over the data and progressively hold out each element, designating the rest of the dataset as the training set, instead of using a fixed fraction of the data for training (60% in the example above)? I have put a rough sketch of what I mean below my sign-off.

2. Is there a way to choose and vary the parameters of the SVM (kernel, cost, gamma, ...)?

Thank you!
Natalia
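
P.S. To make question 1 more concrete, here is a rough sketch of the kind of loop I have in mind, using zipWithIndex to tag each point so that single points can be filtered out one at a time (written with Java 8 lambdas for brevity; sc and path are as in the example above). I am not at all sure this is the right or an efficient way to do it in Spark, so please treat it only as an illustration of the intent:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;

    JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();

    // Pair each point with a stable index so individual points can be excluded.
    JavaPairRDD<LabeledPoint, Long> indexed = data.zipWithIndex();
    indexed.cache();

    long n = indexed.count();
    for (long i = 0; i < n; i++) {
        final long heldOut = i;

        // Training set: every point except the one with index heldOut.
        JavaRDD<LabeledPoint> training = indexed
            .filter(t -> t._2() != heldOut)
            .map(t -> t._1());

        // Test "set": just the single held-out point.
        JavaRDD<LabeledPoint> test = indexed
            .filter(t -> t._2() == heldOut)
            .map(t -> t._1());

        // ... train the SVM on `training`, evaluate on `test`, accumulate results ...
    }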