Hi,

You need to convert your text to a vector space model (http://en.wikipedia.org/wiki/Vector_space_model) and then pass the resulting vectors to the SVM. As far as I know, earlier versions of MLlib had a special class for doing this: https://github.com/amplab/MLI/blob/master/src/main/scala/feat/NGrams.scala. It is not compatible with Spark 1.0. I wonder why the MLlib folks didn't include it in newer versions of Spark.
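To make the conversion concrete, here is a minimal Python sketch (plain standard library, not Spark or MLI code). It turns each text record into a sparse term-count vector and writes it as one line of LibSVM text, "<label> <index>:<value> ...". The toy records, the tokenizer, and the helper names are assumptions made up for this example, so treat it as an illustration rather than a recommended pipeline.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map each distinct token to a 1-based feature index (LibSVM indices start at 1)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens, start=1)}


def to_libsvm_line(label, text, vocabulary):
    """Encode one record as '<label> <index>:<count> ...' with ascending indices."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocabulary)
    features = sorted((vocabulary[tok], n) for tok, n in counts.items())
    return " ".join([str(label)] + [f"{i}:{n}" for i, n in features])


# Hypothetical toy records: (outcome, all text attributes joined into one string).
records = [
    ("Yes", "red sedan manual"),
    ("No", "blue truck automatic"),
    ("Yes", "red truck manual manual"),
]
label_map = {"No": 0, "Yes": 1}  # binary outcome encoded as a number

vocab = build_vocabulary(text for _, text in records)
lines = [to_libsvm_line(label_map[y], text, vocab) for y, text in records]
print("\n".join(lines))
```

Raw term counts are the simplest weighting; TF-IDF or n-gram features would follow the same pattern with a different value per index. For a large corpus the vocabulary pass would itself need to be distributed, which this sketch does not attempt.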
As a workaround, you could use a separate tool to convert your data to the LibSVM format (http://stats.stackexchange.com/questions/61328/libsvm-data-format) and then load it with MLUtils.loadLibSVMFile. For example, you could use Weka (http://www.cs.waikato.ac.nz/ml/weka/) to convert your file; it has a friendly UI but doesn't handle big datasets.

Best regards,
Alexander

-----Original Message-----
From: lmk [mailto:lakshmi.muralikrish...@gmail.com]
Sent: Tuesday, June 24, 2014 3:17 PM
To: u...@spark.incubator.apache.org
Subject: Prediction using Classification with text attributes in Apache Spark MLLib

Hi,

I am trying to predict an attribute with a binary value (Yes/No) using SVM. All the attributes in my training set are text attributes. I understand that I have to convert my outcome to a double (0.0/1.0), but I do not understand how to deal with my explanatory variables, which are also text. Please let me know how I can do this.

Thanks.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.