Hi,

You need to convert your text to a vector space model representation
(http://en.wikipedia.org/wiki/Vector_space_model), i.e. one numeric feature 
vector per document, and then pass it to the SVM. As far as I know, there used 
to be a special class for this (in the MLI project): 
https://github.com/amplab/MLI/blob/master/src/main/scala/feat/NGrams.scala. It 
is not compatible with Spark 1.0.
I wonder why the MLlib folks didn't include it in newer versions of Spark.
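
To make the vector space idea concrete, here is a minimal sketch in plain 
Python (no Spark). It assumes whitespace tokenization and raw term counts; the 
helper names build_vocabulary and to_sparse_vector are made up for 
illustration and are not part of MLlib or MLI.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a 0-based feature index."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            # Assign the next free index the first time a token is seen.
            vocab.setdefault(token, len(vocab))
    return vocab


def to_sparse_vector(doc, vocab):
    """Return sorted (index, count) pairs; tokens outside vocab are ignored."""
    counts = Counter(vocab[t] for t in doc.lower().split() if t in vocab)
    return sorted(counts.items())


# Hypothetical toy documents, only for illustration.
docs = ["spark is fast", "spark mllib svm", "svm is a classifier"]
vocab = build_vocabulary(docs)
vectors = [to_sparse_vector(d, vocab) for d in docs]
```

Each document becomes a sparse list of (feature index, term count) pairs. A 
real pipeline would probably want better tokenization and some weighting such 
as TF-IDF.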

As a workaround, you could use a separate tool to convert your data to LibSVM 
format (see http://stats.stackexchange.com/questions/61328/libsvm-data-format) 
and then load it with MLUtils.loadLibSVMFile. For example, you could use Weka 
(http://www.cs.waikato.ac.nz/ml/weka/) to convert your file; it has a friendly 
UI but doesn't handle big datasets.
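
If you prefer a script over a GUI tool, the conversion can be sketched in 
plain Python as follows. This is only an illustration under some assumptions: 
the features are already numeric and keyed by 0-based index, the outcome is 
the string "Yes" or "No", and the helper names label_to_double and 
to_libsvm_line are hypothetical, not a library API.

```python
def label_to_double(text):
    """Map the Yes/No outcome to 1.0/0.0 (anything other than 'yes' is 0.0)."""
    return 1.0 if text.strip().lower() == "yes" else 0.0


def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...'.

    `features` maps 0-based feature indices to nonzero values. LibSVM files
    use 1-based indices in ascending order, so shift by one and sort.
    """
    pairs = " ".join(f"{i + 1}:{v:g}" for i, v in sorted(features.items()))
    return f"{label:g} {pairs}".rstrip()


# Hypothetical rows: (outcome, {feature index: value}).
rows = [("Yes", {0: 1, 2: 3}), ("No", {1: 2})]
lines = [to_libsvm_line(label_to_double(y), x) for y, x in rows]
# Writing "\n".join(lines) to a file gives something loadLibSVMFile can read.
```

The resulting file can then be loaded with MLUtils.loadLibSVMFile(sc, path) 
and the returned data passed to the SVM trainer.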

Best regards, Alexander

-----Original Message-----
From: lmk [mailto:lakshmi.muralikrish...@gmail.com] 
Sent: Tuesday, June 24, 2014 3:17 PM
To: u...@spark.incubator.apache.org
Subject: Prediction using Classification with text attributes in Apache Spark 
MLLib

Hi,
I am trying to predict an attribute with a binary value (Yes/No) using SVM.
All the attributes in my training set are text attributes. 
I understand that I have to convert my outcome to a double (0.0/1.0), but I do 
not understand how to deal with my explanatory variables, which are also text.
Please let me know how I can do this.

Thanks.





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Prediction-using-Classification-with-text-attributes-in-Apache-Spark-MLLib-tp8166.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
