On Fri, Feb 6, 2015 at 2:08 PM, Imran Akbar <im...@infoscoutinc.com> wrote:
> Hi,
>
> I've got the following code that's almost complete, but I have 2 questions:
>
> 1) Once I've computed the TF-IDF vector, how do I compute the vector for
> each string to feed into the LabeledPoint?
>
If I understand your code correctly, you want to map string labels to double labels in {0.0, 1.0, ...} to fit NaiveBayes. You can do that by collecting all distinct labels and creating a map from labels to indices. (We will add a transformer to make this step easier.)

> 2) Does MLLib provide any methods to evaluate the model's precision,
> recall, F-score, etc? All I saw in the documentation was "MLlib supports
> common evaluation metrics for binary classification (not available
> in PySpark). This includes precision, recall, F-measure". What about other
> classifiers besides binary, and from PySpark?
>

We have evaluation metrics for multiclass classification, but unfortunately they are not available in Python. I created a JIRA (SPARK-5694) to track it.

> thanks,
> imran
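
For the label mapping in 1), here is a minimal sketch in plain Python, not taken from your code. The function name build_label_index and the sample labels are made up for illustration. It assumes the distinct labels fit in memory on the driver; with an RDD you would first gather them with something like distinct().collect().

```python
def build_label_index(labels):
    """Map each distinct string label to a double index 0.0, 1.0, ..."""
    # Sort so the assignment is deterministic across runs.
    distinct = sorted(set(labels))
    return {label: float(i) for i, label in enumerate(distinct)}


# Hypothetical string labels, standing in for the ones in your data.
raw_labels = ["spam", "ham", "spam", "other"]

label_index = build_label_index(raw_labels)

# Double labels in the form NaiveBayes expects; each one would be paired
# with its TF-IDF vector when building a LabeledPoint.
numeric_labels = [label_index[label] for label in raw_labels]
```

Keep the map around so that you can translate predicted indices back to the original strings afterwards.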