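SOLVED. The rawPredictionCol input to BinaryClassificationEvaluator is a vector giving the prediction confidence for each class. Since this is binary classification, the prediction for class 0 is simply (1 - y_pred), where y_pred is the prediction for class 1. ALS predictions are not guaranteed to lie in [0, 1], but the evaluator's default metric (area under the ROC curve) depends only on how the class 1 scores rank the examples, so the trick still works.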
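To see why the (1 - y_pred, y_pred) trick is safe, here is a minimal plain-Python sketch that needs no Spark. It is not Spark's implementation. The helper names and the sample scores are made up for illustration, and the AUC is computed with the textbook pairwise definition. It shows that only the ordering of the class 1 score matters.

```python
# Plain-Python sketch; to_raw_prediction, auc_from_scores and the sample
# data are hypothetical and exist only to illustrate the idea.

def to_raw_prediction(score):
    """Mirror the (1 - y_pred, y_pred) vectorization as a two-element list."""
    return [1.0 - score, score]


def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 1]
als_like_scores = [1.3, -0.2, 0.7, 0.4, 0.1]  # made-up values, not bounded to [0, 1]

# Vectorize, then read back the class 1 entry (index 1) for scoring.
raw = [to_raw_prediction(s) for s in als_like_scores]
class1_scores = [v[1] for v in raw]
auc = auc_from_scores(labels, class1_scores)

# A strictly increasing rescaling into [0, 1] keeps the ranking, hence the AUC.
rescaled = [(s + 1.0) / 3.0 for s in als_like_scores]
auc_rescaled = auc_from_scores(labels, rescaled)

print(auc, auc_rescaled)
```

Both values come out the same (5 of the 6 positive/negative pairs are ordered correctly), so there should be no need to squash ALS predictions into [0, 1] before evaluating.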
So this can be applied to ALS for boolean ratings as follows:

# First, train the model and create predictions
from pyspark.ml.recommendation import ALS
model = ALS().fit(trainingdata)
predictions = model.transform(validationdata)

# Vectorize predictions to prep for evaluation
# (on Spark 2.0+, import Vectors and VectorUDT from pyspark.ml.linalg instead)
from pyspark.sql.functions import udf
from pyspark.mllib.linalg import Vectors, VectorUDT
predictionvectorizer = udf(lambda x: Vectors.dense(1.0 - x, x), returnType=VectorUDT())
vectorizedpredictions = predictions.withColumn("rawPrediction", predictionvectorizer("prediction"))

# Now evaluate predictions (the default metric is areaUnderROC)
# The evaluator reads labels from a "label" column by default; pass
# labelCol="rating" if they are in ALS's default rating column.
from pyspark.ml.evaluation import BinaryClassificationEvaluator
evaluator = BinaryClassificationEvaluator()
evaluator.evaluate(vectorizedpredictions)

On Fri, Jun 24, 2016 at 10:42 AM, apu <apumishra...@gmail.com> wrote:
> pyspark.ml.evaluation.BinaryClassificationEvaluator expects
> predictions in the form of vectors (apparently designating confidence
> intervals), as described in
> https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.BinaryClassificationEvaluator
>
> However, I am trying to evaluate ALS predictions, which are given as
> single point predictions without confidence intervals. Therefore,
> predictions are given as floats rather than vectors.
>
> How can I evaluate these using ml's BinaryClassificationEvaluator?
>
> (Note that this is a different function from mllib's
> BinaryClassificationMetrics.)
>
> Thanks!
>
> Apu