Re: How can I use pyspark.ml.evaluation.BinaryClassificationEvaluator with point predictions instead of confidence intervals?

apu Fri, 24 Jun 2016 16:20:19 -0700

SOLVED.

The rawPredictionCol input to BinaryClassificationEvaluator is a
two-element vector holding the prediction score for each class
(element 0 for class 0, element 1 for class 1), not a confidence
interval. Since we are talking about binary classification, the score
for class 0 is simply (1 - y_pred), where y_pred is the prediction for
class 1.
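
To make the shape of that vector concrete, here is a minimal plain-Python
sketch of the mapping, with no Spark involved. The helper name
to_raw_prediction is hypothetical and only for illustration.

```python
def to_raw_prediction(y_pred):
    """Return [score for class 0, score for class 1] for one point prediction."""
    y_pred = float(y_pred)
    # Class 0 gets the complement of the class 1 score.
    return [1.0 - y_pred, y_pred]


print(to_raw_prediction(0.75))  # [0.25, 0.75]
```

The same two-element layout is what the udf in the Spark code produces
for each row.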


So this can be applied to ALS for boolean ratings as follows:

# First, train the model and create predictions
from pyspark.ml.recommendation import ALS
model = ALS().fit(trainingdata)
predictions = model.transform(validationdata)

# Vectorize predictions to prep for evaluation
# (on Spark 2.0 and later, Vectors and VectorUDT likely need to come
# from pyspark.ml.linalg instead of pyspark.mllib.linalg)
from pyspark.sql.functions import udf
from pyspark.mllib.linalg import Vectors, VectorUDT
predictionvectorizer = udf(lambda x: Vectors.dense(1.0 - x, x),
                           returnType=VectorUDT())
vectorizedpredictions = predictions.withColumn(
    "rawPrediction", predictionvectorizer("prediction"))

# Now evaluate predictions. The default metric is areaUnderROC, and the
# true labels are read from a column named "label" unless labelCol is set.
from pyspark.ml.evaluation import BinaryClassificationEvaluator
evaluator = BinaryClassificationEvaluator()
evaluator.evaluate(vectorizedpredictions)
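
As a sanity check on the number the evaluator returns, area under the ROC
curve can also be computed by hand on a small collected sample. The
following is a plain-Python sketch that counts correctly ranked
(positive, negative) pairs. The function name auc_by_pair_counting and
the sample data are made up for illustration, and this is not how Spark
computes the metric internally, so results on large data may differ
slightly.

```python
def auc_by_pair_counting(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical class 1 scores and true boolean ratings.
scores = [0.9, 0.4, 0.35, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print(auc_by_pair_counting(scores, labels))  # 5 of 6 pairs are ranked correctly
```

Only the ordering of the class 1 scores enters this calculation, which is
why it does not matter that ALS predictions are not guaranteed to fall
between 0 and 1.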

On Fri, Jun 24, 2016 at 10:42 AM, apu <apumishra...@gmail.com> wrote:
> pyspark.ml.evaluation.BinaryClassificationEvaluator expects
> predictions in the form of vectors (apparently designating confidence
> intervals), as described in
> https://spark.apache.org/docs/latest/api/python/pyspark.ml.html#pyspark.ml.evaluation.BinaryClassificationEvaluator
>
> However, I am trying to evaluate ALS predictions, which are given as
> single point predictions without confidence intervals. Therefore,
> predictions are given as floats rather than vectors.
>
> How can I evaluate these using ml's BinaryClassificationEvaluator?
>
> (Note that this is a different function from mllib's
> BinaryClassificationMetrics.)
>
> Thanks!
>
> Apu

