srowen commented on a change in pull request #17084: [SPARK-24103][ML][MLLIB]
ML Evaluators should use weight column - added weight column for binary
classification evaluator
URL: https://github.com/apache/spark/pull/17084#discussion_r259648639
##########
File path: python/pyspark/mllib/evaluation.py
##########
@@ -40,16 +40,28 @@ class BinaryClassificationMetrics(JavaModelWrapper):
>>> metrics.areaUnderPR
0.83...
>>> metrics.unpersist()
+ >>> scoreAndLabelsWithOptWeight = sc.parallelize([
+ ... (0.1, 0.0, 1.0), (0.1, 1.0, 0.4), (0.4, 0.0, 0.2), (0.6, 0.0, 0.6), (0.6, 1.0, 0.9),
+ ... (0.6, 1.0, 0.5), (0.8, 1.0, 0.7)], 2)
+ >>> metrics = BinaryClassificationMetrics(scoreAndLabelsWithOptWeight)
+ >>> metrics.areaUnderROC
+ 0.70...
+ >>> metrics.areaUnderPR
+ 0.83...
.. versionadded:: 1.4.0
"""
- def __init__(self, scoreAndLabels):
- sc = scoreAndLabels.ctx
+ def __init__(self, scoreAndLabelsWithOptWeight):
Review comment:
Ah, this might be a problem in Python in a way it isn't in Scala: the parameter
name matters, because callers can pass it by keyword. After this rename, you can
no longer call this with `...(scoreAndLabels=...)`. It would be surprising for
someone to do that, but it's possible. So in the end, maybe we don't change this
parameter name, to be safe, even though it now also accepts weights. That's not
so bad.
And then I wonder, maybe change the parameter name back in Scala too?
It's really scoreAndLabels(AndWeightsThoughThat'sMoreOfADetail)
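To illustrate the compatibility concern, here is a minimal sketch in plain Python (no Spark dependency; the `Before`/`After` class names are hypothetical stand-ins for the evaluator before and after the rename). It shows that renaming a parameter leaves positional callers untouched but raises `TypeError` for anyone who passed the old name as a keyword:

```python
# Hypothetical stand-ins for the constructor before and after the rename.
class Before:
    def __init__(self, scoreAndLabels):
        self.data = scoreAndLabels

class After:
    # Parameter renamed to reflect the optional weight column.
    def __init__(self, scoreAndLabelsWithOptWeight):
        self.data = scoreAndLabelsWithOptWeight

pairs = [(0.1, 0.0), (0.8, 1.0)]

# Positional calls are unaffected by the rename.
assert Before(pairs).data == After(pairs).data

# A keyword call against the old name worked before the rename...
assert Before(scoreAndLabels=pairs).data == pairs

# ...but raises TypeError afterwards, breaking existing callers.
broke = False
try:
    After(scoreAndLabels=pairs)
except TypeError:
    broke = True
assert broke
```

This is why keeping the old parameter name (while documenting that each element may carry an optional third weight value) is the safer choice for a public Python API.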
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services