imatiach-msft commented on a change in pull request #17084: 
[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator
URL: https://github.com/apache/spark/pull/17084#discussion_r257892224
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/evaluation/BinaryClassificationMetrics.scala
 ##########
 @@ -146,11 +164,13 @@ class BinaryClassificationMetrics @Since("1.3.0") (
   private lazy val (
     cumulativeCounts: RDD[(Double, BinaryLabelCounter)],
     confusions: RDD[(Double, BinaryConfusionMatrix)]) = {
-    // Create a bin for each distinct score value, count positives and negatives within each bin,
-    // and then sort by score values in descending order.
-    val counts = scoreAndLabels.combineByKey(
-      createCombiner = (label: Double) => new BinaryLabelCounter(0L, 0L) += label,
-      mergeValue = (c: BinaryLabelCounter, label: Double) => c += label,
+    // Create a bin for each distinct score value, count weighted positives and
+    // negatives within each bin, and then sort by score values in descending order.
+    val counts = scoreLabelsWeight.combineByKey(
+      createCombiner = (labelAndWeight: (Double, Double)) =>
+        new BinaryLabelCounter(0.0, 0.0) += (labelAndWeight._1, labelAndWeight._2),
+      mergeValue = (c: BinaryLabelCounter, labelAndWeight: (Double, Double)) =>
+        c += (labelAndWeight._1, labelAndWeight._2),
 
 Review comment:
   Oh, there is a += operator overload for (Double, Double); see 
BinaryLabelCounter.scala. I added it in this PR, alongside the += overload 
that already existed there and takes only the label. Maybe an operator 
overload that works on a tuple is confusing? I could add an explicit method 
instead, but the overload seems cleaner and more in line with the existing 
+= overload that takes the label as a Double.
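
   To make the two options concrete, here is a minimal sketch of the 
overloads being discussed. WeightedLabelCounter is a simplified stand-in 
written for this comment, not the actual BinaryLabelCounter from the PR, and 
the add method is a hypothetical name for the explicit alternative. It 
assumes the weighted overload takes the label and weight as two Double 
parameters, in which case Scala parses c += (label, weight) as a 
two-argument call rather than a call with a tuple.

```scala
// Simplified stand-in for the counter discussed above (an assumption, not Spark's class).
final class WeightedLabelCounter(
    var weightedNumPositives: Double = 0.0,
    var weightedNumNegatives: Double = 0.0) extends Serializable {

  // Label-only overload: counts the label with an implicit weight of 1.0.
  def +=(label: Double): WeightedLabelCounter = this.+=(label, 1.0)

  // Weighted overload: labels above 0.5 are treated as positive.
  def +=(label: Double, weight: Double): WeightedLabelCounter = {
    if (label > 0.5) weightedNumPositives += weight else weightedNumNegatives += weight
    this
  }

  // Hypothetical explicit alternative to the weighted operator overload.
  def add(label: Double, weight: Double): WeightedLabelCounter = this.+=(label, weight)
}

object WeightedLabelCounterDemo {
  def main(args: Array[String]): Unit = {
    val counter = new WeightedLabelCounter()
    counter += (1.0, 2.5)   // weighted positive, infix call with two arguments
    counter += (0.0, 0.5)   // weighted negative
    counter += 1.0          // label-only overload, weight 1.0
    counter.add(0.0, 1.5)   // same effect as += (0.0, 1.5), but reads as a plain method
    println(s"positives=${counter.weightedNumPositives}, negatives=${counter.weightedNumNegatives}")
  }
}
```

   The operator form keeps the call sites in combineByKey symmetric with 
the label-only version, while a named method avoids any ambiguity about 
whether the parenthesized pair is a tuple or an argument list.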

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
