[GitHub] spark pull request #17086: [SPARK-24101][ML][MLLIB] ML Evaluators should use...

srowen Tue, 06 Nov 2018 05:40:34 -0800

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17086#discussion_r231123729
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala ---
    @@ -27,10 +27,17 @@ import org.apache.spark.sql.DataFrame
     /**
      * Evaluator for multiclass classification.
      *
    - * @param predictionAndLabels an RDD of (prediction, label) pairs.
    + * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) or
    + *                         (prediction, label) pairs.
      */
     @Since("1.1.0")
    -class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)]) {
    +class MulticlassMetrics @Since("3.0.0") (predAndLabelsWithOptWeight: RDD[_]) {
    --- End diff --
    
    Oh, wait a sec, this changes the public constructor signature. I think you
    have to retain both: the existing `RDD[(Double, Double)]` constructor should
    stay, one way or the other, and a new `RDD[(Double, Double, Double)]`
    constructor should be added, with appropriate `@Since` tags on each.
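
For illustration only, here is a minimal sketch of keeping both entry points and normalizing them to weighted triples. It is not the Spark implementation: a plain `Seq` stands in for `RDD` so the snippet runs without Spark, and `MulticlassMetricsSketch`, `fromPairs`, `fromWeighted` and `weightedAccuracy` are hypothetical names. It uses named factory methods rather than two overloaded constructors because, as far as I know, overloads that differ only in the tuple type argument erase to the same JVM signature and would not compile side by side.

```scala
// Hypothetical sketch: Seq stands in for Spark's RDD; none of these names are Spark API.
final class MulticlassMetricsSketch private (
    weighted: Seq[(Double, Double, Double)]) { // (prediction, label, weight)

  /** Sum of weights of correctly predicted rows over the total weight (NaN if empty). */
  def weightedAccuracy: Double = {
    val total = weighted.map(_._3).sum
    val correct = weighted.collect { case (p, l, w) if p == l => w }.sum
    correct / total
  }
}

object MulticlassMetricsSketch {
  /** Existing-style entry point: unweighted (prediction, label) pairs get weight 1.0. */
  def fromPairs(rows: Seq[(Double, Double)]): MulticlassMetricsSketch =
    new MulticlassMetricsSketch(rows.map { case (p, l) => (p, l, 1.0) })

  /** New entry point: (prediction, label, weight) triples are used as given. */
  def fromWeighted(rows: Seq[(Double, Double, Double)]): MulticlassMetricsSketch =
    new MulticlassMetricsSketch(rows)
}
```

With this shape, `MulticlassMetricsSketch.fromPairs(Seq((1.0, 1.0), (0.0, 1.0)))` and `MulticlassMetricsSketch.fromWeighted(Seq((1.0, 1.0, 3.0), (0.0, 1.0, 1.0)))` go through the same weighted code path, so existing unweighted callers would not need to change.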
    
    There's also a `DataFrame` constructor below, and I'm not sure how to
    handle that. It should also handle the case where there's a weight column,
    but I don't see a clean way to do that. There could be a second argument
    like `hasWeightCol`, but that's starting to feel hacky.


