Github user imatiach-msft commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17086#discussion_r231227296
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala 
---
    @@ -27,10 +27,17 @@ import org.apache.spark.sql.DataFrame
     /**
      * Evaluator for multiclass classification.
      *
    - * @param predictionAndLabels an RDD of (prediction, label) pairs.
    + * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) 
or
    + *                         (prediction, label) pairs.
      */
     @Since("1.1.0")
    -class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, 
Double)]) {
    +class MulticlassMetrics @Since("3.0.0") (predAndLabelsWithOptWeight: 
RDD[_]) {
    --- End diff --
    
    "I am not sure what to do about the DataFrame issue though", ah, I think I 
see your concern.
    But, isn't this dataframe constructor private anyway, so it can't be used 
by anyone outside mllib:
    
      private[mllib] def this(predictionAndLabels: DataFrame) =
        this(predictionAndLabels.rdd.map(r => (r.getDouble(0), r.getDouble(1))))
    
    I only modified the RDD part because that is what is used by the ML 
evaluator and it is what users outside spark can access.  This is to add weight 
column for the evaluators.
    
    However, even if we wanted to add weight column support for the private 
API, I'm unsure about how to add this.  Should I just check if there are 3 
columns or two, and if there are 3 use the third one as the weight column?  I 
guess I am on the fence about this, I could change it but I don't think it is 
absolutely necessary, since it's not used anywhere outside spark MLLIB anyway.
    
    Actually, this constructor is a bit weird, it looks like it was added as 
part of this PR:
    https://github.com/apache/spark/pull/6011/files
    It is only used here in the python API:
    
https://github.com/apache/spark/pull/6011/files#diff-443f766289f8090078531c3e1a1d6027R186
    But I don't see why we couldn't just get the rdd there and remove the 
private constructor altogether (?)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to