Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/20028

Thanks for the comments @zhengruifeng @felixcheung

It's been nearly 8 months, and it took me a while to recall what this PR does. While the PR does improve the current API somewhat, I wonder whether it lays a good foundation for an extensible and flexible `Evaluator` framework for Spark ML.

The current design is not very user-friendly. It asks users to understand the concept of `Metrics` (BinaryClassificationMetrics, MulticlassMetrics, RegressionMetrics), which exist primarily for internal calculation. It also implies that all the indicators in a `Metrics` class can be calculated in one pass over the DataFrame, which becomes a problem as soon as we add an indicator that cannot be computed together with the others.

IMO, API-wise, we should ideally allow users to specify any combination of metrics that they want from the `Evaluator`, and the `Evaluator` should then figure out the most efficient way to calculate them.

Here are my concrete suggestions:

1. Evaluator API:

```
ClassificationEvaluator {
  def setPredictionCol(value: String): this.type
  def setLabelCol(value: String): this.type

  // kept for backward compatibility and cross-validation
  def setMetricName(value: String): this.type

  // kept for backward compatibility and cross-validation
  override def evaluate(dataset: Dataset[_]): Double

  // calculate multiple metrics, will try to optimize the calculation internally
  override def getMetrics(dataset: Dataset[_], metrics: Array[String]): Map[String, Any] // or wrap it with a customized class
}

val ce = new ClassificationEvaluator().setLabelCol("x").setPredictionCol("y")
val metrics = ce.getMetrics(dataframe, Array(Classification.truePositiveRateByLabel, BinaryClassification.areaUnderROC))
println(metrics)
```

We can basically merge BinaryClassificationEvaluator and MulticlassClassificationEvaluator. Similarly, we can have `RegressionEvaluator` and `ClusteringEvaluator`; these stay separate because each may need different setters.

2. Summary classes may invoke the Evaluator internally.

@felixcheung, I'm not sure whether this can get a shepherd and review bandwidth for the next release. I don't want to just update version numbers every few months.
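
To make suggestion 1 a bit more concrete, here is a rough sketch of how `getMetrics` could share work between the requested metrics. It uses plain Scala collections rather than a DataFrame so that it runs on its own, and every name in it (`MetricsSketch`, `Counts`, `countOnce`, `registry`) is made up for illustration; none of this is existing Spark API. The assumption is that both requested metrics can be derived from the same per-label confusion counts, so the data is traversed only once.

```scala
// Hypothetical sketch only: these names do not exist in Spark ML.
object MetricsSketch {

  // Shared intermediate result: per-label counts gathered in one pass.
  final case class Counts(
      tp: Map[Double, Long], // correct predictions, keyed by label
      fn: Map[Double, Long], // missed instances, keyed by true label
      total: Long)

  // Single pass over (prediction, label) pairs.
  def countOnce(rows: Seq[(Double, Double)]): Counts = {
    def inc(m: Map[Double, Long], k: Double): Map[Double, Long] =
      m.updated(k, m.getOrElse(k, 0L) + 1L)

    rows.foldLeft(Counts(Map.empty, Map.empty, 0L)) {
      case (c, (pred, label)) =>
        if (pred == label) c.copy(tp = inc(c.tp, label), total = c.total + 1L)
        else c.copy(fn = inc(c.fn, label), total = c.total + 1L)
    }
  }

  // Each metric is a function of the shared counts.
  val accuracy: Counts => Any = c =>
    if (c.total == 0L) 0.0 else c.tp.values.sum.toDouble / c.total

  val truePositiveRateByLabel: Counts => Any = c =>
    (c.tp.keySet ++ c.fn.keySet).map { label =>
      val tp = c.tp.getOrElse(label, 0L)
      val fn = c.fn.getOrElse(label, 0L)
      label -> tp.toDouble / (tp + fn)
    }.toMap

  // Registry of supported metric names (assumed names, for illustration).
  val registry: Map[String, Counts => Any] = Map(
    "accuracy" -> accuracy,
    "truePositiveRateByLabel" -> truePositiveRateByLabel)

  // Compute any combination of registered metrics from one pass over the data.
  def getMetrics(rows: Seq[(Double, Double)], names: Array[String]): Map[String, Any] = {
    val unknown = names.filterNot(registry.contains)
    require(unknown.isEmpty, s"Unknown metrics: ${unknown.mkString(", ")}")
    val counts = countOnce(rows)
    names.map(n => n -> registry(n)(counts)).toMap
  }
}
```

Calling `MetricsSketch.getMetrics(rows, Array("accuracy", "truePositiveRateByLabel"))` returns both values while iterating over `rows` only once. A metric such as areaUnderROC needs scores rather than hard predictions, so a real implementation would have to group the requested metrics by the intermediate result they depend on and run one pass per group.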