zhengruifeng commented on issue #24648: [SPARK-27777][ML] Eliminate unnecessary 
sliding job in AreaUnderCurve
URL: https://github.com/apache/spark/pull/24648#issuecomment-495463566
 
 
   @srowen  I recently did a detailed review of `ML.XXXEvaluator` & `MLLIB.XXXMetrics` 
and found several other places that seem to need improvement. 
   For example:
   1, all metrics in `MultilabelMetrics` & `MulticlassMetrics` can be computed 
in a single pass; however, in the current impl each metric needs its own pass.
   2, `ML.XXXEvaluator` supports only one metric at a time, which means at 
least one pass is needed per metric. I think we can cache the 
`MLLIB.XXXMetrics` in the impl, and in subsequent calls, if the input 
dataset does not change, we can get the metric directly from the cached 
`MLLIB.XXXMetrics` without another accumulation over the input dataset.
   3, `MultiLabelClassificationEvaluator` is missing now.
   4, in `BinaryClassificationMetrics`, to control the number of bins, directly setting 
the number of partitions in the sort stage seems more reasonable than the current impl.
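
   A minimal sketch of the idea behind points 1 and 2, written in plain Python rather than against the Spark API. The function names `confusion_counts` and `metrics_from_counts` are hypothetical and are not part of Spark. The sketch assumes the input is an iterable of `(prediction, label)` pairs. It makes one pass to build the confusion counts, then derives several metrics from those counts without touching the input again:

```python
from collections import Counter


def confusion_counts(pairs):
    """Single pass over (prediction, label) pairs.

    Returns a Counter keyed by (label, prediction).
    """
    return Counter((label, pred) for pred, label in pairs)


def metrics_from_counts(counts):
    """Derive several metrics from the counts, with no further pass over the data."""
    total = sum(counts.values())
    labels = sorted({l for l, _ in counts} | {p for _, p in counts})
    correct = sum(counts[(l, l)] for l in labels)

    precision, recall = {}, {}
    for l in labels:
        tp = counts[(l, l)]
        predicted = sum(counts[(a, l)] for a in labels)  # column sum
        actual = sum(counts[(l, p)] for p in labels)     # row sum
        precision[l] = tp / predicted if predicted else 0.0
        recall[l] = tp / actual if actual else 0.0

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": precision,
        "recall": recall,
    }


# The iterator can be consumed only once, so every metric below
# necessarily comes from the same single pass.
data = iter([(0, 0), (1, 1), (1, 0), (2, 2), (0, 0)])
counts = confusion_counts(data)
result = metrics_from_counts(counts)
```

   Keeping `counts` around and reusing it for later metric requests is the same caching idea as in point 2; deciding when the input dataset has changed is left out of this sketch.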
   
   Would you mind if I open an umbrella ticket "Evaluator & Metrics 
improvements" to track the above points and the already opened tickets on `sliding job` and 
`SSreg`?
   
