srowen edited a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
URL: https://github.com/apache/spark/pull/24470#issuecomment-489326110

I still don't see the argument that the first or the last is better. They are simply the endpoints of the range of scores within the bin. As the number of bins increases, that range gets smaller. If you are worried about this difference, you need more bins. Your argument cuts both ways: a threshold slightly higher than desired can cause as many problems as one slightly lower. What might be better here is to compute the score of a bin as a weighted average of its elements. That would be OK, though you'd have to change many tests. I think the current implementation is designed to match scikit-learn (?)
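
To make the three options concrete, here is a minimal Python sketch, not Spark's actual code. The function name `downsample_thresholds`, the `how` parameter, and the sample data are all hypothetical. It assumes the input is a list of (score, weight) pairs already sorted by descending score, with positive weights, and that bins are formed by taking fixed-size chunks.

```python
from typing import List, Tuple


def downsample_thresholds(
    scored: List[Tuple[float, float]],
    bin_size: int,
    how: str = "first",
) -> List[float]:
    """Pick one representative threshold per bin.

    scored   -- (score, weight) pairs sorted by descending score (assumption)
    bin_size -- number of distinct scores grouped into each bin
    how      -- "first", "last", or "weighted_mean" (hypothetical option names)
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    thresholds = []
    for start in range(0, len(scored), bin_size):
        chunk = scored[start:start + bin_size]
        if how == "first":
            # Highest score in the bin (one endpoint of its range).
            thresholds.append(chunk[0][0])
        elif how == "last":
            # Lowest score in the bin (the other endpoint).
            thresholds.append(chunk[-1][0])
        elif how == "weighted_mean":
            # Weighted average of the scores in the bin.
            total = sum(w for _, w in chunk)
            thresholds.append(sum(s * w for s, w in chunk) / total)
        else:
            raise ValueError(f"unknown how: {how}")
    return thresholds


# Made-up data: four distinct scores with their weights.
data = [(0.9, 1.0), (0.8, 3.0), (0.4, 2.0), (0.1, 2.0)]
first = downsample_thresholds(data, 2, "first")
last = downsample_thresholds(data, 2, "last")
mean = downsample_thresholds(data, 2, "weighted_mean")
```

With one score per bin (`bin_size=1`) all three choices coincide, which is the "use more bins" point: the gap between the endpoints shrinks as bins get smaller. The weighted mean always lies between the two endpoints of its bin.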
