[GitHub] [spark] srowen edited a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics

GitBox Sat, 04 May 2019 06:16:28 -0700

URL: https://github.com/apache/spark/pull/24470#issuecomment-489326110
 
 
   I still don't see the argument that the first or the last score is the 
better choice. They are simply the endpoints of the range of scores within the 
bin, and as the number of bins increases, that range shrinks. If you are 
worried about this difference, you need more bins. Your argument cuts both 
ways: a threshold slightly higher than desired can cause as many problems as 
one slightly lower.
   
   What might be better here is to compute the score of a bin as a weighted 
average of its elements. That would be OK, though you'd have to change many 
tests. I think the current implementation is designed to match scikit-learn (?)
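
   To make the three candidates concrete, here is a rough sketch in plain 
Python. It is not the Spark code: the function name `bin_thresholds`, the 
fixed-size chunking, and the assumption that every weight is positive are all 
made up for illustration. It only shows that the weighted average always lies 
between the first and last score of a bin, and that all three converge as bins 
get smaller.

```python
from typing import List, Tuple


def bin_thresholds(
    scored: List[Tuple[float, float]], bin_size: int
) -> List[Tuple[float, float, float]]:
    """Return (first, last, weighted_mean) for each bin of scores.

    `scored` holds hypothetical (score, weight) pairs; weights are assumed
    to be strictly positive. Scores are sorted in descending order and cut
    into consecutive chunks of `bin_size` elements (the last chunk may be
    shorter). This is an illustration, not Spark's implementation.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    ordered = sorted(scored, key=lambda sw: sw[0], reverse=True)
    result = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        total_weight = sum(w for _, w in chunk)
        weighted_mean = sum(s * w for s, w in chunk) / total_weight
        first = chunk[0][0]   # highest score in the bin
        last = chunk[-1][0]   # lowest score in the bin
        result.append((first, last, weighted_mean))
    return result


# Made-up data: six (score, weight) pairs, downsampled into bins of three.
example = [(0.9, 1.0), (0.8, 1.0), (0.7, 2.0),
           (0.4, 1.0), (0.3, 1.0), (0.1, 2.0)]
for first, last, mean in bin_thresholds(example, 3):
    print(f"first={first:.3f} last={last:.3f} weighted_mean={mean:.3f}")
```

   With a bin size of 1, all three values coincide, which is the "you need 
more bins" point above.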

