asolimando commented on pull request #32813:
URL: https://github.com/apache/spark/pull/32813#issuecomment-857172105


   @137alpha thanks for the detailed explanation, now it's clear what you meant 
(I had missed the example from "xujiajin").
   
   My comment was exclusively about cases revolving around (1), which is what 
I worked on when I contributed the PR (performance improvement of 
classification tasks, and rule extraction from DecisionTrees and 
RandomForests).
   
   For (2) and (3) this indeed seems problematic, because we disregard the 
probability entirely.
   
   If possible, it would be great to either fix the current "optimization" by 
looking at more information than the class prediction (notably, the 
probability), or at least provide a user-facing parameter to control the 
behaviour, so that those who need (2)/(3) can disable it, while those who are 
happy with just (1) can still benefit from it.
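
   To make the two options concrete, here is a minimal Python sketch. It is 
not Spark's actual code: `Leaf`, `Split`, `prune`, `enabled` and `prob_tol` are 
hypothetical names used only for illustration, and I am assuming that merged 
leaves end up with a count-weighted probability. `enabled=False` stands in for 
a user-facing switch, and `prob_tol` stands in for a probability-aware check.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Leaf:
    prediction: int     # predicted class label
    probability: float  # estimated probability of the predicted class
    count: int          # number of training rows that reached this leaf


@dataclass
class Split:
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Split]


def prune(node: Tree, enabled: bool = True,
          prob_tol: Optional[float] = None) -> Tree:
    """Merge sibling leaves that predict the same class (illustrative only).

    enabled  -- hypothetical switch; False returns the tree untouched.
    prob_tol -- hypothetical tolerance; when set, siblings are merged only
                if their probabilities differ by at most this amount.
    """
    if not enabled or isinstance(node, Leaf):
        return node

    # Prune bottom-up so that merges can propagate towards the root.
    left = prune(node.left, enabled, prob_tol)
    right = prune(node.right, enabled, prob_tol)

    same_class = (
        isinstance(left, Leaf)
        and isinstance(right, Leaf)
        and left.prediction == right.prediction
    )
    if same_class:
        close_enough = (
            prob_tol is None
            or abs(left.probability - right.probability) <= prob_tol
        )
        if close_enough:
            # Assumption: the merged leaf carries the count-weighted
            # probability, so the two distinct estimates are lost.
            total = left.count + right.count
            merged = (left.probability * left.count
                      + right.probability * right.count) / total
            return Leaf(left.prediction, merged, total)

    return Split(left, right)
```

   With class-only merging (`prob_tol=None`), two leaves that agree on the 
class but disagree on the probability collapse into one, which is harmless for 
(1) but changes the probabilities that (2)/(3) rely on. Passing a tolerance, 
or disabling pruning altogether, keeps them apart.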
   
   Regarding the documentation update, at the time it did not seem relevant, 
because the contribution seemed to be an internal optimization (that is, an 
iso-functional improvement). It's probably a good idea to add a comment 
describing the behaviour of the controlling parameter proposed by @srowen.
   
   As a closing remark, I understand that this has caused some issues and 
frustration for some people, including yourself. But sometimes, when trying to 
make things better (maybe by volunteering in our spare time, as was the case 
for me with this PR), we can cause other issues, which can in turn be tackled 
and hopefully solved. That's the beauty of OSS.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


