asolimando edited a comment on pull request #32813:
URL: https://github.com/apache/spark/pull/32813#issuecomment-856904945


   Sorry for not replying earlier!
   
   I also think that exposing the parameter is the safest option: the pruning 
yields a (sometimes substantial) performance improvement and, at least for 
prediction tasks, has no downside. Removing the feature entirely might cause 
a performance regression for many Spark users.
   
   Do you have any idea why the probabilities you use are affected by this 
change?
   
   I don't have the details of the specific Decision Tree flavour implemented 
here, but if the probability is the probability of belonging to a given class, 
then there must be a bug (some metadata might be getting corrupted during the 
merge phase of the pruning process), because the assignment to a given class is 
preserved by construction: we never merge two nodes with distinct labels.
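   To make the argument concrete, here is a minimal sketch (not Spark's actual 
implementation; the helper names are hypothetical) of merging two sibling 
leaves that predict the same label, assuming each leaf stores per-class counts. 
Pooling the counts preserves the argmax, i.e. the predicted class, but note 
that the per-class probabilities of the merged leaf become a weighted average 
of the children's:

   ```python
   # Hypothetical model of the merge step: each leaf holds per-class counts.
   def merge_leaves(counts_left, counts_right):
       """Pool the per-class counts of two leaves into one merged leaf."""
       return [l + r for l, r in zip(counts_left, counts_right)]

   def predicted_label(counts):
       # The predicted class is the one with the highest count.
       return max(range(len(counts)), key=lambda i: counts[i])

   def probabilities(counts):
       total = sum(counts)
       return [c / total for c in counts]

   left = [8, 2]    # leaf predicting class 0 with P(class 0) = 0.8
   right = [6, 4]   # leaf predicting class 0 with P(class 0) = 0.6

   merged = merge_leaves(left, right)  # [14, 6]
   # The class assignment is preserved: both children and the merged
   # leaf predict class 0.
   assert predicted_label(merged) == predicted_label(left) == 0
   print(probabilities(merged))  # [0.7, 0.3]
   ```

   So as long as only same-label leaves are merged, predictions cannot change; 
any difference in the predicted class would therefore point to corrupted 
metadata rather than to the pruning rule itself.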
   
   The fact that no split happens at times does not seem an issue to me: if 
all your labels are the same, there is nothing to build at all, so the whole 
learning task is pointless anyway.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
