asolimando edited a comment on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-856904945
Sorry for not replying earlier! I also think that exposing the parameter is the safest option: the pruning leads to a (sometimes substantial) performance improvement and, at least for prediction tasks, has no downside, so removing the feature entirely might cause a performance regression for many Spark users.

Do you have an idea why the probabilities you use are impacted by this change? I don't know the details of the specific decision-tree flavour implemented here, but if the probability is the probability of belonging to a given class, then there must be a bug (some metadata might get messed up during the merge phase of the pruning process), because the assignment to a given class is by construction preserved: we never merge two nodes with distinct labels.

The fact that no split happens at times does not seem an issue to me: if all your labels are the same, there is nothing to build at all, so the whole learning task is pointless anyway.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
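To make the merge argument above concrete, here is a minimal sketch (not Spark's actual implementation; `predicted_class` and `merge_leaves` are hypothetical helpers) showing that merging two sibling leaves with the same majority label preserves the predicted class, while the merged class probabilities become a weighted average that can differ from either child's:

```python
def predicted_class(counts):
    # Predicted label = index of the majority class in the leaf's class counts.
    return max(range(len(counts)), key=lambda i: counts[i])

def merge_leaves(left_counts, right_counts):
    # Pruning replaces a split with a single leaf whose statistics are
    # the element-wise sum of the children's class counts.
    return [l + r for l, r in zip(left_counts, right_counts)]

left = [9, 1]    # leaf probabilities: [0.9, 0.1], predicts class 0
right = [6, 4]   # leaf probabilities: [0.6, 0.4], also predicts class 0
assert predicted_class(left) == predicted_class(right) == 0

merged = merge_leaves(left, right)         # [15, 5]
assert predicted_class(merged) == 0        # predicted label is preserved
probs = [c / sum(merged) for c in merged]  # [0.75, 0.25]: probabilities change
```

This is consistent with the point above: if only leaves with identical labels are merged, hard class assignments cannot change, but any consumer of the per-class probabilities will see different values after pruning.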
