asolimando commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-856904945

> I'm a little reluctant to change behavior but that could be the right thing. What about just exposing this param at least?
>
> I'm still kind of curious how this happens. Is the pruning logic just not correct? Or do you have a sense of what the tree is like before and after pruning?

Sorry for not replying earlier! I too think that exposing the parameter is the safest option: the pruning yields a (sometimes significant) performance improvement and, at least for prediction tasks, has no downside. Removing the feature entirely might cause a performance regression for many Spark users.

Do you have an idea why the probabilities you use are affected by this change? I don't know the details of the specific decision-tree flavour implemented here, but if the probability in question is the probability of belonging to a given class, then there must be a bug (some metadata might get corrupted during the merge phase of the pruning process), because the assignment to a given class is, by construction, preserved: we never merge two nodes with distinct labels.

The fact that no split sometimes happens does not seem like an issue to me: if all your labels are the same, there is nothing to build at all, so the whole learning task seems pointless.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
