srowen commented on pull request #32813: URL: https://github.com/apache/spark/pull/32813#issuecomment-857176436
Yeah, great explanation. I'm still remembering how all this code works, so I'm probably asking dumb questions. Is the problem that the leaves' impurity stats are not combined, but just use the parent node's? Or is that also not quite the point?

Where do you get class probabilities out of the API, or are you reaching into the model to figure that out? Sorry if I've just forgotten that possibility in the API, but I didn't recall or see it. I'm just trying to trace back how probability connects to the LeafNodes -- via impurity, right? Your example doesn't seem to retrieve probabilities.

The scikit tree shows a lot of "redundant" decision nodes, but if they're redundant, I wonder what else is stored, and thus what we need to look at when deciding whether to prune in Spark.

I think this does indeed affect a certain type of use case -- hardly fully broken, but I do believe you that there's a problem of some size; no need to collect more evidence! The simplest fix I would definitely support is making this, at least, _optional_, rather than disabling it by default. On the flip side, huge forests can be an issue to load, too. I'm still hoping there's a fix or a misunderstanding somewhere we can save this with, but maybe there is nothing reasonable, and correctness is incompatible with what you're doing.
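To make the probability question concrete, here is a minimal pure-Python sketch (not Spark's or scikit-learn's actual code; all names are hypothetical) of why merging sibling leaves that share the same hard prediction still changes the class probabilities: the merged leaf reports the parent's pooled impurity stats instead of each child's own.

```python
# Hypothetical illustration: class probabilities are derived from a
# leaf's impurity stats (per-class example counts). Two sibling leaves
# can agree on the argmax prediction yet report different probabilities,
# so pruning them into one leaf loses information.

def probs(counts):
    """Class probabilities from per-class counts at a node."""
    total = sum(counts)
    return [c / total for c in counts]

left = [90, 10]    # leaf A: predicts class 0 with P(0) = 0.90
right = [60, 40]   # leaf B: predicts class 0 with P(0) = 0.60
parent = [l + r for l, r in zip(left, right)]  # merged: [150, 50]

# Prediction-only view: both children look "redundant" (same argmax).
# Probability view: rows that reached leaf A now get P(0) = 0.75
# instead of 0.90, and rows from leaf B get 0.75 instead of 0.60.
print(probs(left))    # [0.9, 0.1]
print(probs(right))   # [0.6, 0.4]
print(probs(parent))  # [0.75, 0.25]
```

So the split is redundant for `prediction` but not for `probability`, which is the use case the pruning behavior would affect.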
