srowen commented on pull request #32813:
URL: https://github.com/apache/spark/pull/32813#issuecomment-856914543


   Process-wise, could most of the other changes here be reverted? I think a 
lot of it is formatting. The change itself seemed simple. If you have a 
moment, drop in unit test #2 as well.
   
   @asolimando thank you for weighing in. In the 2 examples in the JIRA, the 
labels are not all the same.
   
   I wouldn't have thought pruning would be the problem, whatever the problem 
is here, but if disabling it changes the answer, that's pretty convincing. 
Nothing I can think of sounds right. Not getting enough min info gain? 
But the default min is 0. The randomness? But the DF is cached.
   
   I guess I'm also wondering why existing tests didn't pick up on a problem; 
entirely possible it's a test coverage thing.
   
   Maybe one step forward is to throw in some debug logging about what happens 
during pruning, to verify basic things like whether you get a big tree to begin 
with (or maybe you already determined that).
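
   To make the logging idea concrete, here is a minimal, self-contained sketch in Python. It is not Spark's code: `Node`, `count_nodes`, `prune` and `prune_with_logging` are hypothetical names, and the pruning rule (merge two sibling leaves that predict the same value) is only an assumption about the kind of pruning under discussion. The point is just to log the node count before and after, so it is obvious whether a big tree existed to begin with.

```python
import logging
from dataclasses import dataclass
from typing import Optional, Tuple

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("pruning-debug")


@dataclass
class Node:
    """Hypothetical binary tree node: either a leaf or has both children."""
    prediction: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def count_nodes(node: Node) -> int:
    """Count every node (internal and leaf) in the subtree."""
    if node.is_leaf:
        return 1
    return 1 + count_nodes(node.left) + count_nodes(node.right)


def prune(node: Node) -> Node:
    """Assumed rule: collapse a split whose two leaf children agree."""
    if node.is_leaf:
        return node
    left, right = prune(node.left), prune(node.right)
    if left.is_leaf and right.is_leaf and left.prediction == right.prediction:
        log.debug("merging two leaves that both predict %s", left.prediction)
        return Node(prediction=left.prediction)
    return Node(node.prediction, left, right)


def prune_with_logging(root: Node) -> Tuple[Node, int, int]:
    """Prune and log the tree size before and after."""
    before = count_nodes(root)
    pruned = prune(root)
    after = count_nodes(pruned)
    log.debug("nodes before pruning: %d, after pruning: %d", before, after)
    return pruned, before, after


if __name__ == "__main__":
    # Left split has two leaves with the same prediction, so it is merged.
    tree = Node(0.0, Node(1.0, Node(1.0), Node(1.0)), Node(0.0))
    prune_with_logging(tree)
```

   If the "before" count is already 1, pruning is not what collapsed the tree; if it is large and the "after" count is 1, pruning is the place to look.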


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


