GitHub user manishamde commented on the pull request:

    https://github.com/apache/spark/pull/79#issuecomment-39401757
  
    @hirakendu Thanks a lot for the detailed comments and feedback. Yes, we 
have a responsibility to keep improving the trees going forward so getting 
additional feedback is awesome.
    
    The feedback around the current implementation in the 'Miscellaneous' 
section is around renaming and minor refactoring of code. I agree with most of 
the feedback; some choices are personal preferences, which we should discuss and 
resolve. We should address it soon when we implement the better ```Impurity``` 
interface that we promised to implement ASAP. 
    
    I think the "Error" interface you described is very similar to what @mengxr 
proposed as well. We should discuss the naming convention. I don't 
feel strongly about "Impurity", but "Error" might not be the best name for the 
classification scenario. I am open to better names and ready to be convinced 
otherwise. :-)
    
    For 'General Design Notes', I have similar thoughts but I will wait for 
@etrain's comments since he has thought about it carefully. In general, I like 
@etrain's MLI design for Model and Algorithm. I did not tie the current 
implementation to the existing traits yet since I wanted to have a broader 
conversation about it after the tree PR. It's straightforward to implement once 
we agree on the interfaces for mllib algorithms.
    
    Finally, and most importantly, thanks a ton for performing such extensive 
tests on a massive dataset! The results are not too shabby. ;-)


