[GitHub] spark issue #16471: [SPARK-19078] hashingTF,ChiSqSelector,IDF,StandardScaler...

imatiach-msft Fri, 06 Jan 2017 09:42:57 -0800

Github user imatiach-msft commented on the issue:

    https://github.com/apache/spark/pull/16471
  
    I think you might need to add [ML] to the pull request name, e.g.:
    [SPARK-19078][ML] hashingTF,ChiSqSelector,IDF,StandardScaler,PCA transform avoid extra vector conversion
    
    I like the changes, but I'm not sure we want to port the code directly 
instead of calling the mllib code: if there is a bug, a developer would then 
have to fix it in two places (ml and mllib) instead of one. I'm a new 
contributor myself, so I don't know what is traditionally recommended in cases 
like this. In some cases it looks possible to refactor the duplicated code 
into one place and have both ml and mllib call the same method. Maybe that 
would address my concern. Thoughts?
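
    A minimal sketch of the refactoring idea, not taken from the pull request: 
the arithmetic lives in one shared helper that works on plain values, and each 
API keeps only a thin wrapper that converts to and from its own vector type. 
Every name here (OldDenseVector, NewDenseVector, scale_values, transform_old, 
transform_new) is hypothetical and stands in for the real ml and mllib classes; 
Python is used only to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class OldDenseVector:
    """Hypothetical stand-in for the older API's vector type."""
    values: Tuple[float, ...]


@dataclass(frozen=True)
class NewDenseVector:
    """Hypothetical stand-in for the newer API's vector type."""
    values: Tuple[float, ...]


def scale_values(values: Sequence[float],
                 mean: Sequence[float],
                 std: Sequence[float]) -> List[float]:
    """Shared core: the only place the standardization math is written.

    In this sketch a zero standard deviation maps to 0.0 instead of
    dividing by zero (an assumption made for the example).
    """
    return [(v - m) / s if s != 0.0 else 0.0
            for v, m, s in zip(values, mean, std)]


def transform_old(vec: OldDenseVector,
                  mean: Sequence[float],
                  std: Sequence[float]) -> OldDenseVector:
    # Thin wrapper: unwrap, call the shared core, rewrap in the old type.
    return OldDenseVector(tuple(scale_values(vec.values, mean, std)))


def transform_new(vec: NewDenseVector,
                  mean: Sequence[float],
                  std: Sequence[float]) -> NewDenseVector:
    # Thin wrapper: same shared core, result rewrapped in the new type.
    return NewDenseVector(tuple(scale_values(vec.values, mean, std)))
```

    With this layout a bug in the math is fixed once, in scale_values, and 
neither wrapper has to convert between the two vector types.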



