[GitHub] spark pull request: [SPARK-5895][ML] add vector slicer

jkbradley Sat, 01 Aug 2015 12:21:06 -0700

Github user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/5731#issuecomment-126947540
  
    How about this:
    * We use the ordering specified by the user, but always put features 
specified by index before features specified by name.
      * This will be a well-defined ordering.
      * It will not allow users to interleave indices and names, but I think 
that will be a rare use case.  I expect most users to use either indices or 
names, not both.
    * If a user specifies the same feature k times (by index, by name, or by a 
mix of both), we create k copies of it.
    
    E.g.:
    * Input Vector col with length 10 and names "col0, col1, col2, ..."
    * User specifies selectedIndices = [0, 3, 0] and selectedLabels = [col2, col0]
    * Output Vector has values: `[col0, col3, col0, col2, col0]`
    * (Note: I'm switching to 0-based indices, unlike above.)
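
To make the proposed ordering concrete, here is a minimal sketch in plain Python. It is not the PR's implementation: `select_features` and its parameter names are hypothetical, and ordinary lists stand in for Spark's Vector and its attribute names. It only illustrates the two rules above: index-selected features come first, then name-selected ones, and duplicates are kept.

```python
def select_features(values, names, selected_indices=(), selected_names=()):
    """Slice `values` using the ordering proposed above (hypothetical helper).

    Index-selected features come first, then name-selected features.
    Within each group the user's order is preserved and duplicates are kept.
    """
    # Map each feature name to its position in the input vector.
    name_to_index = {name: i for i, name in enumerate(names)}

    # Indices first, exactly as given (0-based).
    order = list(selected_indices)
    # Names second, resolved to positions; an unknown name raises KeyError.
    order += [name_to_index[name] for name in selected_names]

    return [values[i] for i in order]


# Input vector of length 10 with names "col0", "col1", ..., "col9".
names = [f"col{i}" for i in range(10)]
# Use the names themselves as stand-in values so the output is easy to read.
values = list(names)

result = select_features(values, names, [0, 3, 0], ["col2", "col0"])
print(result)
```

With the inputs from the example, `col0` is requested three times (twice by index, once by name), so it appears three times in the output.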


