GitHub user jkbradley commented on the pull request:

    https://github.com/apache/spark/pull/3094#issuecomment-61764340
  
    @codedeft The long-foretold new ML API will help with these things.  A WIP 
PR with the Pipeline concept is out now, but a remake of the internal class 
hierarchy is still being designed.  The class hierarchy in particular should 
make a lot of these things easier (e.g., separating labels and features).
    
    You can see the Pipeline PR here: 
[https://github.com/apache/spark/pull/3099]
    
    The class hierarchy JIRA is here: 
[https://issues.apache.org/jira/browse/SPARK-3702].  The design doc is a bit 
out of date, and I've been working on a branch of the Pipeline PR.  It will 
take a bit of time for me to merge the new Pipeline changes, but I'll ping you 
when I have a branch ready.  Basically, I'm trying to look ahead and think of 
general use cases and algorithms (mainly for prediction) to figure out good 
abstractions while minimizing the burden on developers.  Feedback would be 
awesome (though it might make sense to wait until I clean it up this week).


