GitHub user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/3094#issuecomment-61764340
@codedeft The long-foretold new ML API will help with these things. A WIP
PR with the Pipeline concept is out now, but a remake of the internal class
hierarchy is still being designed. The class hierarchy in particular should
make a lot of these things easier (e.g., separating labels and features).
You can see the Pipeline PR here:
[https://github.com/apache/spark/pull/3099]
The class hierarchy JIRA is here:
[https://issues.apache.org/jira/browse/SPARK-3702]. The design doc is a bit
out of date, and I've been working on a branch of the Pipeline PR. It will
take a bit of time for me to merge the new Pipeline changes, but I'll ping you
when I have a branch ready. Basically, I'm trying to look ahead and think of
general use cases and algorithms (mainly for prediction) to figure out good
abstractions while minimizing the burden on developers. Feedback would be awesome
(though it might make sense to wait until I clean it up this week).