Eron,
Thanks for sending out this list! We can make some of the critical ones
public for 1.5, but they will be marked DeveloperApi since they may require
changes in the future. Just made the JIRA:
https://issues.apache.org/jira/browse/SPARK-9704 and I'll send a PR soon.
Joseph
On Mon, Aug 3, 2015 at 4:51 PM, Eron Wright ewri...@live.com wrote:
Hello,
In developing new *third-party* *pipeline components* for Spark ML 1.4
(see dl4j-spark-ml), I encountered a few gaps in the earlier effort to make
the ML Developer APIs public (SPARK-5995). I plan to file issues after
we discuss on this thread. Below is a list of types that are
presently private but might best be made public.
1. *VectorUDT*. To define a relation with a vector field,
VectorUDT must be instantiated.
2. *SchemaUtils*. Third-party pipeline components need to check
column types and append columns.
3. *Identifiable trait*. The trait generates a unique identifier for
the associated pipeline component. Reusing the trait would give
third-party components a consistent identifier format.
4. *ProbabilisticClassifier*. Third-party components should leverage
the complex logic around computing only selected columns.
5. *Shared Params* (HasLabel, HasFeatures). This is covered in
SPARK-7146, but I am reiterating it here.
Thanks,
Eron Wright