Yes, we are going to expose the developer API. There was a long
discussion in the PR: https://github.com/apache/spark/pull/3637. So we
marked these classes package private while we look for feedback on how
to improve them. Please implement your classes under `spark.ml` for now
and let us know what you think. Thanks
Hi Joseph,
Thank you for your feedback. I've managed to define an image type by
following the VectorUDT implementation.
I have another question, about defining a user-defined transformer.
The unary transformer is private to `spark.ml`. Do you plan
to provide a developer API for transformers?
Hi Jao,
You're right that defining serialize and deserialize is the main task in
implementing a UDT. They are basically translating between your native
representation (ByteImage) and SQL DataTypes. The sqlType you defined
looks correct, and you're correct to use a row of length 4. Other than
th
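
As a rough illustration of the translation described above, here is a minimal sketch in plain Python. It does not use the Spark API. The `ByteImage` field names (`channels`, `height`, `width`, `data`) are assumptions made for the example; the thread only says that the row has length 4.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ByteImage:
    # Hypothetical native representation; the field names are assumptions.
    channels: int
    height: int
    width: int
    data: bytes


def serialize(image: ByteImage) -> Tuple[int, int, int, bytes]:
    """Translate the native object into a row of length 4."""
    return (image.channels, image.height, image.width, image.data)


def deserialize(row: Tuple[int, int, int, bytes]) -> ByteImage:
    """Rebuild the native object from a row of length 4."""
    if len(row) != 4:
        raise ValueError(f"expected a row of length 4, got {len(row)}")
    channels, height, width, data = row
    if len(data) != channels * height * width:
        raise ValueError("data length does not match channels * height * width")
    return ByteImage(channels, height, width, bytes(data))


# Round trip: serializing and then deserializing should return an equal image.
image = ByteImage(channels=3, height=2, width=2, data=bytes(range(12)))
row = serialize(image)
restored = deserialize(row)
```

In an actual UDT the same pair of functions would sit next to the `sqlType` definition, so the row layout and the two translations stay in one place.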
Hi all,
I'm trying to implement a pipeline for computer vision based on the latest
ML package in Spark. The first step of my pipeline is to decode images (JPEG,
for instance) stored in a Parquet file.
For this, I began by creating a UserDefinedType that represents a decoded
image stored in an array of