Re: Need some help to create user defined type for ML pipeline

2015-02-23 Thread Xiangrui Meng
Yes, we are going to expose the developer API. There was a long discussion in the PR: https://github.com/apache/spark/pull/3637. So we marked them package-private and are looking for feedback on how to improve it. Please implement your classes under `spark.ml` for now and let us know your feedback. Thanks

Re: Need some help to create user defined type for ML pipeline

2015-02-23 Thread Jaonary Rabarisoa
Hi Joseph, Thank you for your feedback. I've managed to define an image type by following the VectorUDT implementation. I have another question, about defining a user-defined transformer: `UnaryTransformer` is private to `spark.ml`. Do you plan to provide a developer API for transformers?

Re: Need some help to create user defined type for ML pipeline

2015-01-24 Thread Joseph Bradley
Hi Jao, You're right that defining serialize and deserialize is the main task in implementing a UDT. They are basically translating between your native representation (ByteImage) and SQL DataTypes. The sqlType you defined looks correct, and you're correct to use a row of length 4. Other than th
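To make the serialize/deserialize idea concrete, here is a minimal plain-Scala sketch of the round trip, written without Spark so it runs on its own. Assumptions: the four `ByteImage` fields (`channels`, `width`, `height`, `data`) are hypothetical, chosen only to match the "row of length 4" mentioned above, and a `Seq[Any]` stands in for the Spark SQL row that a real UDT's `serialize` would produce to match its declared `sqlType`. This is not Spark's `UserDefinedType` API itself.

```scala
// Hypothetical native representation of a decoded image (field layout assumed).
case class ByteImage(channels: Int, width: Int, height: Int, data: Array[Byte])

// Stand-in for the two core UDT methods. In Spark, serialize would return a
// row whose fields match the sqlType you declared; here a Seq[Any] plays that role.
object ByteImageCodec {
  // Native object -> 4-field "row".
  def serialize(img: ByteImage): Seq[Any] =
    Seq(img.channels, img.width, img.height, img.data)

  // 4-field "row" -> native object, failing loudly on an unexpected shape.
  def deserialize(row: Seq[Any]): ByteImage = row match {
    case Seq(c: Int, w: Int, h: Int, d: Array[Byte]) => ByteImage(c, w, h, d)
    case other => throw new IllegalArgumentException(s"Unexpected row: $other")
  }
}
```

For example, `ByteImageCodec.deserialize(ByteImageCodec.serialize(img))` should give back an image with the same dimensions and pixel bytes. The only real requirement is that the two methods agree on field order and types with each other and with `sqlType`.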

Need some help to create user defined type for ML pipeline

2015-01-19 Thread Jaonary Rabarisoa
Hi all, I'm trying to implement a computer vision pipeline based on the latest ML package in Spark. The first step of my pipeline is to decode images (JPEG, for instance) stored in a Parquet file. For this, I began by creating a UserDefinedType that represents a decoded image stored in an array of
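The decoding step itself can be sketched with the JDK's `javax.imageio`, independent of Spark. This assumes the encoded image arrives as an `Array[Byte]` (for example, from a binary Parquet column) and that the decoded form is a hypothetical `ByteImage` holding interleaved RGB bytes; reading the Parquet file and wrapping this in a pipeline stage are left out.

```scala
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO

// Hypothetical decoded representation: interleaved RGB, row-major.
case class ByteImage(channels: Int, width: Int, height: Int, data: Array[Byte])

object ImageDecoder {
  /** Decodes encoded bytes (JPEG, PNG, ...) into a 3-channel RGB ByteImage. */
  def decode(encoded: Array[Byte]): ByteImage = {
    // ImageIO.read returns null when no registered reader understands the bytes.
    val img: BufferedImage = ImageIO.read(new ByteArrayInputStream(encoded))
    if (img == null)
      throw new IllegalArgumentException("No ImageIO reader could decode these bytes")
    val (w, h) = (img.getWidth, img.getHeight)
    val data = new Array[Byte](w * h * 3)
    var i = 0
    for (y <- 0 until h; x <- 0 until w) {
      val rgb = img.getRGB(x, y) // packed 0xAARRGGBB in the default sRGB model
      data(i) = ((rgb >> 16) & 0xFF).toByte     // red
      data(i + 1) = ((rgb >> 8) & 0xFF).toByte  // green
      data(i + 2) = (rgb & 0xFF).toByte         // blue
      i += 3
    }
    ByteImage(3, w, h, data)
  }
}
```

Using `getRGB` per pixel keeps the sketch independent of the decoded image's internal raster layout, at some cost in speed; a production version might read the raster directly.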