Implementing custom encoders is unfortunately not well supported at the moment (IIRC there are plans to eventually add an api for user defined encoders).
That being said, there are a couple of encoders that can work with generic, serializable data types: "javaSerialization" and "kryo", found here http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$. These encoders need to be specified explicitly, as in "spark.createDataset(...)(Encoders.javaSerialization)" In Spark 2.1 there will also be special trait "org.apache.spark.sql.catalyst.DefinedByConstructorParams" that can be mixed into arbitrary classes and that has implicit encoders available. If you don't control the source of the class in question and it is not serializable, it may still be possible to define your own Encoder by implementing your own "o.a.s.sql.catalyst.encoders.ExpressionEncoder". However, that requires quite some knowledge on how Spark's SQL optimizer (catalyst) works internally and I don't think there is much documentation on that. regards, --Jakob On Mon, Aug 29, 2016 at 10:39 PM, canan chen <ccn...@gmail.com> wrote: > > e.g. I have a custom class A (not case class), and I'd like to use it as > DataSet[A]. I guess I need to implement Encoder for this, but didn't find > any example for that, is there any document for that ? Thanks > --------------------------------------------------------------------- To unsubscribe e-mail: user-unsubscr...@spark.apache.org