mazeboard edited a comment on issue #24299: [SPARK-27388][SQL] expression encoder for objects defined by properties
URL: https://github.com/apache/spark/pull/24299#issuecomment-481808229

This PR is not an extension of the existing Java bean encoder: it adds support for bean objects, java.util.List, java.util.Map, and Java enums directly to ScalaReflection. Unlike the existing bean encoder, properties may be named without the get/set prefix; this is one of the two key points that make encoding Avro Fixed types possible (the other, I believe, is that the support must live in ScalaReflection itself). See the first sketch at the end of this comment.

Reminder of Avro types:
- primitive types: null, boolean, int, long, float, double, bytes, string
- complex types: Records, Enums, Arrays, Maps, Unions, Fixed

All Avro types are supported by this PR (a more thorough test covering every Avro type may be needed), with one exception: simple union types are supported, complex union types are not. A simple Avro union has the form [null, type1]; all other unions are complex. The reason is that the Avro compiler generates Java code typed as Object for every complex union field, whereas a field with a simple union is typed as the union's non-null type.

ScalaReflection currently has no encoder for the Object type, but we could modify it to fall back to a default encoder (i.e. Kryo) for Object. I do not know whether this is advisable, but why not? Instead of throwing an error, we could use a default encoder for any object for which no encoder is found (see the Kryo sketch below).

I do not understand the objection to a reflection-driven approach: all common Scala objects are already encoded using reflection. I may be wrong, but as I tried to explain in this PR, we need to add these types to ScalaReflection in order to transform Datasets into other Datasets of embedded Avro types. As an example, the following map call transforms a Dataset[A] into a Dataset[(B, C)]:

{code}
val r: Dataset[A] = List(makeA).toDS()
val ds: Dataset[(B, C)] = r.map(e => (e.getB, e.getC))
{code}

The map function recursively uses ScalaReflection to find encoders for the B and C types (do you know whether this works with the #22878 solution, or does it fail at runtime with "no encoder found"?). A self-contained version of this example is sketched below.

Finally, I do not understand the benefit of an Avro-schema-driven approach: to me, an Avro object is completely defined by its properties, which are themselves derived from the Avro schema; the Avro compiler generates Java code exposing every property in the schema.
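To make the property-naming point concrete, here is a minimal sketch of the accessor shape that defeats a strict get/set convention. Avro-generated Fixed types expose their payload through a bytes()/bytes(byte[]) pair rather than getBytes/setBytes; the MD5 class below is a hypothetical stand-in, not code from this PR.

{code}
// Hypothetical stand-in for an Avro-generated Fixed type: the property is
// exposed as bytes()/bytes(Array[Byte]) with no get/set prefix, so a bean
// encoder that only scans for getX/setX pairs cannot discover it.
class MD5 extends Serializable {
  private var value: Array[Byte] = new Array[Byte](16)
  def bytes(): Array[Byte] = value                  // getter, no "get" prefix
  def bytes(b: Array[Byte]): Unit = { value = b }   // setter, no "set" prefix
}
{code}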
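As for the Object fallback: a caller can already opt into Kryo explicitly for a type that reflection cannot handle; the suggestion above would simply make ScalaReflection apply this fallback itself for Object-typed fields. A minimal sketch of the explicit form (this uses the standard Encoders API, not code from the PR):

{code}
import org.apache.spark.sql.{Encoder, Encoders}

// Explicit Kryo encoder for a type that ScalaReflection cannot derive; the
// idea discussed above is to apply this automatically as a default for Object.
implicit val objectEncoder: Encoder[AnyRef] = Encoders.kryo[AnyRef]
{code}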
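And here is a self-contained version of the map example. It uses case classes as hypothetical stand-ins for the Avro-generated A, B, and C (case classes are already supported by ScalaReflection, so this sketch runs on stock Spark; it only illustrates the recursive encoder lookup that Avro-generated classes would need as well):

{code}
import org.apache.spark.sql.{Dataset, SparkSession}

object MapDemo {
  // Case classes standing in for the Avro-generated A, B, C above.
  case class B(name: String)
  case class C(count: Long)
  case class A(b: B, c: C) { def getB: B = b; def getC: C = c }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("map-demo").getOrCreate()
    import spark.implicits._

    def makeA: A = A(B("x"), C(1L))

    val r: Dataset[A] = List(makeA).toDS()
    // map() needs an Encoder[(B, C)]; ScalaReflection derives it by recursing
    // into B and C, which is why embedded types must themselves be encodable.
    val ds: Dataset[(B, C)] = r.map(e => (e.getB, e.getC))
    ds.show()
    spark.stop()
  }
}
{code}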
