mazeboard commented on issue #24299: [SPARK-27388][SQL] expression encoder for 
objects defined by properties
URL: https://github.com/apache/spark/pull/24299#issuecomment-481808229
 
 
   This PR is not an extension of the existing Java bean encoder: it adds
support for bean objects, java.util.List, java.util.Map, and Java enums to
ScalaReflection; unlike the existing Java bean encoder, properties can be named
without the get/set prefix.
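   
   As a hedged illustration (a hypothetical class, not taken from the PR), this
is the kind of property style the existing Java bean encoder does not pick up
but the extended ScalaReflection is meant to handle:
   
   {code}
   // Hypothetical class: accessors are plain properties, no get/set prefix.
   class Temperature {
     private var c: Double = 0.0
     def celsius: Double = c                          // getter without "get"
     def celsius(value: Double): Unit = { c = value } // setter without "set"
   }
   {/code}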
   
   Reminder of Avro types:
      primitive types: null, boolean, int, long, float, double, bytes, string
      complex types: Records, Enums, Arrays, Maps, Unions, Fixed
   
   All Avro types are supported by this PR (though a better test covering all
Avro types may be needed), including simple union types but excluding complex
union types: simple Avro unions are [null, type1]; all other unions are
complex.
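   
   For concreteness, a hedged example schema (record and field names are made
up): "label" below is a simple union, "payload" a complex one:
   
   {code}
   // Illustrative Avro schema held as a Scala string; "label" is a simple
   // union [null, string], "payload" is a complex union [int, string].
   val exampleSchema: String =
     """{
       |  "type": "record",
       |  "name": "Example",
       |  "fields": [
       |    {"name": "label",   "type": ["null", "string"]},
       |    {"name": "payload", "type": ["int", "string"]}
       |  ]
       |}""".stripMargin
   {/code}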
   
   This PR supports simple unions but not complex unions, for the simple reason
that the Avro compiler generates Java code with type Object for all complex
unions, while fields with simple unions are typed as the non-null type of the
union.
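   
   A rough, hedged sketch of the resulting accessor types (illustrative names;
the exact generated code depends on the Avro compiler and its configuration):
   
   {code}
   // Shape of the accessors the Avro compiler typically generates:
   trait SimpleUnionRecord {
     def getLabel: CharSequence // ["null", "string"] -> the non-null type, nullable
   }
   trait ComplexUnionRecord {
     def getPayload: Object     // ["int", "string"] -> Object, no ScalaReflection encoder
   }
   {/code}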
   
   Currently, ScalaReflection does not have an encoder for the Object type, but
we could modify ScalaReflection to use a default encoder (i.e. Kryo) for
Object (I do not know yet whether this is advisable, but why not?).
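   
   For reference, a Kryo-based encoder can already be supplied explicitly; the
open question is only whether ScalaReflection should fall back to it
automatically for Object (a sketch, not part of the PR):
   
   {code}
   import org.apache.spark.sql.{Encoder, Encoders}
   
   // Explicit Kryo-based encoder for arbitrary objects; a possible default
   // for Object-typed fields if ScalaReflection were extended as suggested.
   implicit val objectEncoder: Encoder[AnyRef] = Encoders.kryo[AnyRef]
   {/code}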
   
   I do not understand the issue with this being a reflection-driven approach:
all common Scala objects are encoded using reflection.
   
   I may be wrong, but as I tried to explain in this PR, we need to add types
to ScalaReflection to be able to transform Datasets into other Datasets of
embedded Avro types.
   
   As an example, the map function transforms a Dataset[A] into a
Dataset[(B, C)]:
   
   {code}
   // A, B, C are the Avro-generated classes; makeA builds an instance of A.
   val r: Dataset[A] = List(makeA).toDS()
   val ds: Dataset[(B, C)] = r.map(e => (e.getB, e.getC))
   {/code}
   
   The map function will recursively use ScalaReflection to find encoders for
the B and C types. (Do you know whether this compiles with the #22878 solution,
or does it fail with "no encoder found"?)
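   
   One way to see this directly (a sketch; B and C stand for the Avro-generated
classes from the example above) is to ask ScalaReflection for the tuple encoder
that map needs:
   
   {code}
   import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
   
   // map needs an implicit Encoder[(B, C)]; deriving it goes through
   // ScalaReflection, which must in turn find encoders for B and for C.
   val tupleEncoder = ExpressionEncoder[(B, C)]() // fails if either encoder is missing
   {/code}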
   
   Finally, I did not understand the benefits of the Avro-schema-driven
approach: for me, an Avro object is completely defined by its properties (which
are derived from the Avro schema); the Avro compiler generates Java code with
all the properties in the Avro schema.
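   
   As a small illustration of that point (A, B, and makeA are the hypothetical
names from the example above), the generated class carries both its schema and
the properties derived from it:
   
   {code}
   // The Avro-generated class exposes its schema and, for each schema field,
   // a corresponding property; encoding from the properties therefore covers
   // exactly the data described by the schema.
   val a: A = makeA
   val schema: org.apache.avro.Schema = a.getSchema
   val b: B = a.getB
   {/code}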
   
   
   
