mazeboard edited a comment on issue #24299: [SPARK-27388][SQL] expression encoder for objects defined by properties
URL: https://github.com/apache/spark/pull/24299#issuecomment-480845183

Say we have an object A with the fields B, C, and D, where A and B are Avro objects, C is an Avro fixed object (with the property `bytes`), and D is a Java enum.

(1)
{code}
// Assumes a SparkSession named `spark`; `makeA` builds a sample A.
import spark.implicits._

implicit val exprEnc = ExpressionEncoder[A]()
val r: Dataset[A] = List(makeA).toDS()
val ds: Dataset[(B, C, D)] = r.map(e => (e.getB, e.getC, e.getD))
{code}

(2)
{code}
// Assumes a SparkSession named `spark`; `makeA` builds a sample A.
import spark.implicits._

// One bean encoder per type; each implicit needs a distinct name.
implicit val exprA = Encoders.bean[A](classOf[A])
implicit val exprB = Encoders.bean[B](classOf[B])
implicit val exprC = Encoders.bean[C](classOf[C])
implicit val exprD = Encoders.bean[D](classOf[D])
val r: Dataset[A] = List(makeA).toDS()
val ds: Dataset[(B, C, D)] = r.map(e => (e.getB, e.getC, e.getD))
{code}

Using the addition in this PR, the code in (1) works correctly, but if we use the Java bean encoder instead, as in (2), we run into the following issues:

- Objects of type C (Avro fixed types) are not correctly encoded, because the property `bytes` of fixed types is not prefixed by `get`/`set`.
- The Java bean encoder fails to create an encoder for the Java enum D (an assertion fails: the result is not a `StructType`, since an enum is stored as a String).
- The `map` fails to find encoders for B, C, and D because, while creating the encoder for the tuple `(e.getB, e.getC, e.getD)`, it recursively searches `ScalaReflection` for an encoder of each tuple element.

Also, we believe the current implementation of `Encoders.bean` (`JavaTypeInference`) has a bug: line 136 in `sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala` should be

{code}
val properties = getJavaBeanReadableAndWritableProperties(other)
{code}

and not

{code}
val properties = getJavaBeanReadableProperties(other)
{code}

All the tests with the Java bean encoder were run with this correction applied.

All this shows that the addition in this PR belongs in `ScalaReflection`; and since there is no encoder for `java.util.List`, `java.util.Map`
and Java enums, we added support for them in `ScalaReflection`.
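To illustrate the `bytes` issue and the readable vs. readable-and-writable distinction outside Spark, here is a minimal sketch using the JDK's `java.beans.Introspector` (which, as far as we know, is what `JavaTypeInference` relies on to discover bean properties). The classes `FixedLike` and `Record` and the helpers `BeanProps.readable` / `BeanProps.readableAndWritable` are hypothetical illustrations, not Spark's `getJavaBeanReadableProperties` / `getJavaBeanReadableAndWritableProperties`.

```scala
import java.beans.Introspector

// Hypothetical stand-in for an Avro fixed type: the payload is exposed as
// bytes()/bytes(Array[Byte]) rather than getBytes()/setBytes(...).
class FixedLike {
  private var data: Array[Byte] = Array.emptyByteArray
  def bytes(): Array[Byte] = data
  def bytes(b: Array[Byte]): Unit = { data = b }
}

// Hypothetical bean with one read-write property and one read-only property.
class Record {
  private var name: String = ""
  def getName: String = name
  def setName(n: String): Unit = { name = n }
  def getSize: Int = name.length // no setter, so "size" is read-only
}

object BeanProps {
  // Property names that have a public getter (the implicit "class" property is skipped).
  def readable(cls: Class[_]): Seq[String] =
    Introspector.getBeanInfo(cls).getPropertyDescriptors.toSeq
      .filter(p => p.getName != "class" && p.getReadMethod != null)
      .map(_.getName)

  // Property names that have both a public getter and a public setter.
  def readableAndWritable(cls: Class[_]): Seq[String] =
    Introspector.getBeanInfo(cls).getPropertyDescriptors.toSeq
      .filter(p => p.getName != "class" && p.getReadMethod != null)
      .filter(_.getWriteMethod != null)
      .map(_.getName)

  def main(args: Array[String]): Unit = {
    println(readable(classOf[Record]).sorted.mkString(", "))            // prints: name, size
    println(readableAndWritable(classOf[Record]).sorted.mkString(", ")) // prints: name
    println(readable(classOf[FixedLike]).isEmpty)                       // prints: true
  }
}
```

In this sketch the two property lists for `Record` differ only by the read-only `size`, which is the distinction at stake in the proposed change to line 136; and `FixedLike` exposes no bean properties at all, so a bean-based encoder cannot see the `bytes` payload of a fixed type.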
