fottey commented on issue #23908: [SPARK-27001][SQL] Refactor "serializerFor" method between ScalaReflection and JavaTypeInference
URL: https://github.com/apache/spark/pull/23908#issuecomment-467955010

As an outside observer: would this refactoring allow the method `ScalaReflection.serializerFor` to handle arbitrary types that follow the Java bean conventions, and/or common Java-specific types such as `java.util.List`?

I recently discovered that, because most of the common Scala implicit encoders reduce to `ExpressionEncoder`'s `apply` method, it's very difficult to work with arbitrary Java bean types in the Dataset API.

Specifically, given a Java bean type `MyBean` and an implicit encoder of that bean type in scope, the existing Spark 2.4.0 machinery can't synthesize a valid encoder at runtime for hybrid Scala/Java types such as `Seq[MyBean]`, or for tuple types such as `(Int, MyBean)`, even though encoders for `Seq[_]`, `Tuple2[_, _]`, and `MyBean` are each available separately.

While it may be unreasonable to solve the problem generically across all potential classes, it would be really nice if `ExpressionEncoder`'s `apply` method could detect and support at least Java beans and `java.util.List` at runtime.

See the code examples below:

```scala
import com.example.MyBean

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

object Example {
  case class Test()

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Lines marked DOES NOT WORK are commented out so the rest of the example compiles and runs.

    // Works today after the implicit import above
    val seqOfTest: Dataset[Seq[Test]] = Seq(Seq(Test()), Seq(Test())).toDS()

    // DOES NOT WORK
    // ExpressionEncoder's apply method cannot handle type MyBean!
    // implicit def newMyBeanExpressionEncoder: Encoder[MyBean] = ExpressionEncoder()

    // Need to do the following instead:
    implicit def newMyBeanBeanEncoder: Encoder[MyBean] = Encoders.bean(classOf[MyBean])

    // But this only allows expressing things like this:
    val beans: Dataset[MyBean] = Seq(new MyBean(), new MyBean()).toDS()

    // Due to the above limitation we CANNOT do the following,
    // EVEN AFTER newMyBeanBeanEncoder is brought into scope!

    // DOES NOT WORK
    // val seqOfBeans: Dataset[Seq[MyBean]] = Seq(Seq(new MyBean()), Seq(new MyBean())).toDS()

    // Finally, these do not work either:

    // DOES NOT WORK
    // val pairs: Dataset[(Int, MyBean)] = Seq((0, new MyBean()), (0, new MyBean())).toDS()

    // DOES NOT WORK
    // implicit def newSeqOfMyBeanEncoder: Encoder[Seq[MyBean]] = ExpressionEncoder()

    // DOES NOT WORK
    // implicit def newJavaListOfMyBeanEncoder: Encoder[java.util.List[MyBean]] = ExpressionEncoder()

    // The samples above all rely on ExpressionEncoder being able to handle
    // every type in the expression. It currently seems to work for:
    //   - case classes
    //   - tuples
    //   - scala.Product
    //   - Scala "primitives"
    //   - other common types with encoders
    // ...BUT NOT Java beans... :'(

    spark.stop()
  }
}
```
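This does not make the implicit `toDS` route work, but for the tuple case specifically, one possible workaround is to compose the encoder by hand with `Encoders.tuple` and pass it explicitly to `createDataset`, so `ExpressionEncoder` never has to reflect over `(Int, MyBean)`. This is a sketch assuming Spark 2.4's `Encoders` API; `PointBean` and `TupleOfBeanExample` are hypothetical names, with `PointBean` standing in for `MyBean` as a Scala class made bean-like via `@BeanProperty`.

```scala
import scala.beans.BeanProperty
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical stand-in for MyBean: a JavaBean-style class with a public
// no-arg constructor and getX/setX, getName/setName accessors.
class PointBean {
  @BeanProperty var x: Int = 0
  @BeanProperty var name: String = ""
}

object TupleOfBeanExample {
  def makePoint(x: Int, name: String): PointBean = {
    val p = new PointBean
    p.setX(x)
    p.setName(name)
    p
  }

  // Build the (Int, PointBean) encoder from its parts: a Scala Int encoder
  // and a bean encoder, combined with the tuple combinator.
  val pairEncoder: Encoder[(Int, PointBean)] =
    Encoders.tuple(Encoders.scalaInt, Encoders.bean(classOf[PointBean]))

  def run(spark: SparkSession): Array[(Int, PointBean)] = {
    val data = Seq((0, makePoint(1, "a")), (1, makePoint(2, "b")))
    // Pass the encoder explicitly; no implicit Encoder[(Int, PointBean)] is needed.
    val ds: Dataset[(Int, PointBean)] = spark.createDataset(data)(pairEncoder)
    ds.collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tuple-of-bean").getOrCreate()
    try {
      run(spark).foreach { case (i, p) => println(s"$i -> (${p.getX}, ${p.getName})") }
    } finally {
      spark.stop()
    }
  }
}
```

This only helps where `Encoders` offers an explicit combinator. As far as Spark 2.4's `Encoders` goes, there are `tuple` overloads but no combinator for `Seq[T]`, so `Seq[MyBean]` would still need a fallback such as `Encoders.kryo[Seq[MyBean]]`, which stores each value as a single opaque binary column rather than a queryable struct.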
