fottey commented on issue #23908:  [SPARK-27001][SQL] Refactor "serializerFor" 
method between ScalaReflection and JavaTypeInference
URL: https://github.com/apache/spark/pull/23908#issuecomment-467955010
 
 
   As an outside observer, would this refactoring allow the method 
`ScalaReflection.serializerFor` to handle arbitrary types that conform to the 
Java bean conventions, and/or common Java-specific types such as 
`java.util.List`?
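   
   For context, a type conforms to the Java bean conventions when it exposes a 
public no-arg constructor plus getter/setter pairs. A minimal, self-contained 
Scala sketch (the `PersonBean` class here is hypothetical, purely for 
illustration):
   
   ```scala
   import java.beans.Introspector
   import scala.beans.BeanProperty

   // Hypothetical bean: @BeanProperty generates getName/setName and
   // getAge/setAge, and the default no-arg constructor completes the
   // Java bean contract.
   class PersonBean {
     @BeanProperty var name: String = ""
     @BeanProperty var age: Int = 0
   }

   object BeanDemo {
     def main(args: Array[String]): Unit = {
       // Introspector is the standard JDK way to enumerate bean properties,
       // i.e. roughly the view an encoder would get via reflection.
       // Passing classOf[AnyRef] as the stop class excludes Object's
       // "class" pseudo-property.
       val props = Introspector
         .getBeanInfo(classOf[PersonBean], classOf[AnyRef])
         .getPropertyDescriptors
         .map(_.getName)
         .sorted
       println(props.mkString(", ")) // age, name
     }
   }
   ```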
   
   I recently discovered that, because most of the common Scala implicit 
encoders reduce to `ExpressionEncoder`'s `apply` method, it's very difficult to 
work with arbitrary Java bean types in the Dataset API.
   
   Specifically, given a Java bean type `MyBean` and an implicit encoder for 
that bean type in scope, the existing Spark 2.4.0 machinery can't synthesize a 
valid encoder at runtime for hybrid Scala/Java types like `Seq[MyBean]` or 
tuple types like `(Int, MyBean)`, despite the fact that encoders for 
`Seq[_]`, `Tuple2[_, _]`, and `MyBean` are available separately.
   
   While it may be unreasonable to solve the problem generically across all 
potential classes, it would be really nice if `ExpressionEncoder`'s `apply` 
method could somehow detect and support at least Java beans and 
`java.util.List`s at runtime.
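   
   To make the "somehow detect" part concrete, here is a rough sketch of what a 
runtime bean check could look like, using only JDK reflection (the 
`looksLikeJavaBean` helper is hypothetical, not a proposed Spark API):
   
   ```scala
   import java.beans.Introspector
   import scala.util.Try

   object BeanDetection {
     // Hypothetical check: a class "looks like" a Java bean if it has a
     // public no-arg constructor and at least one readable+writable property.
     def looksLikeJavaBean(cls: Class[_]): Boolean = {
       val hasNoArgCtor = Try(cls.getConstructor()).isSuccess
       val hasRwProperty = Introspector
         .getBeanInfo(cls, classOf[AnyRef])
         .getPropertyDescriptors
         .exists(p => p.getReadMethod != null && p.getWriteMethod != null)
       hasNoArgCtor && hasRwProperty
     }

     def main(args: Array[String]): Unit = {
       // java.util.Date has a no-arg constructor and getTime/setTime
       println(looksLikeJavaBean(classOf[java.util.Date]))
       // String has no writable properties
       println(looksLikeJavaBean(classOf[String]))
     }
   }
   ```
   
   An encoder factory could fall back to such a check after exhausting the 
known Scala cases, rather than failing outright.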
   
   See the code examples below:
   
   ```scala
   import com.example.MyBean
   import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
   import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

   object Example {
       case class Test()

       def main(args: Array[String]): Unit = {
           val spark: SparkSession = ???

           import spark.implicits._

           // Works today after the implicit import above
           val caseClassDs: Dataset[Seq[Test]] = Seq(Seq(Test()), Seq(Test())).toDS

           // DOES NOT WORK:
           // ExpressionEncoder's apply method cannot handle type MyBean!
           // implicit def newMyBeanExpressionEncoder: Encoder[MyBean] = ExpressionEncoder()
           //
           // Need to do the following instead:
           implicit def newMyBeanBeanEncoder: Encoder[MyBean] = Encoders.bean(classOf[MyBean])

           // But this only allows expressing things like this:
           val beanDs: Dataset[MyBean] = Seq(new MyBean(), new MyBean()).toDS

           // Due to the above limitation we CANNOT do the following, EVEN AFTER
           // newMyBeanBeanEncoder is brought into scope!
           // DOES NOT WORK:
           // val beanSeqDs: Dataset[Seq[MyBean]] = Seq(Seq(new MyBean()), Seq(new MyBean())).toDS

           // Finally, these do not work either:

           // DOES NOT WORK:
           // val tupleDs: Dataset[(Int, MyBean)] = Seq((0, new MyBean()), (0, new MyBean())).toDS

           // DOES NOT WORK:
           // implicit def newMyBeanSeqEncoder: Encoder[Seq[MyBean]] = ExpressionEncoder()

           // DOES NOT WORK:
           // implicit def newMyBeanListEncoder: Encoder[java.util.List[MyBean]] = ExpressionEncoder()

           // The above samples all rely on ExpressionEncoder being able to
           // handle every type in the expression. It currently seems to work for:
           // - case classes
           // - tuples
           // - scala.Product
           // - Scala "primitives"
           // - other common types with encoders... BUT NOT Java beans... :'(
       }
   }
   ```
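   
   For completeness, the tuple case does have a manual workaround today via 
`Encoders.tuple`, which composes explicit encoders by hand. A sketch, assuming 
a live `SparkSession` named `spark` and the same hypothetical `MyBean` (note 
this does not help the `Seq[MyBean]` case):
   
   ```scala
   import com.example.MyBean
   import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

   // Sketch of a manual workaround for the tuple case: compose the
   // component encoders explicitly instead of relying on
   // ExpressionEncoder.apply to synthesize one.
   implicit val intBeanEncoder: Encoder[(Int, MyBean)] =
     Encoders.tuple(Encoders.scalaInt, Encoders.bean(classOf[MyBean]))

   val tupleDs: Dataset[(Int, MyBean)] =
     spark.createDataset(Seq((0, new MyBean())))
   ```
   
   This requires naming every component encoder by hand, which is exactly the 
boilerplate the refactoring could hopefully eliminate.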
