coleleahy commented on issue #21310: [SPARK-24256][SQL] ExpressionEncoder should support user-defined types as fields of Scala case class and tuple
URL: https://github.com/apache/spark/pull/21310#issuecomment-519787012
 
 
   As @fangshil [points out](https://github.com/apache/spark/pull/21310#issue-187607196), because Spark's encoder-generating facilities in ScalaReflection and JavaTypeInference cannot be made aware of a user-defined Encoder[T], working with a Dataset[T] for which such an encoder has been defined is fairly inconvenient. He gives two reasons:
   
   1. Common operations like joins and aggregations produce a Dataset[(T, S)] or the like, which Spark does not know how to encode -- precisely because the encoder-generating facilities in ScalaReflection cannot see the custom user-defined Encoder[T].
   
   2. The perfectly reasonable desire to create a case class or Java bean containing a member of type T is thwarted, again because the encoder-generating facilities in ScalaReflection and JavaTypeInference cannot see the custom Encoder[T], as the sketch just below illustrates.
   
   Now, the first problem can perhaps be worked around, for example by implicitly deriving an Encoder[(T, S)] whenever implicit Encoder[T] and Encoder[S] instances are in scope, as in the sketch below. The second problem, however, remains -- and that is precisely the problem the present PR sets out to solve.
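   
   For the record, here is roughly what I mean; Encoders.tuple is Spark's own combinator, while TupleEncoderWorkaround and tuple2Encoder are illustrative names of mine:
   
   ```scala
   import org.apache.spark.sql.{Encoder, Encoders}
   
   // Sketch of a workaround for point 1: derive an Encoder[(T, S)] from any
   // two encoders already in implicit scope, so operations that yield tuples
   // can resolve an encoder implicitly.
   object TupleEncoderWorkaround {
     implicit def tuple2Encoder[T, S](implicit
         et: Encoder[T],
         es: Encoder[S]): Encoder[(T, S)] =
       Encoders.tuple(et, es)
   }
   ```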
   
   I understand if the Spark community would prefer a different approach to this problem, but in that case I'd like to find out what that approach is.
   
   For instance, is the consensus that the best approach is to create a UserDefinedType[T] and register it through the currently private UDTRegistration API? If so, could someone please point me to a thread on the Spark dev list that sheds light on the justification for this choice, and on the timeline for making that API public?
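   
   For reference, my understanding of that route is sketched below, reusing the illustrative MyType from above; note that both UserDefinedType and UDTRegistration are currently private[spark], which is exactly the obstacle:
   
   ```scala
   import org.apache.spark.sql.types._
   import org.apache.spark.unsafe.types.UTF8String
   
   // Rough sketch only: because UserDefinedType is private[spark], this
   // compiles only from within the org.apache.spark package hierarchy.
   class MyTypeUDT extends UserDefinedType[MyType] {
     override def sqlType: DataType = StringType
     override def serialize(obj: MyType): Any = UTF8String.fromString(obj.value)
     override def deserialize(datum: Any): MyType = MyType(datum.toString)
     override def userClass: Class[MyType] = classOf[MyType]
   }
   
   // Registration maps the user class to its UDT by fully qualified class name:
   // UDTRegistration.register(classOf[MyType].getName, classOf[MyTypeUDT].getName)
   ```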
   
   Finally, I'd like to ask why, even if the UserDefinedType[T] approach is preferred, the work in the present PR isn't being considered as a supplementary enhancement -- one that many Spark users would find very convenient.
