eejbyfeldt opened a new pull request #32783:
URL: https://github.com/apache/spark/pull/32783


   
   ### What changes were proposed in this pull request?
   In the interpreted path of `CatalystToExternalMap`, use `keyLambdaFunction`/`valueLambdaFunction`
   to convert the elements instead of `CatalystTypeConverters.createToScalaConverter`. This is how
   `MapObjects` does it, and that approach already handles arrays of case classes correctly.
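
   A toy model of the difference, as a sketch only: it does not use Spark, and `InternalStruct`, `typeOnlyConvert`, `valueFunction`, `convertMap` and `sample` are hypothetical names, not Spark APIs. A converter driven only by the data type cannot know the external class, so a struct value comes back as a generic row. A per-element function built from the external type can construct the case class.

   ```scala
   // Toy model of the change; every name here is hypothetical, not Spark's API.
   object MapConversionSketch {
     final case class IntAndString(i: Int, s: String)

     // Stand-in for an internal struct value: positional fields, no class info.
     final case class InternalStruct(fields: Vector[Any])

     // Type-only conversion, in the spirit of a converter built from the data
     // type alone: it cannot know the external class, so a struct comes back
     // as a generic row (here, a plain Vector of field values).
     def typeOnlyConvert(v: Any): Any = v match {
       case InternalStruct(fields) => fields.map(typeOnlyConvert)
       case other                  => other
     }

     // Per-element conversion in the spirit of the key/value lambda functions:
     // it was built from the external type, so it can construct the case class
     // (mirroring `if (isnull(x)) null else newInstance(...)`).
     val valueFunction: Any => Any = {
       case null                                      => null
       case InternalStruct(Vector(i: Int, s: String)) => IntAndString(i, s)
     }

     // Interpreted-style map conversion: apply the key and value functions
     // element by element and build the external Map.
     def convertMap(in: Map[Any, Any], keyFn: Any => Any, valueFn: Any => Any): Map[Any, Any] =
       in.map { case (k, v) => keyFn(k) -> valueFn(v) }

     // Internal form of Map(1 -> IntAndString(1, "a")).
     val sample: Map[Any, Any] = Map(1 -> InternalStruct(Vector(1, "a")))
   }
   ```

   In this model, `convertMap(sample, identity, typeOnlyConvert)` yields `Map(1 -> Vector(1, a))`, analogous to the `out: Map(1 -> [1,a])` line in the failure output, while `convertMap(sample, identity, valueFunction)` yields `Map(1 -> IntAndString(1,a))`.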
   
   ### Why are the changes needed?
   Before this change, the newly added test cases fail with the following output:
   ```
   [info] - encode/decode for map with case class as value: Map(1 -> IntAndString(1,a)) (interpreted path) *** FAILED *** (64 milliseconds)
   [info]   Encoded/Decoded data does not match input data
   [info]   
   [info]   in:  Map(1 -> IntAndString(1,a))
   [info]   out: Map(1 -> [1,a])
   [info]   types: scala.collection.immutable.Map$Map1 [info]   
   [info]   Encoded Data: [org.apache.spark.sql.catalyst.expressions.UnsafeMapData@5ecf5d9e]
   [info]   Schema: value#823
   [info]   root
   [info]   -- value: map (nullable = true)
   [info]       |-- key: integer
   [info]       |-- value: struct (valueContainsNull = true)
   [info]       |    |-- i: integer (nullable = false)
   [info]       |    |-- s: string (nullable = true)
   [info]   
   [info]   
   [info]   fromRow Expressions:
   [info]   catalysttoexternalmap(lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178), lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178), lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179), if (isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))) null else newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString), input[0, map<int,struct<i:int,s:string>>, true], interface scala.collection.immutable.Map
   [info]   :- lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178)
   [info]   :- lambdavariable(CatalystToExternalMap_key, IntegerType, false, 178)
   [info]   :- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
   [info]   :- if (isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))) null else newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString)
   [info]   :  :- isnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179))
   [info]   :  :  +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
   [info]   :  :- null
   [info]   :  +- newInstance(class org.apache.spark.sql.catalyst.encoders.IntAndString)
   [info]   :     :- assertnotnull(lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).i)
   [info]   :     :  +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).i
   [info]   :     :     +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
   [info]   :     +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).s.toString
   [info]   :        +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179).s
   [info]   :           +- lambdavariable(CatalystToExternalMap_value, StructField(i,IntegerType,false), StructField(s,StringType,true), true, 179)
   [info]   +- input[0, map<int,struct<i:int,s:string>>, true] (ExpressionEncoderSuite.scala:627)
   ```
   So, when a map has case classes as keys or values, the interpreted path incorrectly
   deserializes the data from the Catalyst representation: in the failure above, the value
   comes back as a generic row `[1,a]` instead of `IntAndString(1,a)`.
   
   
   ### Does this PR introduce _any_ user-facing change?
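   Yes. It fixes the bug: on the interpreted path, maps whose keys or values are case classes
   are now deserialized into those case classes instead of generic rows.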
   
   
   ### How was this patch tested?
   Existing and new unit tests in `ExpressionEncoderSuite`.
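
   The suite's failure message ("Encoded/Decoded data does not match input data") reports a round-trip check: decode the encoded data and compare it with the input. A minimal sketch of that property, assuming a toy encoding; `RoundTripSketch`, `encode`, `decode` and `roundTrips` are hypothetical names, and this is neither the suite's code nor Spark's:

   ```scala
   // Hypothetical toy round-trip check; names are illustrative, not Spark APIs.
   object RoundTripSketch {
     final case class IntAndString(i: Int, s: String)
     final case class InternalStruct(fields: Vector[Any])

     // "Serialize": case class -> positional struct; null stays null, since the
     // schema in the failure output allows null map values.
     def encode(v: IntAndString): Any =
       if (v == null) null else InternalStruct(Vector(v.i, v.s))

     // "Deserialize" one element, mirroring `if (isnull(x)) null else newInstance(...)`.
     def decode(x: Any): IntAndString = x match {
       case null                                      => null
       case InternalStruct(Vector(i: Int, s: String)) => IntAndString(i, s)
       case other => throw new IllegalArgumentException(s"unexpected value: $other")
     }

     // Round-trip property: decoding the encoded map must give back the input.
     def roundTrips(in: Map[Int, IntAndString]): Boolean =
       in.map { case (k, v) => k -> decode(encode(v)) } == in
   }
   ```

   For example, `roundTrips(Map(1 -> IntAndString(1, "a"), 2 -> null))` returns `true`; a decoder that produced generic rows instead of `IntAndString` would make it return `false`.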


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
