[GitHub] spark pull request #22708: [SPARK-21402] Fix java array/map of structs deser...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225440522 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -30,6 +30,7 @@ import org.apache.spark.serializer._ import org.apache.spark.sql.Row import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, ScalaReflection} import org.apache.spark.sql.catalyst.ScalaReflection.universe.TermName +import org.apache.spark.sql.catalyst.analysis.UnresolvedException --- End diff -- nit: it does not seem to be necessary. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22708: [SPARK-21402] Fix java array/map of structs deser...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225391643 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -282,6 +283,27 @@ case class StaticInvoke( } } +/** + * When constructing [[Invoke]], the data type must be given, which may be not possible to define + * before analysis. This class acts like a placeholder for [[Invoke]], and will be replaced by + * [[Invoke]] during analysis after the input data is resolved. Data type passed to [[Invoke]] + * will be defined by applying [[dataTypeFunction]] to the data type of the input data. + */ +case class UnresolvedInvoke( --- End diff -- Should we move this to `unresolved.scala`? cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22708: [SPARK-21402] Fix java array/map of structs deser...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225182746 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -271,32 +272,41 @@ object JavaTypeInference { case c if listType.isAssignableFrom(typeToken) => val et = elementType(typeToken) -MapObjects( +UnresolvedMapObjects( --- End diff -- yes please, thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22708: [SPARK-21402] Fix java array/map of structs deser...
Github user vofque commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225121081 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -271,32 +272,41 @@ object JavaTypeInference { case c if listType.isAssignableFrom(typeToken) => val et = elementType(typeToken) -MapObjects( +UnresolvedMapObjects( --- End diff -- Sure. Should I create another pull request with this change only? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22708: [SPARK-21402] Fix java array/map of structs deser...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22708#discussion_r225118613 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -271,32 +272,41 @@ object JavaTypeInference { case c if listType.isAssignableFrom(typeToken) => val et = elementType(typeToken) -MapObjects( +UnresolvedMapObjects( --- End diff -- can we exclude other changes except this one? This one is very easy to reason about. We did the same thing in `ScalaReflection.` We need more time to think about the map case, and fix it in `ScalaReflection` as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22708: [Spark 21402] Fix java array/map of structs deser...
GitHub user vofque opened a pull request: https://github.com/apache/spark/pull/22708 [Spark 21402] Fix java array/map of structs deserialization When deserializing values of ArrayType with struct elements or MapType with struct keys/values in java beans, fields of structs get mixed up. I suggest using struct data types retrieved from resolved input data instead of inferring them from java beans. ## What changes were proposed in this pull request? _In case of array:_ MapObjects expression is used to map array elements to java beans. Struct type of elements is inferred from java bean structure and ends up with mixed up field order. I used UnresolvedMapObjects instead of MapObjects, which allows to provide element type for MapObjects during analysis based on the resolved input data, not on the java bean. _In case of map:_ Invocations of "keyArray" and "valueArray" functions are used to extract arrays of keys and values. Struct type of keys or values is also inferred from java bean structure and ends up with mixed up field order. I created a new UnresolvedInvoke expression as a temporary substitution of Invoke expression while no actual data is available. It allows to provide the resulting data type during analysis based on the resolved input data, not on the java bean (similar to UnresolvedMapObjects). Key and value arrays are then fed to MapObjects expression which I replaced with UnresolvedMapObjects, just like in case of ArrayType. Finally I added resolution of UnresolvedInvoke expressions in Analyzer.resolveExpression method as an additional pattern matching case. ## How was this patch tested? Created and ran unit tests for two described cases. Built complete project on travis. @michalsenkyr @cloud-fan @marmbrus @liancheng You can merge this pull request into a Git repository by running: $ git pull https://github.com/vofque/spark SPARK-21402 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22708.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22708 commit d3578bb3776a79b97f2713f752127849e7910368 Author: Vladimir Kuriatkov Date: 2018-10-12T10:49:18Z Merge pull request #2 from apache/master Synch with apache:master commit 7576ba7c9b830e45b454a21978fbb0a4275064a6 Author: Vladimir Kuriatkov Date: 2018-10-11T12:56:15Z Java array/map of structs deserialization fixed commit c9ffcd4de5738dfabb124aae28ef7b79b36e48d6 Author: Vladimir Kuriatkov Date: 2018-10-12T11:10:38Z Merge pull request #3 from vofque/bugfix Java array/map of structs deserialization fixed --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org