Github user vofque commented on the issue:
https://github.com/apache/spark/pull/22708
The original problem is described here:
https://issues.apache.org/jira/browse/SPARK-21402
I'll try to explain what happens in detail.
Let's consider this data structure:
root
 |-- intervals: array
 |    |-- element: struct
 |    |    |-- startTime: long
 |    |    |-- endTime: long
And let's say we have a Java bean class with a corresponding structure.
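For concreteness, a pair of beans matching that schema could look like the sketch below. The class names `Interval` and `IntervalsBean` are hypothetical and chosen only for illustration; they do not come from the issue or the PR.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical element bean: struct<startTime: long, endTime: long>.
// Note that startTime is declared before endTime.
class Interval implements Serializable {
    private long startTime;
    private long endTime;

    public long getStartTime() { return startTime; }
    public void setStartTime(long startTime) { this.startTime = startTime; }

    public long getEndTime() { return endTime; }
    public void setEndTime(long endTime) { this.endTime = endTime; }
}

// Hypothetical top-level bean: intervals: array<struct<...>>.
class IntervalsBean implements Serializable {
    private List<Interval> intervals = new ArrayList<>();

    public List<Interval> getIntervals() { return intervals; }
    public void setIntervals(List<Interval> intervals) { this.intervals = intervals; }
}
```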
When building a deserializer for the field _intervals_ in
_JavaTypeInference.deserializerFor_ we construct a _MapObjects_ expression to
convert structs to java beans:
```
case c if listType.isAssignableFrom(typeToken) =>
  val et = elementType(typeToken)
  MapObjects(
    p => deserializerFor(et, Some(p)),
    getPath,
    inferDataType(et)._1,
    customCollectionCls = Some(c))
```
_MapObjects_ requires the _DataType_ of the array elements. It is derived from
the Java element type using _JavaTypeInference.inferDataType_, which gets the
Java bean properties and maps each of them to a _StructField_.
```
case other =>
  // some more code goes here
  val properties = getJavaBeanReadableProperties(other)
  val fields = properties.map { property =>
    val returnType = typeToken.method(property.getReadMethod).getReturnType
    val (dataType, nullable) = inferDataType(returnType, seenTypeSet + other)
    new StructField(property.getName, dataType, nullable)
  }
```
The order of properties in the resulting _StructType_ may not correspond to
their declaration order as the declaration order is simply unknown. So the
resulting element _StructType_ may look like this:
root
 |-- endTime: long
 |-- startTime: long
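The reordering can be observed without Spark. The sketch below assumes that the property list ultimately comes from `java.beans.Introspector`, which is an assumption about how _getJavaBeanReadableProperties_ works rather than something shown above. The Javadoc does not specify the order of the returned descriptors; OpenJDK happens to return them sorted by name, not in declaration order. `BeanPropertyOrderDemo` and `readablePropertyNames` are hypothetical names.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class BeanPropertyOrderDemo {
    // Hypothetical element bean; startTime is declared before endTime.
    public static class Interval {
        private long startTime;
        private long endTime;

        public long getStartTime() { return startTime; }
        public void setStartTime(long startTime) { this.startTime = startTime; }

        public long getEndTime() { return endTime; }
        public void setEndTime(long endTime) { this.endTime = endTime; }
    }

    // Returns the names of readable properties in the order the introspector
    // reports them, skipping the "class" property inherited from Object.getClass().
    static List<String> readablePropertyNames(Class<?> beanClass) throws IntrospectionException {
        BeanInfo info = Introspector.getBeanInfo(beanClass);
        List<String> names = new ArrayList<>();
        for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
            if (descriptor.getReadMethod() != null && !"class".equals(descriptor.getName())) {
                names.add(descriptor.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IntrospectionException {
        // On OpenJDK the reported order is by name, so endTime comes first.
        System.out.println(readablePropertyNames(Interval.class));
    }
}
```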
This _StructType_ is passed to _MapObjects_ and then to its loop variable
_LambdaVariable_.
For deserialization of single array elements an _InitializeJavaBean_
expression is created. It contains _UnresolvedExtractValue_ expressions for
each field, and these expressions have _LambdaVariable_ as a child. They are
resolved during analysis:
```
case UnresolvedExtractValue(child, fieldName) if child.resolved =>
  ExtractValue(child, fieldName, resolver)
```
Ordinals are then calculated for the fields _startTime_ and _endTime_. They
are looked up in the child's _DataType_, which in our case is the _StructType_
of the _LambdaVariable_ with the incorrect field order.
As a result we get _GetStructField_ expressions with ordinal = 0 for
_endTime_ and ordinal = 1 for _startTime_, whereas in the actual data
_startTime_ is at ordinal 0 and _endTime_ is at ordinal 1.
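The effect of resolving names against the wrong field order can be simulated in plain Java. This is only a toy model of the by-name ordinal lookup, not Spark code: `OrdinalMismatchDemo`, `ordinalOf` and `readField` are hypothetical helpers, and a `long[]` stands in for one struct of the array.

```java
import java.util.Arrays;
import java.util.List;

class OrdinalMismatchDemo {
    // Resolves a field name to its position in the given field order.
    static int ordinalOf(List<String> fieldOrder, String fieldName) {
        int ordinal = fieldOrder.indexOf(fieldName);
        if (ordinal < 0) {
            throw new IllegalArgumentException("No such field: " + fieldName);
        }
        return ordinal;
    }

    // Reads a field from a row using an ordinal resolved against fieldOrder.
    static long readField(long[] row, List<String> fieldOrder, String fieldName) {
        return row[ordinalOf(fieldOrder, fieldName)];
    }

    public static void main(String[] args) {
        // Order in which the data is actually laid out.
        List<String> actualOrder = Arrays.asList("startTime", "endTime");
        // Order inferred from the bean properties.
        List<String> inferredOrder = Arrays.asList("endTime", "startTime");

        // One struct with startTime = 100 and endTime = 200.
        long[] row = {100L, 200L};

        System.out.println("endTime via actual order:   " + readField(row, actualOrder, "endTime"));
        System.out.println("endTime via inferred order: " + readField(row, inferredOrder, "endTime"));
    }
}
```

With the inferred order, the lookup for _endTime_ resolves to ordinal 0 and returns the value that was stored as _startTime_, so the two values end up swapped in the resulting bean.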