Github user bdrillard commented on a diff in the pull request:
https://github.com/apache/spark/pull/20085#discussion_r159519672
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
---
@@ -1237,47 +1342,91 @@ case class DecodeUsingSerializer[T](child:
Expression, tag: ClassTag[T], kryo: B
}
--- End diff --
In order to support initializations on more complicated objects, it makes
sense to generalize `InitializeJavaBean` to an `InitializeObject` that can take
a sequence of method names associated with a sequence of those methods'
arguments. It seems, though, that during plan analysis Spark fails to resolve the
column names against the Expression `children` when those child expressions are
gathered from a `Seq[Expression]`, yielding errors like:
```
Resolved attribute(s) 'field1,'field2 missing from field1#2,field2#3 in
operator 'DeserializeToObject initializeobject(newInstance(class
org.apache.spark.sql.catalyst.expressions.GenericBean),
(setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))),
obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with
the same name appear in the operation: field1,field2. Please check if the right
attribute(s) are used.;
org.apache.spark.sql.AnalysisException: Resolved attribute(s)
'field1,'field2 missing from field1#2,field2#3 in operator 'DeserializeToObject
initializeobject(newInstance(class
org.apache.spark.sql.catalyst.expressions.GenericBean),
(setField1,List(assertnotnull('field1))), (setField2,List('field2.toString))),
obj#4: org.apache.spark.sql.catalyst.expressions.GenericBean. Attribute(s) with
the same name appear in the operation: field1,field2. Please check if the right
attribute(s) are used.;
at
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
```
Interestingly, if we change the `setters` signature from `Seq[(String,
Seq[Expression])]` to `Seq[(String, (Expression, Expression))]` (the use case
for Spark-Avro, where objects are initialized by calling `put` with an integer
index argument and then some object argument), the plan will resolve. But of
course, such a function signature would in a sense be hard-coded for Avro.
Any ideas why passing a sequence of child expressions would yield the
analysis error above?
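
For reference, here is a minimal sketch of the two `setters` shapes being compared and how the child expressions could be flattened from each. It uses toy stand-in types (`Expr`, `Attr`, `Lit`) and hypothetical helper names (`childrenOf`, `childrenOfPairs`, `widen`); none of these are Spark's actual classes or the code in this PR, and the sketch does not attempt to reproduce or explain the analysis error.

```scala
// Toy stand-ins: hypothetical types, NOT Spark's Expression hierarchy.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr

object SetterShapes {
  // General form: each method name is paired with any number of arguments.
  type GeneralSetters = Seq[(String, Seq[Expr])]

  // Avro-style form: each call takes exactly an (index, value) pair.
  type PairSetters = Seq[(String, (Expr, Expr))]

  // Flatten every argument expression of the general form into one child list.
  def childrenOf(setters: GeneralSetters): Seq[Expr] =
    setters.flatMap { case (_, args) => args }

  // Flatten the fixed (index, value) pairs into one child list.
  def childrenOfPairs(setters: PairSetters): Seq[Expr] =
    setters.flatMap { case (_, (index, value)) => Seq(index, value) }

  // The pair form can always be widened to the general form.
  def widen(setters: PairSetters): GeneralSetters =
    setters.map { case (name, (index, value)) => (name, Seq(index, value)) }
}
```

In this toy model both shapes yield the same flattened children, which is why the pair form looks like a special case of the general one rather than something that should behave differently.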