fqaiser94 commented on a change in pull request #29322:
URL: https://github.com/apache/spark/pull/29322#discussion_r471198235
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
##########
@@ -453,60 +453,81 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
checkEvaluation(GetMapValue(mb0, Literal(Array[Byte](3, 4))), null)
}
- private val structAttr = 'struct1.struct('a.int)
+ private val structAttr = 'struct1.struct('a.int, 'b.int)
private val testStructRelation = LocalRelation(structAttr)
- test("simplify GetStructField on WithFields that is not changing the attribute being extracted") {
-   val query = testStructRelation.select(
-     GetStructField(WithFields('struct1, Seq("b"), Seq(Literal(1))), 0, Some("a")) as "outerAtt")
-   val expected = testStructRelation.select(GetStructField('struct1, 0, Some("a")) as "outerAtt")
-   checkRule(query, expected)
+ test("simplify GetStructField on UpdateFields that is not modifying the attribute being " +
+   "extracted") {
+   // add attribute, extract an attribute from the original struct
+   val query1 = testStructRelation.select(GetStructField(UpdateFields('struct1,
+     WithField("b", Literal(1)) :: Nil), 0, None) as "outerAtt")
+   // drop attribute, extract an attribute from the original struct
+   val query2 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+     Nil), 0, None) as "outerAtt")
+   // drop attribute, add attribute, extract an attribute from the original struct
+   val query3 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+     WithField("c", Literal(2)) :: Nil), 0, None) as "outerAtt")
+   // drop attribute, add attribute, extract an attribute from the original struct
+   val query4 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("a") ::
+     WithField("a", Literal(1)) :: Nil), 0, None) as "outerAtt")
+   val expected = testStructRelation.select(GetStructField('struct1, 0, None) as "outerAtt")
Review comment:
@cloud-fan I'm afraid I have to recommend we revert the changes in this
PR from master.
I've discovered another correctness issue, one that exists even in our initial `withField` implementation:
```scala
sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
.select($"struct_col".withField("d", lit(4)).getField("d").as("d"))
// currently returns this
+---+
|d  |
+---+
|4  |
+---+
// when in fact it should return this:
+----+
|d   |
+----+
|null|
+----+
```
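To make the expected null-propagation explicit, here is a minimal, Spark-free sketch of the semantics `withField` should have. The `WithFieldSemantics` object and its method names are hypothetical stand-ins, not Spark internals; a nullable struct is modeled as `Option[Map[String, Any]]`, with `None` playing the role of a NULL struct:

```scala
// Toy model of the intended semantics, NOT Spark internals.
// A nullable struct is Option[Map[String, Any]]; None represents NULL.
object WithFieldSemantics {
  type Struct = Option[Map[String, Any]]

  // Adding a field to a null struct must leave it null.
  def withField(struct: Struct, name: String, value: Any): Struct =
    struct.map(_ + (name -> value))

  // Extracting a field from a null struct must yield null.
  def getField(struct: Struct, name: String): Option[Any] =
    struct.flatMap(_.get(name))
}
```

Under these semantics, `getField(withField(None, "d", 4), "d")` is `None`, matching the `null` row the query above should return; the buggy optimizer rule effectively rewrites the extraction to the literal `4` regardless of whether the struct itself is null.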
I'd like to get `withField` right before coming back to `dropFields`.
If that sounds good to you, here is the order of PRs I propose:
1. Revert `dropFields` changes immediately
2. Fix the above issue in `withField` (the real issue lies in the optimizer rule) in a follow-up PR ASAP
3. Figure out how to implement `dropFields` in another PR in the future. I think I have to go back to the drawing board for this one. The combination of requirements (return null for a null input struct, allow nested API calls, allow users to arbitrarily mix `dropFields`, `withField`, and `getField` in the same query while still generating reasonably well-optimized physical plans in a timely manner) is making for a trickier programming exercise than I anticipated, and it needs a more robust testing strategy.
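For what it's worth, the single-pass shape I have in mind can be sketched in the same kind of toy model (again, hypothetical names and types, not Spark code): a chain of add/drop field operations folds into one traversal of the struct, and a null input short-circuits the whole chain, which is exactly the interplay of requirements that makes the optimizer rule tricky:

```scala
// Toy model, NOT Spark internals: fold a chain of field operations into a
// single pass over a nullable struct (Option[Map[String, Any]], None == NULL).
object FieldOps {
  sealed trait Op
  final case class WithField(name: String, value: Any) extends Op
  final case class DropField(name: String) extends Op

  type Struct = Option[Map[String, Any]]

  // A null struct (None) short-circuits: no operation can resurrect it.
  def applyAll(struct: Struct, ops: Seq[Op]): Struct =
    struct.map { fields =>
      ops.foldLeft(fields) {
        case (acc, WithField(n, v)) => acc + (n -> v)
        case (acc, DropField(n))    => acc - n
      }
    }
}
```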
Please let me know your thoughts.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]