fqaiser94 commented on a change in pull request #29322:
URL: https://github.com/apache/spark/pull/29322#discussion_r471023135
##########
File path:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
##########
@@ -453,60 +453,81 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
checkEvaluation(GetMapValue(mb0, Literal(Array[Byte](3, 4))), null)
}
- private val structAttr = 'struct1.struct('a.int)
+ private val structAttr = 'struct1.struct('a.int, 'b.int)
private val testStructRelation = LocalRelation(structAttr)
- test("simplify GetStructField on WithFields that is not changing the attribute being extracted") {
- val query = testStructRelation.select(
- GetStructField(WithFields('struct1, Seq("b"), Seq(Literal(1))), 0, Some("a")) as "outerAtt")
- val expected = testStructRelation.select(GetStructField('struct1, 0, Some("a")) as "outerAtt")
- checkRule(query, expected)
+ test("simplify GetStructField on UpdateFields that is not modifying the attribute being " +
+ "extracted") {
+ // add attribute, extract an attribute from the original struct
+ val query1 = testStructRelation.select(GetStructField(UpdateFields('struct1,
+ WithField("b", Literal(1)) :: Nil), 0, None) as "outerAtt")
+ // drop attribute, extract an attribute from the original struct
+ val query2 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+ Nil), 0, None) as "outerAtt")
+ // drop attribute, add attribute, extract an attribute from the original struct
+ val query3 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+ WithField("c", Literal(2)) :: Nil), 0, None) as "outerAtt")
+ // drop attribute, add attribute, extract an attribute from the original struct
+ val query4 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("a") ::
+ WithField("a", Literal(1)) :: Nil), 0, None) as "outerAtt")
+ val expected = testStructRelation.select(GetStructField('struct1, 0, None) as "outerAtt")
Review comment:
@cloud-fan sorry about this, but I've introduced a pretty bad bug in this PR.
This test is wrong; `query4` should actually NOT be equal to `expected`.
The optimizer rule needs to be improved.
As a result of this bug, users will get incorrect results in scenarios like
the following:
```
sql("SELECT named_struct('a', 1, 'b', 2) struct_col")
.select($"struct_col".dropFields("a").getField("b").as("b"))
.show(false)
// currently returns this:
+---+
|b |
+---+
|1 |
+---+
// when in fact it should return this:
+---+
|b |
+---+
|2 |
+---+
```
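One possible way to see where the `1` comes from, as a minimal sketch rather than the actual Catalyst code: assume the simplification reuses the ordinal computed against the updated struct to read from the original struct. Everything below (`OrdinalShiftSketch`, the toy `Struct` type, and the local `WithField`/`DropField` case classes) is a hypothetical stand-in written in plain Scala; it only models field positions and is not Spark's implementation.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's Catalyst classes.
object OrdinalShiftSketch {
  // A struct is modelled as an ordered list of (fieldName, value) pairs.
  type Struct = Vector[(String, Int)]

  sealed trait Op
  final case class WithField(name: String, value: Int) extends Op
  final case class DropField(name: String) extends Op

  // Apply the operations left to right. Assumption of this sketch: replacing an
  // existing field keeps its position, while a new field is appended at the end.
  def updateFields(struct: Struct, ops: Seq[Op]): Struct =
    ops.foldLeft(struct) {
      case (s, DropField(n)) => s.filterNot(_._1 == n)
      case (s, WithField(n, v)) =>
        if (s.exists(_._1 == n)) s.map { case (k, old) => if (k == n) (k, v) else (k, old) }
        else s :+ (n -> v)
    }

  // Read a field by position, the way an ordinal-based accessor would.
  def getByOrdinal(struct: Struct, ordinal: Int): Int = struct(ordinal)._2

  def main(args: Array[String]): Unit = {
    val original: Struct = Vector("a" -> 1, "b" -> 2)
    val updated = updateFields(original, Seq(DropField("a")))

    // After dropping "a", field "b" sits at ordinal 0 of the updated struct.
    val ordinalOfB = updated.indexWhere(_._1 == "b")

    println(getByOrdinal(updated, ordinalOfB))  // 2: reads "b" from the updated struct
    println(getByOrdinal(original, ordinalOfB)) // 1: same ordinal on the original struct reads "a"
  }
}
```

In this toy model the `query4` shape behaves the same way: `DropField("a")` followed by `WithField("a", ...)` leaves `b` at ordinal 0 of the updated struct, while ordinal 0 of the original struct is `a`, so extracting by the unchanged ordinal from the original struct picks a different field.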
I'm working on the code to fix this. Just trying to make sure I have all the
edge cases covered before I submit a PR, hopefully by the end of the weekend.