fqaiser94 commented on a change in pull request #29322:
URL: https://github.com/apache/spark/pull/29322#discussion_r471198235



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala
##########
@@ -453,60 +453,81 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
     checkEvaluation(GetMapValue(mb0, Literal(Array[Byte](3, 4))), null)
   }
 
-  private val structAttr = 'struct1.struct('a.int)
+  private val structAttr = 'struct1.struct('a.int, 'b.int)
   private val testStructRelation = LocalRelation(structAttr)
 
-  test("simplify GetStructField on WithFields that is not changing the attribute being extracted") {
-    val query = testStructRelation.select(
-      GetStructField(WithFields('struct1, Seq("b"), Seq(Literal(1))), 0, Some("a")) as "outerAtt")
-    val expected = testStructRelation.select(GetStructField('struct1, 0, Some("a")) as "outerAtt")
-    checkRule(query, expected)
+  test("simplify GetStructField on UpdateFields that is not modifying the attribute being " +
+    "extracted") {
+    // add attribute, extract an attribute from the original struct
+    val query1 = testStructRelation.select(GetStructField(UpdateFields('struct1,
+      WithField("b", Literal(1)) :: Nil), 0, None) as "outerAtt")
+    // drop attribute, extract an attribute from the original struct
+    val query2 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+      Nil), 0, None) as "outerAtt")
+    // drop attribute, add attribute, extract an attribute from the original struct
+    val query3 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("b") ::
+      WithField("c", Literal(2)) :: Nil), 0, None) as "outerAtt")
+    // drop attribute, add attribute, extract an attribute from the original struct
+    val query4 = testStructRelation.select(GetStructField(UpdateFields('struct1, DropField("a") ::
+      WithField("a", Literal(1)) :: Nil), 0, None) as "outerAtt")
+    val expected = testStructRelation.select(GetStructField('struct1, 0, None) as "outerAtt")

Review comment:
   @cloud-fan I'm afraid I have to recommend that we revert the changes in this PR from master.
   There is another correctness issue I've discovered, one that exists even in our initial `withField` implementation:
   ```
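   // The following query (in spark-shell, where `spark.sql` and `spark.implicits._` are imported by default)
   import org.apache.spark.sql.functions.lit

   sql("SELECT CAST(NULL AS struct<a:int,b:int>) struct_col")
     .select($"struct_col".withField("d", lit(4)).getField("d").as("d"))
     .show()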
   
   // currently returns this
   +---+
   |d  |
   +---+
   |4  |
   +---+
   
   // when in fact it should return this: 
   +----+
   |d   |
   +----+
   |null|
   +----+
   ```
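To make the expected semantics concrete, here is a minimal, self-contained model in plain Scala (no Spark) in which a struct value is an `Option[Map[String, Any]]` and `None` stands in for a SQL NULL struct. The names `WithFieldNullSemantics`, `withField`, `getField`, `naiveRewrite` and `nullSafeRewrite` are hypothetical and only mirror the Spark API by analogy; the sketch assumes the problematic optimizer rewrite folds `getField(withField(s, name, v), name)` straight to `v`, which is what the output above suggests.

```scala
object WithFieldNullSemantics {
  // Hypothetical model: None plays the role of a NULL struct.
  type Struct = Option[Map[String, Any]]

  // withField on a NULL struct yields a NULL struct; otherwise it adds or replaces the field.
  def withField(s: Struct, name: String, value: Any): Struct =
    s.map(fields => fields + (name -> value))

  // getField on a NULL struct yields NULL (modeled as None).
  def getField(s: Struct, name: String): Option[Any] =
    s.flatMap(fields => fields.get(name))

  // Unoptimized evaluation of getField(withField(s, name, value), name).
  def evaluate(s: Struct, name: String, value: Any): Option[Any] =
    getField(withField(s, name, value), name)

  // Assumed shape of the buggy rewrite: the struct is dropped, so its nullness is lost.
  def naiveRewrite(s: Struct, name: String, value: Any): Option[Any] =
    Some(value)

  // Null-preserving rewrite: if (s IS NULL) NULL else value.
  def nullSafeRewrite(s: Struct, name: String, value: Any): Option[Any] =
    if (s.isEmpty) None else Some(value)

  def main(args: Array[String]): Unit = {
    val nullStruct: Struct = None
    val someStruct: Struct = Some(Map("a" -> 1, "b" -> 2))
    for (s <- Seq(nullStruct, someStruct)) {
      println(s"input=$s evaluate=${evaluate(s, "d", 4)} " +
        s"naive=${naiveRewrite(s, "d", 4)} nullSafe=${nullSafeRewrite(s, "d", 4)}")
    }
  }
}
```

For the NULL input, `evaluate` gives `None` while `naiveRewrite` gives `Some(4)`, which is the same discrepancy as in the query above; `nullSafeRewrite` agrees with unoptimized evaluation on both inputs. In Catalyst terms, a null-preserving rewrite would presumably wrap the folded value in something like `If(IsNull(struct), Literal(null, dataType), value)`, but that is an assumption about one possible fix, not a description of what the follow-up PR does.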
   I'd like to get `withField` right before coming back to `dropFields`. 
   If that sounds good to you, here is the order of PRs I propose: 
   1. Revert `dropFields` changes immediately
   2. Fix the above issue in `withField` (the real issue is in the optimizer rule) through another PR ASAP
   3. Figure out how to implement `dropFields` in another PR in the future. I think I have to go back to the drawing board for this one. The combination of requirements (return null for a null input struct, allow nested API calls, allow users to mix `dropFields`, `withField` and `getField` in the same query, and still generate reasonably well-optimized physical plans in a timely manner) is making for a slightly trickier programming exercise than I anticipated, and it needs a more robust testing strategy.
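The requirements in item 3 can be explored outside Spark with a toy model. The sketch below is plain Scala and entirely hypothetical: `UpdateFieldsModel`, `WithFieldOp` and `DropFieldOp` are illustrative stand-ins for the `UpdateFields`/`WithField`/`DropField` expressions in the diff, and the model assumes that adding a field that already exists replaces it in place while adding a new one appends it at the end. It is meant to check three properties: a NULL struct stays NULL under any sequence of ops, nested calls compose into a single op list, and ordinal 0 is unchanged by the op sequences used in `query1` to `query3`.

```scala
object UpdateFieldsModel {
  // Hypothetical toy model: a struct is an ordered list of (name, value) pairs,
  // and None stands in for a NULL struct.
  type Struct = Option[Vector[(String, Any)]]

  sealed trait FieldOp
  final case class WithFieldOp(name: String, value: Any) extends FieldOp
  final case class DropFieldOp(name: String) extends FieldOp

  // Apply one op to a non-null struct: replace in place if the field exists,
  // otherwise append; dropping removes every field with that name.
  private def applyOp(fields: Vector[(String, Any)], op: FieldOp): Vector[(String, Any)] =
    op match {
      case WithFieldOp(name, value) =>
        val i = fields.indexWhere(_._1 == name)
        if (i >= 0) fields.updated(i, name -> value) else fields :+ (name -> value)
      case DropFieldOp(name) =>
        fields.filterNot(_._1 == name)
    }

  // A NULL struct stays NULL no matter which ops are applied.
  def updateFields(s: Struct, ops: Seq[FieldOp]): Struct =
    s.map(fields => ops.foldLeft(fields)(applyOp))

  // Extraction by ordinal, loosely mirroring GetStructField(child, ordinal).
  def getField(s: Struct, ordinal: Int): Option[Any] =
    s.flatMap(fields => fields.lift(ordinal).map(_._2))
}
```

Composition is the property that would let an optimizer collapse nested `withField`/`dropFields` calls into a single `UpdateFields`, and null propagation is the property that the `withField` example above shows being lost; a property-style test over random op sequences against a model like this is one possible way to get the "more robust testing strategy" mentioned in item 3.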
   
   Please let me know your thoughts. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
