fqaiser94 commented on a change in pull request #29795:
URL: https://github.com/apache/spark/pull/29795#discussion_r495623143
##########
File path: sql/core/benchmarks/UpdateFieldsBenchmark-results.txt
##########
@@ -0,0 +1,26 @@
+================================================================================================
+Add 2 columns and drop 2 columns at 3 different depths of nesting
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_212-b03 on Mac OS X 10.14.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+Add 2 columns and drop 2 columns at 3 different depths of nesting:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+--------------------------------------------------------------------------------------------------------------------------------------------
+To non-nullable StructTypes using performant method                            10            11          2        0.0     Infinity      1.0X
+To nullable StructTypes using performant method                                 9            10          1        0.0     Infinity      1.0X
+To non-nullable StructTypes using non-performant method                      2457          2464         10        0.0     Infinity      0.0X
+To nullable StructTypes using non-performant method                         42641         43804       1644        0.0     Infinity      0.0X
+
+
+================================================================================================
+Add 50 columns and drop 50 columns at 100 different depths of nesting
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_212-b03 on Mac OS X 10.14.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+Add 50 columns and drop 50 columns at 100 different depths of nesting:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+------------------------------------------------------------------------------------------------------------------------------------------------
+To non-nullable StructTypes using performant method                              4595          4927        470        0.0     Infinity      1.0X
+To nullable StructTypes using performant method                                  5185          5516        468        0.0     Infinity      0.9X
+
+
Review comment:
I changed the benchmark slightly so that it compares the performant and
non-performant methods of updating multiple nested columns.
##########
File path: sql/core/benchmarks/UpdateFieldsBenchmark-results.txt
##########
@@ -0,0 +1,26 @@
+================================================================================================
+Add 2 columns and drop 2 columns at 3 different depths of nesting
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_212-b03 on Mac OS X 10.14.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
+Add 2 columns and drop 2 columns at 3 different depths of nesting:  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
+--------------------------------------------------------------------------------------------------------------------------------------------
+To non-nullable StructTypes using performant method                            10            11          2        0.0     Infinity      1.0X
+To nullable StructTypes using performant method                                 9            10          1        0.0     Infinity      1.0X
+To non-nullable StructTypes using non-performant method                      2457          2464         10        0.0     Infinity      0.0X
+To nullable StructTypes using non-performant method                         42641         43804       1644        0.0     Infinity      0.0X
Review comment:
This last result is pretty bad (about 43 seconds).
It's partly because of the non-performant method and partly because
the optimizer rules aren't perfect in complex nullable StructType scenarios
(I've documented these scenarios in this
[commit](https://github.com/apache/spark/pull/29795/commits/4fe48b4287c81e73276165453477811211e341d9)).
It should be possible to improve the optimizer rules further. I have a
couple of simple ideas I'm toying with, but it will take me a while to
reason about and test whether they are safe from a correctness point of view.
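
For a sense of scale, here is a rough calculation from the best times quoted above. The Relative column shows the non-performant rows as 0.0X, which hides how large the gap is. This is only a sketch: the timings are copied from the table, where they are already rounded to whole milliseconds, so the ratios are approximate. The names `best_ms` and `slowdown` are made up for this illustration and are not part of the benchmark.

```python
# Best times in ms, copied from the "Add 2 columns and drop 2 columns at 3
# different depths of nesting" results quoted above (rounded values).
best_ms = {
    ("non-nullable", "performant"): 10,
    ("nullable", "performant"): 9,
    ("non-nullable", "non-performant"): 2457,
    ("nullable", "non-performant"): 42641,
}


def slowdown(slow_ms, fast_ms):
    """Return how many times longer the slow case takes than the fast one."""
    return slow_ms / fast_ms


# Baseline is the first row of the table (non-nullable, performant).
baseline = best_ms[("non-nullable", "performant")]

for (nullability, method), ms in best_ms.items():
    factor = slowdown(ms, baseline)
    print(f"{nullability:>12} / {method:<14} {ms:>6} ms  ~{factor:,.0f}x baseline")

# Cost of nullability within the non-performant method alone.
nullable_penalty = slowdown(
    best_ms[("nullable", "non-performant")],
    best_ms[("non-nullable", "non-performant")],
)
print(f"nullable vs non-nullable (non-performant): ~{nullable_penalty:.1f}x")
```

The last ratio isolates the nullable case within the non-performant method, which is the part of the slowdown that the comment above attributes to the optimizer rules.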