[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField

GitBox Mon, 10 Jun 2019 23:04:13 -0700

dongjoon-hyun commented on a change in pull request #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
URL: https://github.com/apache/spark/pull/24599#discussion_r292288893


 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/NestedSchemaPruningBenchmark.scala
 ##########
 @@ -33,13 +33,17 @@ abstract class NestedSchemaPruningBenchmark extends SqlBasedBenchmark {
   protected val N = 1000000
   protected val numIters = 10
 
-  // We use `col1 BIGINT, col2 STRUCT<_1: BIGINT, _2: STRING>` as a test schema.
-  // col1 and col2._1 is used for comparision. col2._2 mimics the burden for the other columns
+  // We use `col1 BIGINT, col2 STRUCT<_1: BIGINT, _2: STRING>,
+  // col3 ARRAY<STRUCT<_1: BIGINT, _2: STRING>>` as a test schema.
+  // col1, col2._1 and col3._1 are used for comparision. col2._2 and col3._2 mimics the burden
+  // for the other columns
   private val df = spark
     .range(N * 10)
     .sample(false, 0.1)
-    .map(x => (x, (x, s"$x" * 100)))
-    .toDF("col1", "col2")
+    .map { x =>
+      val col3 = (0 until 10).map(i => (x + i, s"$x" * 10))
+      (x, (x, s"$x" * 100), col3)
+    }.toDF("col1", "col2", "col3")
 
 Review comment:
   Hmm, this change seems to have a side effect on the existing `Nested column` case, too: its best time goes from 922 ms to 1211 ms. I ran the benchmark on this PR as-is, and the results are below.
   
   **PREVIOUS**
   ```
   OpenJDK 64-Bit Server VM 1.8.0_201-b09 on Linux 3.10.0-862.3.2.el7.x86_64
   Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
   Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Top-level column                                    131            150          25          7.7         130.6       1.0X
   Nested column                                       922            954          21          1.1         922.2       0.1X
   ```
   
   **THIS PR**
   ```
   [info] OpenJDK 64-Bit Server VM 1.8.0_212-b04 on Linux 3.10.0-862.3.2.el7.x86_64
   [info] Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
   [info] Selection:                                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   [info] ------------------------------------------------------------------------------------------------------------------------
   [info] Top-level column                                    137            160          23          7.3         136.5       1.0X
   [info] Nested column                                      1211           1243          27          0.8        1210.5       0.1X
   [info] Nested column in array                            11210          11292          39          0.1       11209.6       0.0X
   ```
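
To put a number on the side effect, the sketch below divides the new Best Time by the old one for the two cases that appear in both runs. The object and method names (`NestedColumnRegression`, `slowdown`) are made up for illustration and are not part of the PR; the timings are copied from the two tables above. Note that the two runs used different JDK builds (1.8.0_201 vs 1.8.0_212), so the ratios are only a rough indication.

```scala
// Illustrative only: names here are hypothetical, not from the Spark code base.
// Best Time (ms) values are copied from the PREVIOUS and THIS PR tables.
object NestedColumnRegression {
  // Ratio of the new timing to the old one; a value above 1.0 means slower.
  def slowdown(previousMs: Double, currentMs: Double): Double =
    currentMs / previousMs

  def main(args: Array[String]): Unit = {
    val cases = Seq(
      ("Top-level column", 131.0, 137.0),
      ("Nested column", 922.0, 1211.0)
    )
    cases.foreach { case (name, prev, cur) =>
      println(f"$name%-20s ${slowdown(prev, cur)}%.2fx")
    }
  }
}
```

By this measure `Top-level column` is roughly 5% slower, which is plausibly noise given the reported standard deviations, while `Nested column` is roughly 31% slower.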

