[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22501


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226769745
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

Right, @rdblue; for this part, I believe so.
After the EC2 result is merged into @wangyum's PR, I'll compare the numbers one 
by one again.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-19 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226765772
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

@dongjoon-hyun, so you are saying that it doesn't appear that there is a 
performance regression, right?


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226742168
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

The following [EC2 result](https://github.com/wangyum/spark/pull/19) shows 
a ratio consistent with Spark 2.1.0. The result on Mac seems to be unstable 
for some unknown reason, as noted in 
https://github.com/apache/spark/pull/22501#discussion_r226440992. 
```scala
1 cols x 10 rows (read parquet)                 61 /   70          1.6         610.2       0.6X
1 cols x 10 rows (write parquet)               209 /  233          0.5        2086.1       0.2X
```


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226740901
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
+100 cols x 1000 rows (read in-mem)             26 /   28          3.9         255.8       0.9X
+100 cols x 1000 rows (exec in-mem)             32 /   35          3.1         319.3       0.7X
+100 cols x 1000 rows (read parquet)            45 /   52          2.2         445.9       0.5X
+100 cols x 1000 rows (write parquet)          275 /  536          0.4        2746.1       0.1X
+2500 cols x 40 rows (read in-mem)             261 /  434          0.4        2607.3       0.1X
+2500 cols x 40 rows (exec in-mem)             624 /  701          0.2        6240.5       0.0X
+2500 cols x 40 rows (read parquet)            196 /  301          0.5        1963.4       0.1X
+2500 cols x 40 rows (write parquet)           687 / 1049          0.1        6870.6       0.0X
--- End diff --

FYI, this large gap disappeared in the EC2 result.


---


[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226520120
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

This may be a Parquet issue. I found that binary write performance is a 
little worse after upgrading to Parquet 1.10.0: 
https://github.com/apache/parquet-mr/pull/505. I will verify it later.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226516354
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

I have no idea how this happens. Can you create a JIRA ticket to 
investigate this regression?


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226442573
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
+100 cols x 1000 rows (read in-mem)             26 /   28          3.9         255.8       0.9X
+100 cols x 1000 rows (exec in-mem)             32 /   35          3.1         319.3       0.7X
+100 cols x 1000 rows (read parquet)            45 /   52          2.2         445.9       0.5X
+100 cols x 1000 rows (write parquet)          275 /  536          0.4        2746.1       0.1X
+2500 cols x 40 rows (read in-mem)             261 /  434          0.4        2607.3       0.1X
+2500 cols x 40 rows (exec in-mem)             624 /  701          0.2        6240.5       0.0X
+2500 cols x 40 rows (read parquet)            196 /  301          0.5        1963.4       0.1X
+2500 cols x 40 rows (write parquet)           687 / 1049          0.1        6870.6       0.0X
+
+

+
+wide shallowly nested struct field read and write


[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226440992
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
+100 cols x 1000 rows (read in-mem)             26 /   28          3.9         255.8       0.9X
+100 cols x 1000 rows (exec in-mem)             32 /   35          3.1         319.3       0.7X
+100 cols x 1000 rows (read parquet)            45 /   52          2.2         445.9       0.5X
+100 cols x 1000 rows (write parquet)          275 /  536          0.4        2746.1       0.1X
+2500 cols x 40 rows (read in-mem)             261 /  434          0.4        2607.3       0.1X
+2500 cols x 40 rows (exec in-mem)             624 /  701          0.2        6240.5       0.0X
+2500 cols x 40 rows (read parquet)            196 /  301          0.5        1963.4       0.1X
+2500 cols x 40 rows (write parquet)           687 / 1049          0.1        6870.6       0.0X
--- End diff --

The difference between `best` and `average` is too large on lines 32 and 33.
I'll try to run this on EC2, too.
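
One way to make the best/average gap concrete is to compute the ratio per row. The following is only a sketch in plain Scala: the object name `BestAvgGap`, the `Row` case class, and the 1.5 cutoff are all hypothetical choices for illustration, and the numbers are copied from a few `+` rows of the Mac results quoted above.

```scala
object BestAvgGap {
  // (label, best ms, avg ms), copied from the quoted Mac results.
  final case class Row(label: String, bestMs: Double, avgMs: Double) {
    def ratio: Double = avgMs / bestMs
  }

  val rows: Seq[Row] = Seq(
    Row("100 cols x 1000 rows (read parquet)", 45, 52),
    Row("100 cols x 1000 rows (write parquet)", 275, 536),
    Row("2500 cols x 40 rows (read in-mem)", 261, 434),
    Row("2500 cols x 40 rows (exec in-mem)", 624, 701),
    Row("2500 cols x 40 rows (write parquet)", 687, 1049)
  )

  // Hypothetical cutoff: call a row unstable when avg exceeds best by more than 50%.
  val threshold = 1.5

  def unstable(rs: Seq[Row]): Seq[Row] = rs.filter(_.ratio > threshold)

  def main(args: Array[String]): Unit =
    unstable(rows).foreach { r =>
      println(f"${r.label}%-40s avg/best = ${r.ratio}%.2f")
    }
}
```

With this cutoff, the two `write parquet` rows and the `2500 cols x 40 rows (read in-mem)` row are flagged; a different threshold would of course select a different set.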


---


[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r226439834
  
--- Diff: sql/core/benchmarks/WideSchemaBenchmark-results.txt ---
@@ -1,117 +1,145 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz

+
+parsing large select expressions

+
 
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 parsing large select:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 select expressions                            2 /    4          0.0     2050147.0       1.0X
-100 select expressions                          6 /    7          0.0     6123412.0       0.3X
-2500 select expressions                       135 /  141          0.0   134623148.0       0.0X
+1 select expressions                            2 /    4          0.0     1934953.0       1.0X
+100 select expressions                          4 /    5          0.0     3659399.0       0.5X
+2500 select expressions                        68 /   76          0.0    68278937.0       0.0X
 
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
 

+
+many column field read and write

+
+
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
+Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 many column field r/w:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-1 cols x 10 rows (read in-mem)                 16 /   18          6.3         158.6       1.0X
-1 cols x 10 rows (exec in-mem)                 17 /   19          6.0         166.7       1.0X
-1 cols x 10 rows (read parquet)                24 /   26          4.3         235.1       0.7X
-1 cols x 10 rows (write parquet)               81 /   85          1.2         811.3       0.2X
-100 cols x 1000 rows (read in-mem)             17 /   19          6.0         166.2       1.0X
-100 cols x 1000 rows (exec in-mem)             25 /   27          4.0         249.2       0.6X
-100 cols x 1000 rows (read parquet)            23 /   25          4.4         226.0       0.7X
-100 cols x 1000 rows (write parquet)           83 /   87          1.2         831.0       0.2X
-2500 cols x 40 rows (read in-mem)             132 /  137          0.8        1322.9       0.1X
-2500 cols x 40 rows (exec in-mem)             326 /  330          0.3        3260.6       0.0X
-2500 cols x 40 rows (read parquet)            831 /  839          0.1        8305.8       0.0X
-2500 cols x 40 rows (write parquet)           237 /  245          0.4        2372.6       0.1X
-
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_92-b14 on Mac OS X 10.11.6
-Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz
+1 cols x 10 rows (read in-mem)                 22 /   25          4.6         219.4       1.0X
+1 cols x 10 rows (exec in-mem)                 22 /   28          4.5         223.8       1.0X
+1 cols x 10 rows (read parquet)                45 /   49          2.2         449.6       0.5X
+1 cols x 10 rows (write parquet)              204 /  223          0.5        2044.4       0.1X
--- End diff --

This might be a small regression in the Parquet writer relative to Spark 2.1.0 
(SPARK-17335).

cc @cloud-fan and @gatorsmile , @rdblue 


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r224985471
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

My point was that, from a cursory look, there is no point in checking for 
`null` below. If there is no chance that it becomes `null`, we can leave it as 
`Some` and remove the `null` check below.
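
For readers unfamiliar with the distinction, here is a minimal sketch in plain Scala (the object name `SomeVsOption`, the `open` helper, and the temp-file prefix are hypothetical and not part of the PR). `Option(x)` maps `null` to `None`, while `Some(x)` wraps whatever it is given; since a constructor call either returns a non-null reference or throws, the two forms should behave the same for `new FileOutputStream(file)`.

```scala
import java.io.{File, FileOutputStream, OutputStream}

object SomeVsOption {
  // Option.apply maps null to None; Some.apply wraps any value, including null.
  val fromNullViaOption: Option[String] = Option(null: String)
  val fromNullViaSome: Option[String] = Some(null: String)

  // A constructor call never evaluates to null (it returns an object or throws),
  // so Some(...) is sufficient here and no later null check is needed.
  def open(file: File): Option[OutputStream] = Some(new FileOutputStream(file))

  def main(args: Array[String]): Unit = {
    println(fromNullViaOption.isEmpty)  // true
    println(fromNullViaSome.isDefined)  // true, even though it holds null
    val tmp = File.createTempFile("some-vs-option", ".txt")
    tmp.deleteOnExit()
    val out = open(tmp)
    println(out.isDefined)
    out.foreach(_.close())
  }
}
```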


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r223220176
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

Why did you replace `Some` with `Option`? Are you worried that 
`new FileOutputStream(file)` could be `null`?


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-07 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r223202914
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

I was worried that I would forget it later, so I changed it this time. 
Should I revert it?


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r223196145
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

IIUC, @HyukjinKwon meant `when you need to touch this file`.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-06 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r223195740
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

I changed it here because of: 
https://github.com/apache/spark/pull/22443#discussion_r221181428


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r223195081
  
--- Diff: core/src/test/scala/org/apache/spark/benchmark/BenchmarkBase.scala ---
@@ -48,15 +48,11 @@ abstract class BenchmarkBase {
   if (!file.exists()) {
     file.createNewFile()
   }
-  output = Some(new FileOutputStream(file))
+  output = Option(new FileOutputStream(file))
--- End diff --

This looks like an irrelevant piggyback change.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-09-23 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r219725654
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
@@ -17,22 +17,19 @@
 
 package org.apache.spark.sql
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
 
 /**
  * Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- *  build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class  
+ * 2. build/sbt "sql/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --

Thanks, @dongjoon-hyun. Actually, I'm waiting for 
https://github.com/apache/spark/pull/22484. I want to move `withTempDir()` to 
`RunBenchmarkWithCodegen.scala`.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r219724989
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala ---
@@ -17,22 +17,19 @@
 
 package org.apache.spark.sql
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => FileBenchmarkBase, Utils}
 
 /**
  * Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- *  build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to "sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class  
+ * 2. build/sbt "sql/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain "
+ *Results will be written to "benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --

Could you fix the doc generation failure?


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-09-20 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/22501

[SPARK-25492][TEST] Refactor WideSchemaBenchmark to use main method

## What changes were proposed in this pull request?

Refactor `WideSchemaBenchmark` to use a main method.
To generate the benchmark result:
```sh
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.WideSchemaBenchmark"
```
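
As a rough illustration of the pattern, a main-method benchmark can decide where to write its results based on that environment variable. This is only a sketch and not Spark's actual `BenchmarkBase`: the object name `MainMethodBenchmarkSketch`, the `chooseOutput` helper, the results path, and the placeholder workload are all hypothetical, and it assumes the value `1` enables file output, as in the command above.

```scala
import java.io.{File, FileOutputStream, OutputStream, PrintStream}

object MainMethodBenchmarkSketch {
  // Hypothetical helper: open the results file only when the flag equals "1".
  def chooseOutput(flag: Option[String], resultsFile: File): Option[OutputStream] =
    if (flag.contains("1")) {
      // getParentFile is null for a bare file name, hence the Option wrapper.
      Option(resultsFile.getParentFile).foreach(_.mkdirs())
      Some(new FileOutputStream(resultsFile))
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    val flag = sys.env.get("SPARK_GENERATE_BENCHMARK_FILES")
    val file = new File("benchmarks/MainMethodBenchmarkSketch-results.txt")
    val output = chooseOutput(flag, file)
    // Fall back to stdout when no results file was requested.
    val printer = new PrintStream(output.getOrElse(System.out))
    try {
      val start = System.nanoTime()
      val sum = (1L to 1000000L).sum // placeholder workload, not a real benchmark
      val elapsedMs = (System.nanoTime() - start) / 1e6
      printer.println(f"sum=$sum elapsed=$elapsedMs%.1f ms")
    } finally {
      printer.flush()
      output.foreach(_.close())
    }
  }
}
```

Running the object without the variable set prints to the console; setting it to `1` writes the same line to the hypothetical results file instead.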

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-25492

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22501.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22501


commit f56b73223fbf765e408d9aef6565a2318f4836e3
Author: Yuming Wang 
Date:   2018-09-20T16:04:30Z

Refactor WideSchemaBenchmark




---
