[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20428
  
**[Test build #86798 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86798/testReport)**
 for PR 20428 at commit 
[`7a71c5a`](https://github.com/apache/spark/commit/7a71c5a294da230faf19965dc1d068adc3678411).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20432
  
**[Test build #86797 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86797/testReport)**
 for PR 20432 at commit 
[`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20431
  
**[Test build #86796 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86796/testReport)**
 for PR 20431 at commit 
[`9a4a484`](https://github.com/apache/spark/commit/9a4a4842b3f8281e73e564f4dfdad92017630760).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #86799 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86799/testReport)**
 for PR 20433 at commit 
[`830cf8d`](https://github.com/apache/spark/commit/830cf8d014ae17ade5fd771ca98c8c846c93).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86801 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86801/testReport)**
 for PR 20386 at commit 
[`42dc690`](https://github.com/apache/spark/commit/42dc69004ad37a5c4a5d8c96478a875ff4baed4e).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20427
  
**[Test build #86793 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86793/testReport)**
 for PR 20427 at commit 
[`b4fdbbe`](https://github.com/apache/spark/commit/b4fdbbe265943012093fbc0f54e8b22184fa2987).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86800 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86800/testReport)**
 for PR 20343 at commit 
[`d04b087`](https://github.com/apache/spark/commit/d04b0872bcc02b5eadd309c560cda77ff1b8da0a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86801/
Test FAILed.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20428
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86798/
Test FAILed.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86797/
Test FAILed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86800/
Test FAILed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86799/
Test FAILed.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20431
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86796/
Test FAILed.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20428
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20431
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20427
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86793/
Test FAILed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20427
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/20427
  
retest this please.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20432
  
Jenkins, retest this please.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20428
  
Jenkins, retest this please.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/367/
Test PASSed.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20432
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20428
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/368/
Test PASSed.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20428
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/369/
Test PASSed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20427
  
**[Test build #86805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86805/testReport)**
 for PR 20427 at commit 
[`b4fdbbe`](https://github.com/apache/spark/commit/b4fdbbe265943012093fbc0f54e8b22184fa2987).


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20428
  
**[Test build #86804 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86804/testReport)**
 for PR 20428 at commit 
[`7a71c5a`](https://github.com/apache/spark/commit/7a71c5a294da230faf19965dc1d068adc3678411).


---




[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20422
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20427
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20422
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/370/
Test PASSed.


---




[GitHub] spark issue #20432: [SPARK-23174][BUILD][PYTHON][FOLLOWUP] Add pycodestyle*....

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20432
  
**[Test build #86803 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86803/testReport)**
 for PR 20432 at commit 
[`3fb3d78`](https://github.com/apache/spark/commit/3fb3d785a9b2497b6ec3b9ac9329db776568197c).


---




[GitHub] spark issue #20422: [SPARK-23253][Core][Shuffle]Only write shuffle temporary...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20422
  
**[Test build #86806 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86806/testReport)**
 for PR 20422 at commit 
[`6196770`](https://github.com/apache/spark/commit/61967706c6f3804a84819f8484abeff5d1d77eea).


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/20421
  
@felixcheung just added a few more behavior changes I found. Should be 
final now.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/371/
Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20421
  
**[Test build #86807 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86807/testReport)**
 for PR 20421 at commit 
[`4a957f6`](https://github.com/apache/spark/commit/4a957f677eadfa5345a62f78b254c999869a1940).


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/372/
Test PASSed.


---




[GitHub] spark issue #20428: [SPARK-23261] [PySpark] Rename Pandas UDFs

2018-01-30 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20428
  
LGTM.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20421
  
**[Test build #86808 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86808/testReport)**
 for PR 20421 at commit 
[`469d87d`](https://github.com/apache/spark/commit/469d87db6278da7f157d8e6c81e7a26c1b969e7c).


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/20386
  
retest this please.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86807/
Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20421
  
**[Test build #86807 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86807/testReport)**
 for PR 20421 at commit 
[`4a957f6`](https://github.com/apache/spark/commit/4a957f677eadfa5345a62f78b254c999869a1940).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/373/
Test PASSed.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20386
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20386
  
**[Test build #86809 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86809/testReport)**
 for PR 20386 at commit 
[`42dc690`](https://github.com/apache/spark/commit/42dc69004ad37a5c4a5d8c96478a875ff4baed4e).


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20343
  
retest this please.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20433
  
retest this please.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20431
  
retest this please.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20421
  
**[Test build #86808 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86808/testReport)**
 for PR 20421 at commit 
[`469d87d`](https://github.com/apache/spark/commit/469d87db6278da7f157d8e6c81e7a26c1b969e7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20421: [SPARK-23112][DOC] Update ML migration guide with breaki...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20421
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86808/
Test PASSed.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20431
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/375/
Test PASSed.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20431
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/374/
Test PASSed.


---




[GitHub] spark pull request #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not f...

2018-01-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20431#discussion_r164670510
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala ---
@@ -154,7 +154,7 @@ class DataFrameRangeSuite extends QueryTest with SharedSQLContext with Eventuall
   test("Cancelling stage in a query with Range.") {
     val listener = new SparkListener {
       override def onJobStart(jobStart: SparkListenerJobStart): Unit = {
-        eventually(timeout(10.seconds)) {
+        eventually(timeout(10.seconds), interval(1.millis)) {
--- End diff --

The default interval is 15 milliseconds, IIUC. With that interval, the range 
stage is more likely to finish between two polls, so I reduced it to 1 millisecond.
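
To illustrate the reasoning, here is a minimal sketch in plain Scala. It is not ScalaTest or Spark code: `PollingIntervalSketch`, `pollObserves`, and the 20 ms to 25 ms window are hypothetical, chosen only to model a poller that samples a condition at fixed instants. A coarse interval can step over a short-lived state that a fine interval catches.

```scala
// Hypothetical model, not ScalaTest or Spark code: a poller samples a condition
// at t = 0, interval, 2 * interval, ... up to and including the timeout.
object PollingIntervalSketch {
  /** True if some poll instant falls inside [windowStartMs, windowEndMs). */
  def pollObserves(windowStartMs: Long, windowEndMs: Long,
                   intervalMs: Long, timeoutMs: Long): Boolean = {
    require(intervalMs > 0, "interval must be positive")
    (0L to timeoutMs by intervalMs).exists(t => t >= windowStartMs && t < windowEndMs)
  }

  def main(args: Array[String]): Unit = {
    // Assumed numbers: the stage is observable only between 20 ms and 25 ms.
    val coarse = pollObserves(20L, 25L, intervalMs = 15L, timeoutMs = 10000L)
    val fine = pollObserves(20L, 25L, intervalMs = 1L, timeoutMs = 10000L)
    println(s"15 ms interval observes the running stage: $coarse")
    println(s"1 ms interval observes the running stage: $fine")
  }
}
```

In this model the 15 ms poller samples at 0, 15, and 30 ms and never lands inside the window, while the 1 ms poller samples at 20 ms and does.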


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/376/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20431: [SPARK-23222][SQL] Make DataFrameRangeSuite not flaky

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20431
  
**[Test build #86811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86811/testReport)**
 for PR 20431 at commit 
[`9a4a484`](https://github.com/apache/spark/commit/9a4a4842b3f8281e73e564f4dfdad92017630760).


---




[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86812/testReport)**
 for PR 20343 at commit 
[`d04b087`](https://github.com/apache/spark/commit/d04b0872bcc02b5eadd309c560cda77ff1b8da0a).


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Support interval values without INTER...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #86810 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86810/testReport)**
 for PR 20433 at commit 
[`830cf8d`](https://github.com/apache/spark/commit/830cf8d014ae17ade5fd771ca98c8c846c93).


---




[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/20434

[SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMethodLimit to 65535

## What changes were proposed in this pull request?
We still saw the performance regression introduced by 
`spark.sql.codegen.hugeMethodLimit` in our internal workloads. There are two 
major issues in the current solution.
- The size of the compiled bytecode is not identical to the bytecode size 
of the method, so the detection is still not accurate.
- The bytecode size of a single operator (e.g., `SerializeFromObject`) 
could still exceed the 8K limit. We saw the performance regression in such 
scenarios.

Since it is close to the release of 2.3, we decided to increase it to 64K 
to avoid the perf regression.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark revertConf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20434


commit 4b358dc8cbaba5603ce861623403db8b146f9337
Author: gatorsmile 
Date:   2018-01-30T08:30:50Z

fix




---




[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20434
  
**[Test build #86813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86813/testReport)**
 for PR 20434 at commit 
[`4b358dc`](https://github.com/apache/spark/commit/4b358dc8cbaba5603ce861623403db8b146f9337).


---




[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20434
  
cc @sameeragarwal @zsxwing @rxin @cloud-fan @rednaxelafx @yhuai 


---




[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20434
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/377/
Test PASSed.


---




[GitHub] spark pull request #20378: [SPARK-11222][Build][Python] Python document styl...

2018-01-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20378#discussion_r164672962
  
--- Diff: dev/run-tests.py ---
@@ -576,7 +576,10 @@ def main():
 for f in changed_files):
 # run_java_style_checks()
 pass
-if not changed_files or any(f.endswith(".py") for f in changed_files):
+if not changed_files or any(f.endswith("lint-python")
+or f.endswith("tox.ini")
+or f.endswith(".py")
+for f in changed_files):
--- End diff --

Can you resolve the conflict? Looks like this change is already merged from 
#20338.


---




[GitHub] spark issue #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkSession ...

2018-01-30 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/20404
  
@felixcheung I see. In that case, should we revert the last commit 
(cc4b8510c1445fb742c0d750958d352adfa84902) to check whether the default 
session is updated?


---




[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/20434#discussion_r164676771
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -660,12 +660,10 @@ object SQLConf {
   val WHOLESTAGE_HUGE_METHOD_LIMIT = 
buildConf("spark.sql.codegen.hugeMethodLimit")
 .internal()
 .doc("The maximum bytecode size of a single compiled Java function 
generated by whole-stage " +
-  "codegen. When the compiled function exceeds this threshold, " +
-  "the whole-stage codegen is deactivated for this subtree of the 
current query plan. " +
-  s"The default value is 
${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
-  "this is a limit in the OpenJDK JVM implementation.")
--- End diff --

nit: might want to still keep the last line around to indicate where the 
64k limit is coming from


---




[GitHub] spark pull request #20427: [SPARK-23260][SPARK-23262][SQL] several data sour...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20427#discussion_r164676886
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -23,7 +23,7 @@ import org.apache.spark.sql.sources.v2.reader._
 
 case class DataSourceV2Relation(
--- End diff --

This is internal, we can clean it up at any time. I wanna focus on public 
APIs in this PR.


---




[GitHub] spark issue #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hugeMetho...

2018-01-30 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20434
  
LGTM


---




[GitHub] spark issue #20378: [SPARK-11222][Build][Python] Python document style check...

2018-01-30 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20378
  
One question I have is: do the current violations cause significant 
documentation errors?

Overall this is a good idea. However, is it worth enforcing this, given the 
effort of fixing the violations and the added difficulty of backporting in 
the future?


---




[GitHub] spark issue #20427: [SPARK-23260][SPARK-23262][SQL] several data source v2 n...

2018-01-30 Thread sameeragarwal
Github user sameeragarwal commented on the issue:

https://github.com/apache/spark/pull/20427
  
LGTM


---




[GitHub] spark pull request #20231: [SPARK-23000][TEST-HADOOP2.6] Fix Flaky test suit...

2018-01-30 Thread sameeragarwal
Github user sameeragarwal closed the pull request at:

https://github.com/apache/spark/pull/20231


---




[GitHub] spark issue #18339: [SPARK-21094][PYTHON] Add popen_kwargs to launch_gateway

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18339
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164680081
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala
 ---
@@ -54,10 +54,6 @@ case class WriteToDataSourceV2Exec(writer: 
DataSourceV2Writer, query: SparkPlan)
 }
 
 val rdd = query.execute()
-val messages = new Array[WriterCommitMessage](rdd.partitions.length)
-
-logInfo(s"Start processing data source writer: $writer. " +
-  s"The input RDD has ${messages.length} partitions.")
--- End diff --

might be good to keep this log.


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164680632
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala
 ---
@@ -118,14 +118,21 @@ class MemoryWriter(sink: MemorySinkV2, batchId: Long, 
outputMode: OutputMode)
 
   override def createWriterFactory: MemoryWriterFactory = 
MemoryWriterFactory(outputMode)
 
-  def commit(messages: Array[WriterCommitMessage]): Unit = {
+  private val messages = new ArrayBuffer[WriterCommitMessage]()
+
+  override def add(message: WriterCommitMessage): Unit = synchronized {
+messages += message
+  }
+
+  def commit(): Unit = synchronized {
 val newRows = messages.flatMap {
   case message: MemoryWriterCommitMessage => message.data
-}
+}.toArray
 sink.write(batchId, outputMode, newRows)
+messages.clear()
   }
 
-  override def abort(messages: Array[WriterCommitMessage]): Unit = {
+  override def abort(): Unit = {
--- End diff --

ditto


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164680538
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriter.scala
 ---
@@ -39,13 +41,20 @@ class ConsoleWriter(schema: StructType, options: 
DataSourceV2Options)
 
   def createWriterFactory(): DataWriterFactory[Row] = 
PackedRowWriterFactory
 
-  override def commit(epochId: Long, messages: 
Array[WriterCommitMessage]): Unit = {
+  private val messages = new ArrayBuffer[WriterCommitMessage]()
+
+  override def add(message: WriterCommitMessage): Unit = synchronized {
+messages += message
+  }
+
+  override def commit(epochId: Long): Unit = synchronized {
 // We have to print a "Batch" label for the epoch for compatibility 
with the pre-data source V2
 // behavior.
-printRows(messages, schema, s"Batch: $epochId")
+printRows(messages.toArray, schema, s"Batch: $epochId")
+messages.clear()
   }
 
-  def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit = {}
+  def abort(epochId: Long): Unit = {}
--- End diff --

we should clear the message array in abort too.
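
A minimal sketch of the pattern being asked for, assuming a simplified writer: `Message` and `BufferedWriter` are hypothetical stand-ins, not Spark's `WriterCommitMessage` or any real writer class. Both `commit()` and `abort()` clear the buffer, so messages from an aborted epoch cannot leak into the next one.

```scala
import scala.collection.mutable.ArrayBuffer

object BufferedWriterSketch {
  // Hypothetical stand-in for a commit message; not Spark's class.
  final case class Message(partition: Int, rows: Seq[Int])

  class BufferedWriter(sink: ArrayBuffer[Int]) {
    private val messages = new ArrayBuffer[Message]()

    def add(message: Message): Unit = synchronized {
      messages += message
    }

    // Publish the buffered rows, then reset the buffer for the next epoch.
    def commit(): Unit = synchronized {
      sink ++= messages.flatMap(_.rows)
      messages.clear()
    }

    // Nothing is published, but the buffer is still reset.
    def abort(): Unit = synchronized {
      messages.clear()
    }

    def pending: Int = synchronized(messages.size)
  }
}
```

Without the `clear()` in `abort()`, rows added before the abort would be written by the next successful `commit()`.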


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164680686
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/memoryV2.scala
 ---
@@ -135,14 +142,21 @@ class MemoryStreamWriter(val sink: MemorySinkV2, 
outputMode: OutputMode)
 
   override def createWriterFactory: MemoryWriterFactory = 
MemoryWriterFactory(outputMode)
 
-  override def commit(epochId: Long, messages: 
Array[WriterCommitMessage]): Unit = {
+  private val messages = new ArrayBuffer[WriterCommitMessage]()
+
+  override def add(message: WriterCommitMessage): Unit = synchronized {
+messages += message
+  }
+
+  override def commit(epochId: Long): Unit = synchronized {
 val newRows = messages.flatMap {
   case message: MemoryWriterCommitMessage => message.data
-}
+}.toArray
 sink.write(epochId, outputMode, newRows)
+messages.clear()
   }
 
-  override def abort(epochId: Long, messages: Array[WriterCommitMessage]): 
Unit = {
+  override def abort(epochId: Long): Unit = {
--- End diff --

ditto


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164680877
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkV2Suite.scala
 ---
@@ -41,19 +41,22 @@ class MemorySinkV2Suite extends StreamTest with 
BeforeAndAfter {
   test("continuous writer") {
 val sink = new MemorySinkV2
 val writer = new MemoryStreamWriter(sink, OutputMode.Append())
-writer.commit(0,
-  Array(
-MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))),
-MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))),
-MemoryWriterCommitMessage(2, Seq(Row(6), Row(7)))
-  ))
+val messages = Seq(
+  MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))),
+  MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))),
+  MemoryWriterCommitMessage(2, Seq(Row(6), Row(7)))
+)
+messages.foreach(writer.add(_))
--- End diff --

nit:
```
writer.add(MemoryWriterCommitMessage(0, Seq(Row(1), Row(2))))
writer.add(MemoryWriterCommitMessage(1, Seq(Row(3), Row(4))))
..
```


---




[GitHub] spark pull request #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer....

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20386#discussion_r164681157
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/ConsoleWriterSuite.scala
 ---
@@ -34,9 +33,9 @@ class ConsoleWriterSuite extends StreamTest {
 Console.withOut(captured) {
   val query = input.toDF().writeStream.format("console").start()
   try {
-input.addData(1, 2, 3)
+input.addData(1, 1, 1)
--- End diff --

why this change?


---




[GitHub] spark issue #20386: [SPARK-23202][SQL] Break down DataSourceV2Writer.commit ...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20386
  
I like this change! It adds a missing feature that is required for 
migrating the file-based data sources (which use `FileCommitProtocol` and have 
a callback for task commit), and it also makes it possible to handle large 
jobs with a lot of tasks. Implementations can externalize the commit messages 
to avoid keeping too many messages in memory.
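
One way to picture "externalizing" the messages, purely as an illustration: the class below is hypothetical and not part of the data source V2 API. It appends each message to a temporary file as it arrives and streams the messages back at commit time, assuming every message fits on a single line of text.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardOpenOption}
import scala.io.Source

object SpillingCommitLogSketch {
  /** Appends each commit message to a temp file instead of buffering it on the heap. */
  class SpillingCommitLog {
    private val file: Path = Files.createTempFile("commit-messages", ".log")

    // Assumes the message contains no newline characters.
    def add(message: String): Unit = synchronized {
      Files.write(
        file,
        (message + "\n").getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE,
        StandardOpenOption.APPEND)
    }

    /** Streams the messages back to `handle` one at a time, then removes the file. */
    def commit(handle: String => Unit): Unit = synchronized {
      val source = Source.fromFile(file.toFile, "UTF-8")
      try source.getLines().foreach(handle)
      finally source.close()
      Files.deleteIfExists(file)
    }

    /** Drops everything recorded so far. */
    def abort(): Unit = synchronized {
      Files.deleteIfExists(file)
    }
  }
}
```

Memory use then stays flat no matter how many tasks report, at the cost of one small write per message.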

LGTM, waiting for feedback from others.


---




[GitHub] spark issue #20430: [SPARK-23263][SQL] Create table stored as parquet should...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20430
  
CC @wzhfy 


---




[GitHub] spark pull request #20361: [SPARK-23188][SQL] Make vectorized columar reader...

2018-01-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20361#discussion_r164685543
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -377,6 +377,12 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val PARQUET_VECTORIZED_READER_BATCH_SIZE = 
buildConf("spark.sql.parquet.batchSize")
--- End diff --

I'd say it's very hard. To satisfy a sizeInBytes limit, we would need to 
load data record by record and stop loading once we hit the limit. But for 
performance reasons, we want to load the data in batches, which requires 
knowing the batch size ahead of time.
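
The difference can be sketched with plain iterators. This is not the Parquet reader, and `BatchingSketch` and its methods are hypothetical names. With a row count, the batch size is known before any data is read; with a byte budget, every record has to be inspected before deciding whether the batch is full.

```scala
object BatchingSketch {
  /** Fixed row count: the size of each batch is known up front. */
  def byRowCount[A](rows: Iterator[A], batchSize: Int): Iterator[Seq[A]] =
    rows.grouped(batchSize).map(_.toSeq)

  /** Byte budget: records are examined one by one until the budget would be exceeded. */
  def byByteLimit[A](rows: Iterator[A], limitBytes: Long)(sizeOf: A => Long): Iterator[Seq[A]] =
    new Iterator[Seq[A]] {
      private val buffered = rows.buffered

      def hasNext: Boolean = buffered.hasNext

      def next(): Seq[A] = {
        val batch = Vector.newBuilder[A]
        var used = 0L
        var count = 0
        // Always take at least one record so an oversized record cannot stall the iterator.
        while (buffered.hasNext && (count == 0 || used + sizeOf(buffered.head) <= limitBytes)) {
          val row = buffered.next()
          used += sizeOf(row)
          count += 1
          batch += row
        }
        batch.result()
      }
    }
}
```

The second variant produces batches of unpredictable length, which is what makes it awkward for a reader that wants to fill fixed-capacity column vectors.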


---




[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-30 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/19340
  
@mgaido91 @srowen   I have the same concern as @Kevin-Ferret and @viirya. 
I can't find any normalization of the vectors before training, and the update 
of the centers seems incorrect.
The arithmetic mean of all points in a cluster is not automatically the new 
cluster center:
For EUCLIDEAN distance, we update the center to minimize the squared loss, 
and the arithmetic mean is the closed-form solution;
For COSINE similarity, we update the center to *maximize the cosine 
similarity*, and the arithmetic mean is a solution only if all vectors are of 
unit length.

MATLAB's doc for KMeans says "One minus the cosine of the included 
angle between points (treated as vectors). Each centroid is the mean of the 
points in that cluster, after *normalizing those points to unit Euclidean 
length*."

I think RapidMiner's implementation of KMeans with cosine similarity is 
wrong if it just assigns the arithmetic mean as the new center.
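
A small numeric check of the point above, as a sketch only: `CosineCenterSketch` and its helpers are made-up names, not MLlib code, and the sketch assumes all input vectors are non-zero. It compares the total cosine similarity achieved by the mean of the raw points with that achieved by the mean of the unit-normalized points.

```scala
object CosineCenterSketch {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.zip(b).map { case (x, y) => x * y }.sum

  def norm(a: Vec): Double = math.sqrt(dot(a, a))

  /** Scales a non-zero vector to unit Euclidean length. */
  def normalize(a: Vec): Vec = {
    val n = norm(a)
    a.map(_ / n)
  }

  def cosine(a: Vec, b: Vec): Double = dot(a, b) / (norm(a) * norm(b))

  /** Component-wise arithmetic mean of a non-empty set of equally sized vectors. */
  def mean(points: Seq[Vec]): Vec = {
    val dim = points.head.length
    Array.tabulate(dim)(i => points.map(_(i)).sum / points.size)
  }

  /** Objective that a cosine-based center update should maximize. */
  def totalCosine(center: Vec, points: Seq[Vec]): Double =
    points.map(cosine(center, _)).sum
}
```

For the two points (10, 0) and (0, 1), the raw mean (5, 0.5) gives a total cosine similarity of roughly 1.09, while the mean of the normalized points, (0.5, 0.5), gives roughly 1.41, so the raw arithmetic mean is not the maximizer when the vectors have different lengths.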

Some reference:
[Spherical k-Means 
Clustering](https://www.jstatsoft.org/article/view/v050i10/v50i10.pdf)

[Scikit-Learn's example: Clustering text documents using 
k-means](http://scikit-learn.org/dev/auto_examples/text/plot_document_clustering.html)


https://stats.stackexchange.com/questions/299013/cosine-distance-as-similarity-measure-in-kmeans


https://www.quora.com/How-can-I-use-cosine-similarity-in-clustering-For-example-K-means-clustering






---




[GitHub] spark pull request #20435: [SPARK-23268][SQL]Reorganize packages in data sou...

2018-01-30 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/20435

[SPARK-23268][SQL]Reorganize packages in data source V2

## What changes were proposed in this pull request?
1. Create a new package for partitioning/distribution related classes.
As Spark will add new concrete implementations of `Distribution` in new 
releases, it is good to have a dedicated package for partitioning/distribution 
related classes.

2. Move streaming related classes to package 
`org.apache.spark.sql.sources.v2.reader/writer.streaming`, instead of 
`org.apache.spark.sql.sources.v2.streaming.reader/writer`, 
so that there won't be a reader/writer package inside package streaming, 
which is quite confusing.
## How was this patch tested?
Unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark new_pkg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20435.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20435


commit da29b863970b8ca5039d547ae5240e5c34f9f11a
Author: Wang Gengliang 
Date:   2018-01-30T08:31:10Z

create a new package for partitioning/distribution related classes

commit 3dc56226ed17637c808df9d8d03f60fcbbae9e55
Author: Wang Gengliang 
Date:   2018-01-30T09:16:01Z

re-org streaming




---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86814/testReport)**
 for PR 20435 at commit 
[`3dc5622`](https://github.com/apache/spark/commit/3dc56226ed17637c808df9d8d03f60fcbbae9e55).


---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20435
  
**[Test build #86814 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86814/testReport)**
 for PR 20435 at commit 
[`3dc5622`](https://github.com/apache/spark/commit/3dc56226ed17637c808df9d8d03f60fcbbae9e55).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86814/
Test FAILed.


---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20434: [SPARK-23267] [SQL] Increase spark.sql.codegen.hu...

2018-01-30 Thread rednaxelafx
Github user rednaxelafx commented on a diff in the pull request:

https://github.com/apache/spark/pull/20434#discussion_r164687283
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -660,12 +660,10 @@ object SQLConf {
   val WHOLESTAGE_HUGE_METHOD_LIMIT = 
buildConf("spark.sql.codegen.hugeMethodLimit")
 .internal()
 .doc("The maximum bytecode size of a single compiled Java function 
generated by whole-stage " +
-  "codegen. When the compiled function exceeds this threshold, " +
-  "the whole-stage codegen is deactivated for this subtree of the 
current query plan. " +
-  s"The default value is 
${CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT} and " +
-  "this is a limit in the OpenJDK JVM implementation.")
--- End diff --

The 8000-byte limit is a HotSpot-specific thing, but the 64KB limit is 
imposed by the Java class file format, as part of the JVM spec.

We may want to wordsmith a bit here to explain that:
1. 65535 is the largest bytecode size possible for a valid Java method; 
setting the default value to 65535 effectively turns the limit off for 
whole-stage codegen;
2. For those who wish to turn this limit on when running on HotSpot, it 
may be preferable to set the value to 
`CodeGenerator.DEFAULT_JVM_HUGE_METHOD_LIMIT` to match HotSpot's implementation.

I don't have a good concrete suggestion as to how to concisely express 
these two points in the doc string, though.
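
To make the two points concrete, here is a sketch of the check the doc string describes. The object, the constant names and `keepWholeStageCodegen` are hypothetical and only mirror the wording in this thread, not Spark's actual code.

```scala
object HugeMethodLimitSketch {
  // HotSpot-specific threshold mentioned in this thread.
  val HotSpotHugeMethodLimit = 8000
  // Largest bytecode size of a valid Java method, per this thread.
  val ClassFileMaxMethodSize = 65535

  /** Keep whole-stage codegen only if the largest compiled method stays within the limit. */
  def keepWholeStageCodegen(maxMethodBytecodeSize: Int, hugeMethodLimit: Int): Boolean =
    maxMethodBytecodeSize <= hugeMethodLimit
}
```

A 9000-byte method fails the check with the limit set to 8000 but passes with 65535, and since no valid method can be larger than 65535 bytes, that setting never triggers the fallback.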


---




[GitHub] spark issue #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkSession ...

2018-01-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20404
  
To be fully correct, I think we should hold a lock with the JVM instance, but 
I wonder whether that's easily possible. I was roughly aware of this, but I 
think I underestimated it because I believed it would be quite unlikely to 
happen. I think reverting 
https://github.com/apache/spark/commit/cc4b8510c1445fb742c0d750958d352adfa84902 
doesn't fully resolve the issue, because the same thing can also happen 
between the `if` and the next line.
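
The gap between the `if` and the next line is the usual check-then-act race. A sketch of closing it inside a single JVM, assuming a simplified session holder: all names here are hypothetical, and this does not address coordination between the Python side and the JVM, which is the hard part raised above.

```scala
import java.util.concurrent.atomic.AtomicInteger

object DefaultSessionSketch {
  final class Session(val id: Int)

  private val lock = new Object
  private var current: Option[Session] = None

  // Counts how many sessions were actually constructed.
  val created = new AtomicInteger(0)

  /** The check and the creation happen under one lock, so no thread can slip in between. */
  def getOrCreate(): Session = lock.synchronized {
    current match {
      case Some(session) => session
      case None =>
        val session = new Session(created.incrementAndGet())
        current = Some(session)
        session
    }
  }

  /** Calls getOrCreate from several threads and reports how many sessions were created. */
  def createdAfterConcurrentCalls(threadCount: Int): Int = {
    val threads = (1 to threadCount).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = { getOrCreate(); () }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    created.get()
  }
}
```

If the check and the assignment were done as two separately locked steps, two threads could both see `None` and each create a session.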


---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20435: [SPARK-23268][SQL]Reorganize packages in data source V2

2018-01-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20435
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/378/
Test PASSed.


---



