date:20180121

[GitHub] spark issue #20344: [MINOR] Typo fixes

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20344
  
**[Test build #86447 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86447/testReport)**
 for PR 20344 at commit 
[`9fff0ed`](https://github.com/apache/spark/commit/9fff0ed104650f4e92ae87deb91381cd79ac5bfa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2018-01-21 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19340
  
Merged to master


---


[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20208
  
Hi, @gatorsmile , @cloud-fan , @sameeragarwal , @HyukjinKwon .
The PR is ready for review again. The Spark commit log seems to have been a little 
quiet since yesterday.
Could you spare some time to review this Schema Evolution suite? Thank 
you in advance for any advice!


---


[GitHub] spark issue #20339: [SPARK-23169][INFRA][R] Run lintr on the changes of lint...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20339
  
**[Test build #86430 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86430/testReport)**
 for PR 20339 at commit 
[`fe59c8a`](https://github.com/apache/spark/commit/fe59c8aa2b6e66930ef1fade198d35eeb27210ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20339: [SPARK-23169][INFRA][R] Run lintr on the changes of lint...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20339
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86444 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86444/testReport)**
 for PR 20343 at commit 
[`71e0e1a`](https://github.com/apache/spark/commit/71e0e1ad2f6ee772103ef43d2270ed347bc497d0).


---


[GitHub] spark issue #20344: [MINOR] Typo fixes

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20344
  
**[Test build #86447 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86447/testReport)**
 for PR 20344 at commit 
[`9fff0ed`](https://github.com/apache/spark/commit/9fff0ed104650f4e92ae87deb91381cd79ac5bfa).


---


[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20342
  
**[Test build #86446 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86446/testReport)**
 for PR 20342 at commit 
[`8716c6e`](https://github.com/apache/spark/commit/8716c6eccf3d606a2c5f6275b8b3a9b343b01393).


---


[GitHub] spark issue #20340: [SPARK-21293][SS][SPARKR] Add doc example for streaming ...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20340
  
**[Test build #86434 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86434/testReport)**
 for PR 20340 at commit 
[`e4e6a96`](https://github.com/apache/spark/commit/e4e6a96a6bad5c1a35e546ee536e421f163b858f).


---


[GitHub] spark issue #20341: [MINOR] [SQL] Test case cleanups for recent PRs

2018-01-21 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20341
  
LGTM


---


[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20342
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/76/
Test PASSed.


---


[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20333
  
Will address my comment in my PR. 


---


[GitHub] spark issue #20340: [SPARK-21293][SS][SPARKR] Add doc example for streaming ...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20340
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20338: [SPARK-11222][BUILD][PYTHON] python code style ch...

2018-01-21 Thread rekhajoshm

GitHub user rekhajoshm opened a pull request:

https://github.com/apache/spark/pull/20338

[SPARK-11222][BUILD][PYTHON] python code style checker update

## What changes were proposed in this pull request?
Reference the latest Python code style checker from PyPI/pycodestyle.
Remove the pending TODO.
For now, exclude in tox.ini the additional style errors that the latest checker 
reports on existing Python code (will fall back on review comments to decide 
whether to keep the exclusions).
Any further code style rules need to be part of pycodestyle, not here.

## How was this patch tested?
./dev/run-tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rekhajoshm/spark SPARK-11222

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20338.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20338


commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi 
Date:   2015-05-05T23:10:08Z

Merge pull request #1 from apache/master

Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi 
Date:   2015-05-08T21:49:09Z

Merge pull request #2 from apache/master

pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi 
Date:   2015-06-22T00:08:08Z

Merge pull request #3 from apache/master

Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi 
Date:   2015-09-17T01:03:09Z

Merge pull request #4 from apache/master

Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi 
Date:   2015-11-25T18:50:32Z

Merge pull request #5 from apache/master

pull request from apache/master

commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi 
Date:   2016-03-18T19:13:51Z

Merge pull request #6 from apache/master

pull latest from apache spark

commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi 
Date:   2016-04-05T20:26:40Z

Merge pull request #8 from apache/master

pull latest from apache spark

commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1
Author: Rekha Joshi 
Date:   2017-05-01T23:00:30Z

Merge pull request #9 from apache/master

Pull apache spark

commit 63d99b3ce5f222d7126133170a373591f0ac67dd
Author: Rekha Joshi 
Date:   2017-09-30T22:26:44Z

Merge pull request #10 from apache/master

pull latest apache spark

commit a7fc787466b71784ff86f9694f617db0f1042da8
Author: Rekha Joshi 
Date:   2018-01-21T00:17:58Z

Merge pull request #11 from apache/master

Apache spark pull latest

commit 88734413c5e33803b7d5f8c0f81899ed1d3577ff
Author: rjoshi2 
Date:   2018-01-21T03:40:59Z

[SPARK-11222][BUILD][PYTHON] python code style checker update




---


[GitHub] spark issue #20340: [SPARK-21293][SS][SPARKR] Add doc example for streaming ...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20340
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86436/
Test PASSed.


---


[GitHub] spark issue #20340: [SPARK-21293][SS][SPARKR] Add doc example for streaming ...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20340
  
**[Test build #86436 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86436/testReport)**
 for PR 20340 at commit 
[`f22efd4`](https://github.com/apache/spark/commit/f22efd4bb5396c35023d15b16363dca55be4bd24).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20333: [SPARK-23087][SQL] CheckCartesianProduct too restrictive...

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20333
  
Thanks! Merged to master/2.3


---


[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
retest this please


---


[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18277
  
Let me merge this one only into master considering the concerns - 
https://github.com/apache/spark/pull/18277#pullrequestreview-90007120 and 
https://github.com/apache/spark/pull/18277#issuecomment-358876719. Adding a 
note could be fine. I don't feel strongly about it.


---


[GitHub] spark issue #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error in pip...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18277
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20208: [SPARK-23007][SQL][TEST] Add schema evolution tes...

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20208#discussion_r162835707
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaEvolutionTest.scala
 ---
@@ -0,0 +1,406 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.File
+
+import org.apache.spark.sql.{QueryTest, Row}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.test.{SharedSQLContext, SQLTestUtils}
+
+/**
+ * Schema can evolve in several ways and the followings are supported in file-based data sources.
+ *
+ *   1. Add a column
+ *   2. Remove a column
+ *   3. Change a column position
+ *   4. Change a column type
+ *
+ * Here, we consider safe evolution without data loss. For example, data type evolution should be
+ * from small types to larger types like `int`-to-`long`, not vice versa.
+ *
+ * So far, file-based data sources have schema evolution coverages like the followings.
+ *
+ *   | File Format  | Coverage     | Note                                                   |
+ *   | ------------ | ------------ | ------------------------------------------------------ |
+ *   | TEXT         | N/A          | Schema consists of a single string column.             |
+ *   | CSV          | 1, 2, 4      |                                                        |
+ *   | JSON         | 1, 2, 3, 4   |                                                        |
+ *   | ORC          | 1, 2, 3, 4   | Native vectorized ORC reader has the widest coverage.  |
+ *   | PARQUET      | 1, 2, 3      |                                                        |
--- End diff --

Correct, and this is not about schema merging.
The final correct schema is given by users (or Hive).
In this PR, all schemas are given by users, but for Hive tables, we use the 
Hive Metastore schema.
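
The coverage table in the quoted Scaladoc can be restated as a small lookup. This is a plain-Python sketch for illustration, not code from the PR; the names `COVERAGE`, `supports`, and `uncovered` are made up here, and the numbers 1 to 4 refer to the evolution kinds listed in the comment.

```python
# Evolution kinds, numbered as in the Scaladoc quoted in the diff.
EVOLUTION_KINDS = {
    1: "add a column",
    2: "remove a column",
    3: "change a column position",
    4: "change a column type",
}

# Coverage per file format, copied from the table in the diff.
# TEXT is listed as N/A because its schema is a single string column.
COVERAGE = {
    "TEXT": set(),
    "CSV": {1, 2, 4},
    "JSON": {1, 2, 3, 4},
    "ORC": {1, 2, 3, 4},
    "PARQUET": {1, 2, 3},
}


def supports(file_format, kind):
    """Return True if the table lists `kind` as covered for `file_format`."""
    return kind in COVERAGE[file_format.upper()]


def uncovered(file_format):
    """List the evolution kinds the table does not cover for a format."""
    return [EVOLUTION_KINDS[k] for k in sorted(EVOLUTION_KINDS)
            if not supports(file_format, k)]


if __name__ == "__main__":
    for fmt in COVERAGE:
        print(fmt, uncovered(fmt))
```

The lookup only mirrors what the table claims; it says nothing about how Spark resolves the user-provided or Hive Metastore schema at read time.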


---


[GitHub] spark issue #20338: [SPARK-11222][BUILD][PYTHON] python code style checker u...

2018-01-21 Thread rekhajoshm

Github user rekhajoshm commented on the issue:

https://github.com/apache/spark/pull/20338
  
Thanks @HyukjinKwon for your review. Updated, please verify.


---


[GitHub] spark issue #20338: [SPARK-11222][BUILD][PYTHON] python code style checker u...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20338
  
**[Test build #86451 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86451/testReport)**
 for PR 20338 at commit 
[`8718891`](https://github.com/apache/spark/commit/871889113057f819337bd24379cf1f07516c3298).


---


[GitHub] spark issue #20338: [SPARK-11222][BUILD][PYTHON] python code style checker u...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20338
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20338: [SPARK-23174][BUILD][PYTHON] python code style ch...

2018-01-21 Thread rekhajoshm

Github user rekhajoshm commented on a diff in the pull request:

https://github.com/apache/spark/pull/20338#discussion_r162838933
  
--- Diff: dev/tox.ini ---
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-[pep8]
-ignore=E402,E731,E241,W503,E226
+[pycodestyle]
+ignore=E402,E731,E241,W503,E226,E722,E741,E305
--- End diff --

Updated SPARK-11222. Added 
https://issues.apache.org/jira/browse/SPARK-23174.
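
To see at a glance which checks the new `[pycodestyle]` section suppresses beyond the old `[pep8]` one, the two `ignore` lists from the diff can be compared. This is a small Python sketch; `added_codes` is a hypothetical helper, and the code descriptions in the comments reflect my reading of the pycodestyle documentation rather than anything stated in this thread.

```python
# The two ignore lists, copied from the dev/tox.ini diff.
OLD_IGNORE = "E402,E731,E241,W503,E226"
NEW_IGNORE = "E402,E731,E241,W503,E226,E722,E741,E305"


def added_codes(old, new):
    """Return the codes present in `new` but not in `old`, sorted."""
    old_set = set(old.split(","))
    return sorted(code for code in new.split(",") if code not in old_set)


if __name__ == "__main__":
    # As I understand pycodestyle's error list:
    #   E305: expected 2 blank lines after a function or class
    #   E722: bare `except`
    #   E741: ambiguous variable name
    print(added_codes(OLD_IGNORE, NEW_IGNORE))
```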


---


[GitHub] spark issue #20346: [MINOR][SQL] Fix wrong comments on org.apache.spark.sql....

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20346
  
Thank you for the review and approval, @HyukjinKwon!


---


[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20337: [SPARK-11630] [core] ClosureCleaner moved from warning t...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/61/
Test PASSed.


---


[GitHub] spark issue #20345: [SPARK-23172][SQL] Respect Project nodes in ReorderJoin

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20345
  
**[Test build #86449 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86449/testReport)**
 for PR 20345 at commit 
[`8ad6a81`](https://github.com/apache/spark/commit/8ad6a813818b34b3bdfd94f93c0a1f664945da34).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20345: [SPARK-23172][SQL] Respect Project nodes in ReorderJoin

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/79/
Test PASSed.


---


[GitHub] spark issue #20337: [SPARK-11630] [core] ClosureCleaner moved from warning t...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20337
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86445/
Test PASSed.


---


[GitHub] spark issue #20345: [SPARK-23172][SQL] Respect Project nodes in ReorderJoin

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20345
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20341: [MINOR] [SQL] Test case cleanups for recent PRs

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20341
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/69/
Test PASSed.


---


[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20342
  
**[Test build #86441 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86441/testReport)**
 for PR 20342 at commit 
[`e790ab9`](https://github.com/apache/spark/commit/e790ab9950aa3ed9a0662e4d10f9d8611ff8f1ee).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class QueryExecutionMetering() `


---


[GitHub] spark issue #19993: [SPARK-22799][ML] Bucketizer should throw exception if s...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19993
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20342: [SPARK-23170][SQL] Dump the statistics of effective runs...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20342
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86441/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20344: [MINOR] Typo fixes

2018-01-21 Thread jaceklaskowski

GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/20344

[MINOR] Typo fixes

## What changes were proposed in this pull request?

Typo fixes

## How was this patch tested?

Local build / Doc-only changes

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark typo-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20344.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20344


commit 9fff0ed104650f4e92ae87deb91381cd79ac5bfa
Author: Jacek Laskowski 
Date:   2018-01-21T17:59:26Z

[MINOR] Typo fixes




---


[GitHub] spark pull request #20341: [MINOR] [SQL] Test case cleanups by recent PRs

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20341#discussion_r162812208
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -276,16 +277,14 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
 
   test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
 "is false or null") {
-val df = spark.range(10)
-val dfNull = spark.range(10).select(lit(null).as("b"))
-val planNull = df.join(dfNull, $"id" === $"b", "left").queryExecution.analyzed
-
-spark.sessionState.executePlan(planNull).optimizedPlan
-
-val dfOne = df.select(lit(1).as("a"))
-val dfTwo = spark.range(10).select(lit(2).as("b"))
-val planFalse = dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.analyzed
-
-spark.sessionState.executePlan(planFalse).optimizedPlan
+withSQLConf(SQLConf.CROSS_JOINS_ENABLED.key -> "false") {
+  val df = spark.range(10)
+  val dfNull = spark.range(10).select(lit(null).as("b"))
+  df.join(dfNull, $"id" === $"b", "left").queryExecution.optimizedPlan
+
+  val dfOne = df.select(lit(1).as("a"))
+  val dfTwo = spark.range(10).select(lit(2).as("b"))
+  dfOne.join(dfTwo, $"a" === $"b", "left").queryExecution.optimizedPlan
+}
--- End diff --

cc @mariobriggs 
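
The rewritten test wraps its body in `withSQLConf(...)`, which runs a block under a temporary configuration and puts the previous values back afterwards. The pattern can be sketched in plain Python as a context manager over a dict. This is only an analogy under stated assumptions: `with_conf` is a hypothetical name, the dict stands in for the session configuration, and the key string is what I understand `SQLConf.CROSS_JOINS_ENABLED.key` to resolve to.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for keys that were not set before the block


@contextmanager
def with_conf(conf, overrides):
    """Apply `overrides` to `conf` for the duration of the block, then restore."""
    saved = {key: conf.get(key, _MISSING) for key in overrides}
    conf.update(overrides)
    try:
        yield conf
    finally:
        # Restore even if the block raised.
        for key, previous in saved.items():
            if previous is _MISSING:
                conf.pop(key, None)
            else:
                conf[key] = previous


if __name__ == "__main__":
    conf = {"spark.sql.crossJoin.enabled": "true"}
    with with_conf(conf, {"spark.sql.crossJoin.enabled": "false"}):
        print("inside:", conf["spark.sql.crossJoin.enabled"])
    print("after:", conf["spark.sql.crossJoin.enabled"])
```

Pinning the setting inside the block keeps the test independent of whatever value earlier tests left in the shared session.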


---


[GitHub] spark pull request #20341: [MINOR] [SQL] Test case cleanups by recent PRs

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20341#discussion_r162812178
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala ---
@@ -49,8 +49,12 @@ class HiveUDAFSuite extends QueryTest with TestHiveSingleton with SQLTestUtils {
   }
 
   protected override def afterAll(): Unit = {
-sql(s"DROP TEMPORARY FUNCTION IF EXISTS mock")
-sql(s"DROP TEMPORARY FUNCTION IF EXISTS hive_max")
+try {
+  sql(s"DROP TEMPORARY FUNCTION IF EXISTS mock")
+  sql(s"DROP TEMPORARY FUNCTION IF EXISTS hive_max")
--- End diff --

Actually, these DROP FUNCTION calls are unnecessary. However, it also sounds 
fine to keep them because we should clean up local objects after use.
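
The quoted hunk ends inside the `try` block, so the rest of `afterAll` is not visible here. A common shape for this kind of cleanup, assumed for illustration and not taken from the PR, is to drop the temporary objects in `try` and call the parent teardown in `finally`, so the shared teardown runs even when a drop fails. A plain-Python sketch with hypothetical class and method names:

```python
class BaseSuite:
    """Hypothetical stand-in for a test base class with shared teardown."""

    def __init__(self):
        self.events = []

    def after_all(self):
        self.events.append("base teardown")


class UdafSuite(BaseSuite):
    """Hypothetical suite that registers temporary functions."""

    def __init__(self, fail_on=None):
        super().__init__()
        self.fail_on = fail_on  # name whose drop should fail, for the demo

    def drop_temporary_function(self, name):
        if name == self.fail_on:
            raise RuntimeError("could not drop " + name)
        self.events.append("dropped " + name)

    def after_all(self):
        try:
            self.drop_temporary_function("mock")
            self.drop_temporary_function("hive_max")
        finally:
            # Runs whether or not a drop above raised.
            super().after_all()


def run_teardown(suite):
    """Run teardown, swallow the demo failure, and return the event log."""
    try:
        suite.after_all()
    except RuntimeError:
        pass
    return suite.events


if __name__ == "__main__":
    print(run_teardown(UdafSuite()))
    print(run_teardown(UdafSuite(fail_on="mock")))
```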


---


[GitHub] spark pull request #20338: [SPARK-11222][BUILD][PYTHON] python code style ch...

2018-01-21 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20338#discussion_r162804815
  
--- Diff: dev/lint-python ---
@@ -35,11 +35,9 @@ compile_status="${PIPESTATUS[0]}"
 
 # Get pep8 at runtime so that we don't rely on it being installed on the build server.
 #+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
-#+ TODOs:
-#+  - Download pep8 from PyPI. It's more "official".
-PEP8_VERSION="1.7.0"
+PEP8_VERSION="2.3.1"
 PEP8_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pep8-$PEP8_VERSION.py"
-PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/jcrocholl/pep8/$PEP8_VERSION/pep8.py"
+PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/PyCQA/pycodestyle/$PEP8_VERSION/pycodestyle.py"
--- End diff --

Shall we leave a note that `pep8` has been formally renamed to `pycodestyle` to 
reduce confusion, and mention it in the PR description too?
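
For reference, the way the script derives its download URL and local file name from the version string can be mirrored in a few lines of Python. This is a sketch of the string construction shown in the diff only; the function names are hypothetical and nothing is downloaded. Note that the local file keeps the `pep8-` prefix even though the upstream file is now `pycodestyle.py`, which is part of why a note about the rename would help.

```python
from string import Template

# URL pattern copied from the new PEP8_SCRIPT_REMOTE_PATH line in the diff.
REMOTE_TEMPLATE = Template(
    "https://raw.githubusercontent.com/PyCQA/pycodestyle/$version/pycodestyle.py"
)


def remote_script_url(version):
    """Build the raw GitHub URL for a given pycodestyle version tag."""
    return REMOTE_TEMPLATE.substitute(version=version)


def local_script_path(spark_root, version):
    """Mirror PEP8_SCRIPT_PATH: the local copy still uses the pep8- prefix."""
    return "{}/dev/pep8-{}.py".format(spark_root, version)


if __name__ == "__main__":
    print(remote_script_url("2.3.1"))
    print(local_script_path("/path/to/spark", "2.3.1"))
```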


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20341: [MINOR] [SQL] Test case cleanups by recent PRs

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20341#discussion_r162812196
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala ---
@@ -492,8 +492,7 @@ private[hive] class TestHiveSparkSession(
   protected val originalUDFs: JavaSet[String] = FunctionRegistry.getFunctionNames
 
   /**
-   * Resets the test instance by deleting any tables that have been created.
-   * TODO: also clear out UDFs, views, etc.
--- End diff --

This TODO has been addressed.


---


[GitHub] spark pull request #20342: [SPARK-23170][SQL] Dump the statistics of effecti...

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20342#discussion_r162817354
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/QueryExecutionMetering.scala
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.rules
+
+import com.google.common.util.concurrent.AtomicLongMap
+import scala.collection.JavaConverters._
--- End diff --

Yeah. https://github.com/databricks/scala-style-guide#imports
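
As I read the linked guide, imports are grouped in this order: `java.*`/`javax.*`, then `scala.*`, then third-party libraries, then project classes, with names sorted within each group. Under that reading, the `scala.collection` import in the diff should come before the `com.google` one. A small Python sketch of the ordering rule follows; `import_group` and `sort_imports` are hypothetical helpers, and treating `org.apache.spark.` as the project prefix is an assumption.

```python
PROJECT_PREFIX = "org.apache.spark."  # assumed project package prefix


def import_group(name):
    """Return the group index of a fully qualified import name."""
    if name.startswith(("java.", "javax.")):
        return 0
    if name.startswith("scala."):
        return 1
    if name.startswith(PROJECT_PREFIX):
        return 3
    return 2  # everything else is treated as third-party


def sort_imports(names):
    """Order imports by group first, then alphabetically within a group."""
    return sorted(names, key=lambda name: (import_group(name), name))


if __name__ == "__main__":
    # The two imports from the diff, in the order they appear there.
    in_diff = [
        "com.google.common.util.concurrent.AtomicLongMap",
        "scala.collection.JavaConverters._",
    ]
    for name in sort_imports(in_diff):
        print("import " + name)
```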


---


[GitHub] spark issue #20342: [SPARK-23170] Dump the statistics of effective runs of a...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20342
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20177
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20177: [SPARK-22954][SQL] Fix the exception thrown by Analyze c...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20177
  
**[Test build #86425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86425/testReport)**
 for PR 20177 at commit 
[`77e4d6d`](https://github.com/apache/spark/commit/77e4d6db1d647db7a7b2c13c922bab0bdd3e53fc).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20342: [SPARK-23170] Dump the statistics of effective runs of a...

2018-01-21 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20342
  
Did you forget to add `[SQL]` to the title?


---


[GitHub] spark issue #20340: [SPARK-21293][SS][SPARKR] Add doc example for streaming ...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20340
  
**[Test build #86434 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86434/testReport)**
 for PR 20340 at commit 
[`e4e6a96`](https://github.com/apache/spark/commit/e4e6a96a6bad5c1a35e546ee536e421f163b858f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20339: [SPARK-23169][INFRA][R] Run lintr on the changes ...

2018-01-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20339


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20343
  
retest this please.


---


[GitHub] spark pull request #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCD...

2018-01-21 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20343#discussion_r162835887
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala ---
@@ -339,6 +340,30 @@ class TPCDSQuerySuite extends BenchmarkQueryTest {
 }
   }
 
+  val tpcdsQueriesV2_7_0 = Seq(
+    "q1", "q2", "q3", "q4", "q5", "q5a", "q6", "q7", "q8", "q9", "q10", "q10a", "q11",
+    "q12", "q13", "q14_1", "q14_2", "q14a_1", "q14a_2",  "q15", "q16", "q17", "q18", "q18a", "q19",
+    "q20", "q21", "q22", "q22a", "q23_1", "q23_2", "q24_1", "q24_2", "q25", "q26", "q27", "q27a",
+    "q28", "q29", "q30", "q31", "q32", "q33", "q34", "q35", "q35a", "q36", "q36a", "q37", "q38",
+    "q39_1", "q39_2", "q40", "q41", "q42", "q43", "q44", "q45", "q46", "q47", "q48", "q49",
+    "q50", "q51", "q51a", "q52", "q53", "q54", "q55", "q56", "q57", "q58", "q59",
+    "q60", "q61", "q62", "q63", "q64", "q65", "q66", "q67", "q67a", "q68", "q69",
+    "q70", "q70a", "q71", "q72", "q73", "q74", "q75", "q76", "q77", "q77a", "q78", "q79",
+    "q80", "q80a", "q81", "q82", "q83", "q84", "q85", "q86", "q86a", "q87", "q88", "q89",
+    "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
+
+  tpcdsQueriesV2_7_0.foreach { name =>
+val queryString = resourceToString(s"tpcds-v2.7.0/$name.sql",
--- End diff --

ok, thanks.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86452 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86452/testReport)**
 for PR 20343 at commit 
[`9ac04ed`](https://github.com/apache/spark/commit/9ac04edc5aa770fb04b9ad4c12de75fa6d4ac2c8).


---


[GitHub] spark pull request #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCD...

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20343#discussion_r162835844
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala ---
@@ -339,6 +340,30 @@ class TPCDSQuerySuite extends BenchmarkQueryTest {
 }
   }
 
+  val tpcdsQueriesV2_7_0 = Seq(
+    "q1", "q2", "q3", "q4", "q5", "q5a", "q6", "q7", "q8", "q9", "q10", "q10a", "q11",
+    "q12", "q13", "q14_1", "q14_2", "q14a_1", "q14a_2",  "q15", "q16", "q17", "q18", "q18a", "q19",
+    "q20", "q21", "q22", "q22a", "q23_1", "q23_2", "q24_1", "q24_2", "q25", "q26", "q27", "q27a",
+    "q28", "q29", "q30", "q31", "q32", "q33", "q34", "q35", "q35a", "q36", "q36a", "q37", "q38",
+    "q39_1", "q39_2", "q40", "q41", "q42", "q43", "q44", "q45", "q46", "q47", "q48", "q49",
+    "q50", "q51", "q51a", "q52", "q53", "q54", "q55", "q56", "q57", "q58", "q59",
+    "q60", "q61", "q62", "q63", "q64", "q65", "q66", "q67", "q67a", "q68", "q69",
+    "q70", "q70a", "q71", "q72", "q73", "q74", "q75", "q76", "q77", "q77a", "q78", "q79",
+    "q80", "q80a", "q81", "q82", "q83", "q84", "q85", "q86", "q86a", "q87", "q88", "q89",
+    "q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
+
+  tpcdsQueriesV2_7_0.foreach { name =>
+val queryString = resourceToString(s"tpcds-v2.7.0/$name.sql",
--- End diff --

Thanks, @maropu.
While reviewing this, I found that we missed that bug in @gatorsmile's
original PR.
If the fix can be included in Apache Spark 2.3, a follow-up PR also sounds
good to me.


---


[GitHub] spark pull request #18277: [SPARK-20947][PYTHON] Fix encoding/decoding error...

2018-01-21 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18277


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/81/
Test PASSed.


---


[GitHub] spark pull request #20331: [SPARK-23158] [SQL] Move HadoopFsRelationTest tes...

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/20331#discussion_r162838622
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcHadoopFsRelationSuite.scala
 ---
@@ -82,44 +80,4 @@ class OrcHadoopFsRelationSuite extends HadoopFsRelationTest {
       }
     }
   }
-
-  test("SPARK-13543: Support for specifying compression codec for ORC via option()") {
-    withTempPath { dir =>
-      val path = s"${dir.getCanonicalPath}/table1"
-      val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
-      df.write
-        .option("compression", "ZlIb")
-        .orc(path)
-
-      // Check if this is compressed as ZLIB.
-      val maybeOrcFile = new File(path).listFiles().find { f =>
-        !f.getName.startsWith("_") && f.getName.endsWith(".zlib.orc")
-      }
-      assert(maybeOrcFile.isDefined)
-      val orcFilePath = maybeOrcFile.get.toPath.toString
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(orcFilePath).get.getCompression
-      assert("ZLIB" === expectedCompressionKind.name())
-
-      val copyDf = spark
-        .read
-        .orc(path)
-      checkAnswer(df, copyDf)
-    }
-  }
-
-  test("Default compression codec is snappy for ORC compression") {
-    withTempPath { file =>
-      spark.range(0, 10).write
-        .orc(file.getCanonicalPath)
-      val expectedCompressionKind =
-        OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
--- End diff --

@gatorsmile, this test case should also be run against the `native`
implementation.
`HiveOrcHadoopFsRelationSuite` covers only the `hive` implementation.


---


[GitHub] spark issue #20346: [MINOR][SQL] Fix wrong comments on org.apache.spark.sql....

2018-01-21 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20346
  
This is an old code comment that we should have removed earlier. Thanks for
fixing it. LGTM.

On Jan 22, 2018 11:20 AM, "Dongjoon Hyun" wrote:

> cc @viirya


---


[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-21 Thread ConeyLiu

Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162840776
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -162,26 +162,29 @@ private[spark] class MemoryStore(
   }
 
   /**
-   * Attempt to put the given block in memory store as values.
+   * Attempt to put the given block in memory store as values or bytes.
    *
    * It's possible that the iterator is too large to materialize and store in memory. To avoid
    * OOM exceptions, this method will gradually unroll the iterator while periodically checking
    * whether there is enough free memory. If the block is successfully materialized, then the
    * temporary unroll memory used during the materialization is "transferred" to storage memory,
    * so we won't acquire more memory than is actually needed to store the block.
    *
-   * @return in case of success, the estimated size of the stored data. In case of failure, return
-   *         an iterator containing the values of the block. The returned iterator will be backed
-   *         by the combination of the partially-unrolled block and the remaining elements of the
-   *         original input iterator. The caller must either fully consume this iterator or call
-   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
-   *         block.
+   * @param blockId The block id.
+   * @param values The values which need be stored.
+   * @param classTag the [[ClassTag]] for the block.
+   * @param memoryMode The values saved mode.
--- End diff --

`MemoryMode` has only two modes: `ON_HEAP` and `OFF_HEAP`.


---


[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2018-01-21 Thread ConeyLiu

Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19285#discussion_r162840896
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala ---
@@ -162,26 +162,29 @@ private[spark] class MemoryStore(
   }
 
   /**
-   * Attempt to put the given block in memory store as values.
+   * Attempt to put the given block in memory store as values or bytes.
    *
    * It's possible that the iterator is too large to materialize and store in memory. To avoid
    * OOM exceptions, this method will gradually unroll the iterator while periodically checking
    * whether there is enough free memory. If the block is successfully materialized, then the
    * temporary unroll memory used during the materialization is "transferred" to storage memory,
    * so we won't acquire more memory than is actually needed to store the block.
    *
-   * @return in case of success, the estimated size of the stored data. In case of failure, return
-   *         an iterator containing the values of the block. The returned iterator will be backed
-   *         by the combination of the partially-unrolled block and the remaining elements of the
-   *         original input iterator. The caller must either fully consume this iterator or call
-   *         `close()` on it in order to free the storage memory consumed by the partially-unrolled
-   *         block.
+   * @param blockId The block id.
+   * @param values The values which need be stored.
+   * @param classTag the [[ClassTag]] for the block.
+   * @param memoryMode The values saved mode.
+   * @param valuesHolder A holder that supports storing record of values into memory store as
+   *        values of bytes.
+   * @return if the block is stored successfully, return the stored data size. Else return the
+   *         memory has used for unroll the block.
--- End diff --

The block may be fully unrolled, but the memory used exceeds what was
requested and the extra memory can't be acquired.
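
The failure case described here can be illustrated with a toy model of gradual unrolling. The following is only a sketch under stated assumptions, written in Python rather than Scala: the function, its parameters, and the fixed growth factor are all hypothetical and do not mirror the actual `MemoryStore` code.

```python
def unroll(values, size_of, pool_free, initial_request=4, check_period=2, growth=1.5):
    """Materialize `values` while reserving memory from a pool in steps.

    Returns ("stored", size) on success or ("failed", reserved) on failure.
    All names and defaults are illustrative assumptions.
    """
    reserved = 0
    used = 0
    buffered = []

    def reserve(amount):
        nonlocal reserved, pool_free
        if amount > pool_free:
            return False
        pool_free -= amount
        reserved += amount
        return True

    if not reserve(initial_request):
        return ("failed", reserved)

    for count, value in enumerate(values, start=1):
        buffered.append(value)
        used += size_of(value)
        # Only check periodically, so usage may overshoot the reservation.
        if count % check_period == 0 and used > reserved:
            if not reserve(int(used * growth) - reserved):
                return ("failed", reserved)

    # Fully unrolled, but the final size may still exceed what was reserved.
    if used > reserved and not reserve(used - reserved):
        return ("failed", reserved)
    return ("stored", used)


plenty = unroll(range(10), lambda v: 1, pool_free=100)
tight = unroll(range(5), lambda v: 1, pool_free=4)
```

In the `tight` call every element is consumed, yet the last element pushes usage past the reservation and the pool has nothing left, so the call reports failure together with the amount reserved.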


---


[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19285
  
**[Test build #86455 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86455/testReport)**
 for PR 19285 at commit 
[`c442494`](https://github.com/apache/spark/commit/c4424943f5b74f8d1c191228cd8055d5482e7658).


---


[GitHub] spark pull request #20347: [SPARK-20129][Core] JavaSparkContext should use S...

2018-01-21 Thread rekhajoshm

GitHub user rekhajoshm opened a pull request:

https://github.com/apache/spark/pull/20347

[SPARK-20129][Core] JavaSparkContext should use SparkContext.getOrCreate

## What changes were proposed in this pull request?
Use `SparkContext.getOrCreate()` instead of creating a new `SparkContext` in
`JavaSparkContext`.
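
As a rough illustration of the get-or-create idea (not the Spark implementation), here is a minimal Python sketch. `Context` and `Wrapper` are hypothetical stand-ins for a one-per-process context and a wrapper that delegates to it.

```python
import threading


class Context:
    """Hypothetical stand-in for a heavyweight, one-per-process context."""

    _active = None
    _lock = threading.Lock()

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf):
        # Reuse the active instance if there is one; otherwise create it once.
        with cls._lock:
            if cls._active is None:
                cls._active = cls(conf)
            return cls._active


class Wrapper:
    """Hypothetical wrapper that reuses the shared context instead of building a new one."""

    def __init__(self, conf):
        self.ctx = Context.get_or_create(conf)


a = Wrapper({"app": "first"})
b = Wrapper({"app": "second"})
```

Both wrappers end up sharing one context. Note that in this sketch the second caller's configuration is silently ignored, which is the main trade-off of the pattern.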

## How was this patch tested?
Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rekhajoshm/spark SPARK-20129

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20347


commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi 
Date:   2015-05-05T23:10:08Z

Merge pull request #1 from apache/master

Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi 
Date:   2015-05-08T21:49:09Z

Merge pull request #2 from apache/master

pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi 
Date:   2015-06-22T00:08:08Z

Merge pull request #3 from apache/master

Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi 
Date:   2015-09-17T01:03:09Z

Merge pull request #4 from apache/master

Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi 
Date:   2015-11-25T18:50:32Z

Merge pull request #5 from apache/master

pull request from apache/master

commit c73c32aadd6066e631956923725a48d98a18777e
Author: Rekha Joshi 
Date:   2016-03-18T19:13:51Z

Merge pull request #6 from apache/master

pull latest from apache spark

commit 7dbf7320057978526635bed09dabc8cf8657a28a
Author: Rekha Joshi 
Date:   2016-04-05T20:26:40Z

Merge pull request #8 from apache/master

pull latest from apache spark

commit 5e9d71827f8e2e4d07027281b80e4e073e7fecd1
Author: Rekha Joshi 
Date:   2017-05-01T23:00:30Z

Merge pull request #9 from apache/master

Pull apache spark

commit 63d99b3ce5f222d7126133170a373591f0ac67dd
Author: Rekha Joshi 
Date:   2017-09-30T22:26:44Z

Merge pull request #10 from apache/master

pull latest apache spark

commit a7fc787466b71784ff86f9694f617db0f1042da8
Author: Rekha Joshi 
Date:   2018-01-21T00:17:58Z

Merge pull request #11 from apache/master

Apache spark pull latest

commit b1ae5125f65e0d8a59a4006a9777ed5185a758c9
Author: rjoshi2 
Date:   2018-01-22T02:53:06Z

[SPARK-20129][Core] JavaSparkContext should use SparkContext.getOrCreate




---


[GitHub] spark issue #20347: [SPARK-20129][Core] JavaSparkContext should use SparkCon...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20347
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/84/
Test PASSed.


---


[GitHub] spark issue #20347: [SPARK-20129][Core] JavaSparkContext should use SparkCon...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20347
  
**[Test build #86456 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86456/testReport)**
 for PR 20347 at commit 
[`b1ae512`](https://github.com/apache/spark/commit/b1ae5125f65e0d8a59a4006a9777ed5185a758c9).


---


[GitHub] spark issue #20347: [SPARK-20129][Core] JavaSparkContext should use SparkCon...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20347
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20346: [MINOR][SQL] Fix wrong comments on org.apache.spark.sql....

2018-01-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/20346
  
Thank you for reviewing and confirming, @viirya!


---


[GitHub] spark pull request #20316: [SPARK-23149][SQL] polish ColumnarBatch

2018-01-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20316#discussion_r162842712
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java ---
@@ -96,16 +90,6 @@ public void setNumRows(int numRows) {
*/
   public int numRows() { return numRows; }
 
-  /**
-   * Returns the schema that makes up this batch.
-   */
-  public StructType schema() { return schema; }
-
-  /**
-   * Returns the max capacity (in number of rows) for this batch.
-   */
-  public int capacity() { return capacity; }
--- End diff --

`ColumnarBatch` consumers don't care about `capacity`, only `numRows`;
`capacity` is needed only by column vector builders. They also don't care
about the schema or field names, only the data type of each column.


---


[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162844101
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java 
---
@@ -55,164 +43,82 @@ public void close() {
     if (childColumns != null) {
       for (int i = 0; i < childColumns.length; i++) {
         childColumns[i].close();
+        childColumns[i] = null;
--- End diff --

What do you mean? `ColumnVector.close` is a required interface.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86457 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86457/testReport)**
 for PR 20343 at commit 
[`12f687c`](https://github.com/apache/spark/commit/12f687c3c4338478f7f0cc40474c90f55aab8ecf).


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/85/
Test PASSed.


---


[GitHub] spark pull request #20325: [SPARK-22808][DOCS] add insertInto when save hive...

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20325#discussion_r162845525
  
--- Diff: docs/sql-programming-guide.md ---
@@ -580,6 +580,9 @@ default local Hive metastore (using Derby) for you. Unlike the `createOrReplaceT
 Hive metastore. Persistent tables will still exist even after your Spark program has restarted, as
 long as you maintain your connection to the same metastore. A DataFrame for a persistent table can
 be created by calling the `table` method on a `SparkSession` with the name of the table.
+Notice that for `DataFrames` is built on Hive table, `insertInto` should be used instead of `saveAsTable`.
--- End diff --

Done. Thanks!


---


[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-21 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20208
  
Will do it after 2.3 release


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20343
  
**[Test build #86452 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86452/testReport)**
 for PR 20343 at commit 
[`9ac04ed`](https://github.com/apache/spark/commit/9ac04edc5aa770fb04b9ad4c12de75fa6d4ac2c8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Merged build finished. Test FAILed.


---


[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86452/
Test FAILed.


---


[GitHub] spark pull request #20338: [SPARK-23174][BUILD][PYTHON] python code style ch...

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20338#discussion_r162852255
  
--- Diff: dev/lint-python ---
@@ -35,11 +35,10 @@ compile_status="${PIPESTATUS[0]}"
 
 # Get pep8 at runtime so that we don't rely on it being installed on the build server.
 #+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162
-#+ TODOs:
-#+  - Download pep8 from PyPI. It's more "official".
-PEP8_VERSION="1.7.0"
+# Updated to latest official version for pep8.pep8 is formally renamed to pycodestyle.
+PEP8_VERSION="2.3.1"
--- End diff --

Should we also use `pycodestyle` instead of `pep8` for variable names or 
script names?


---


[GitHub] spark issue #20259: [SPARK-23066][WEB-UI] Master Page increase master start-...

2018-01-21 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20259
  
I kind of agree with @CodingCat. There are plenty of third-party monitoring
tools for tracking the availability of the Master process, so it isn't really
necessary to expose this in the Master UI.


---


[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2018-01-21 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r162856254
  
--- Diff: python/pyspark/context.py ---
@@ -1023,6 +1032,35 @@ def getConf(self):
 conf.setAll(self._conf.getAll())
 return conf
 
+    def install_packages(self, packages, install_driver=True):
+        """
+        install python packages on all executors and driver through pip. pip will be installed
+        by default no matter using native virtualenv or conda. So it is guaranteed that pip is
+        available if virtualenv is enabled.
+        :param packages: string for single package or a list of string for multiple packages
+        :param install_driver: whether to install packages in client
+        """
+        if self._conf.get("spark.pyspark.virtualenv.enabled") != "true":
+            raise RuntimeError("install_packages can only use called when "
+                               "spark.pyspark.virtualenv.enabled set as true")
+        if isinstance(packages, basestring):
+            packages = [packages]
+        # seems statusTracker.getExecutorInfos() will return driver + exeuctors, so -1 here.
+        num_executors = len(self._jsc.sc().statusTracker().getExecutorInfos()) - 1
+        dummyRDD = self.parallelize(range(num_executors), num_executors)
+
+        def _run_pip(packages, iterator):
+            import pip
+            pip.main(["install"] + packages)
+
+        # run it in the main thread. Will do it in a separated thread after
+        # https://github.com/pypa/pip/issues/2553 is fixed
+        if install_driver:
+            _run_pip(packages, None)
+
+        import functools
+        dummyRDD.foreachPartition(functools.partial(_run_pip, packages))
--- End diff --

@zjffdu No, it is not; hard -1.


---


[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-21 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13599
  
I think this is important functionality; however, the current PR will break
in the event of an executor restart, and that isn't acceptable. I'm -1 on
this until that issue is fixed.


---


[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2018-01-21 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/13599
  
To be clear, there are ways to support this that don't break on executor
restart, which is why this is a hard block. Resilience is a first-class
concept for Spark, and we can't abandon it to install packages.
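
To make the restart concern concrete, here is a toy Python model. It assumes nothing about Spark or pip internals: `Executor`, `install_once`, and `run_task` are hypothetical names, and the "ensure before each task" variant is only one possible shape of a restart-safe approach, not necessarily the one the reviewer has in mind.

```python
class Executor:
    """Hypothetical executor whose extra packages are lost on restart."""

    def __init__(self):
        self.installed = set()

    def restart(self):
        self.installed = set()  # a fresh process starts without the packages


def install_once(executors, packages):
    # One-shot approach: install on whichever executors exist right now.
    for ex in executors:
        ex.installed |= set(packages)


def run_task(executor, required, ensure=False):
    # With ensure=True, the task installs anything missing before it runs.
    if ensure:
        executor.installed |= set(required) - executor.installed
    return set(required) <= executor.installed


executors = [Executor(), Executor()]
install_once(executors, ["numpy"])
executors[0].restart()

one_shot_ok = run_task(executors[0], ["numpy"])
ensured_ok = run_task(executors[0], ["numpy"], ensure=True)
```

After the restart, the one-shot install no longer covers the first executor, while the per-task check restores the package before the task depends on it.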


---


[GitHub] spark issue #20297: [SPARK-23020][CORE] Fix races in launcher code, test.

2018-01-21 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20297
  
thanks, merging to master/2.3!


---


[GitHub] spark pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability suppo...

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18906#discussion_r162857996
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -602,6 +602,30 @@ def test_non_existed_udf(self):
         self.assertRaisesRegexp(AnalysisException, "Can not load class non_existed_udf",
                                 lambda: sqlContext.registerJavaFunction("udf1", "non_existed_udf"))
 
+    def test_udf_no_nulls(self):
+        from pyspark.sql.functions import udf
+        plus_four = udf(lambda x: x + 4, IntegerType()).asNonNullable()
+        df = self.spark.range(10)
+        res = df.select(plus_four(df['id']).alias('plus_four'))
+        self.assertFalse(plus_four.nullable)
+        self.assertFalse(res.schema['plus_four'].nullable)
+        self.assertEqual(res.agg({'plus_four': 'sum'}).collect()[0][0], 85)
+
+    def test_udf_with_callable_no_nulls(self):
+        df = self.spark.range(10)
+
+        class PlusFour:
+            def __call__(self, col):
+                if col is not None:
+                    return col + 4

Don't we need an `else` clause for this to be non-nullable?
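
The point can be checked in plain Python, independent of Spark: a callable whose only `return` sits inside an `if` returns `None` when the condition is false. In the sketch below, `PlusFourOrZero` and its default of `0` are hypothetical, chosen only to show a variant that never yields `None`.

```python
class PlusFour:
    def __call__(self, col):
        if col is not None:
            return col + 4
        # No else branch: falling off the end returns None implicitly.


class PlusFourOrZero:
    def __call__(self, col):
        if col is not None:
            return col + 4
        else:
            return 0  # hypothetical default so the result is never None


without_else = PlusFour()(None)
with_else = PlusFourOrZero()(None)
```

So the first callable can produce a null for a null input, which sits uneasily with declaring the UDF non-nullable.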


---


[GitHub] spark pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability suppo...

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18906#discussion_r162857718
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2231,6 +2239,16 @@ def pandas_udf(f=None, returnType=None, 
functionType=None):
 ... return pd.Series(np.random.randn(len(v))
 >>> random = random.asNondeterministic()  # doctest: +SKIP
 
+    .. note:: The user-defined functions are considered to be able to return null values by default.
+        If your function is not deterministic, call `asNonNullable` on the user defined function.
--- End diff --

ditto.


---

[GitHub] spark pull request #18906: [SPARK-21692][PYSPARK][SQL] Add nullability suppo...

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18906#discussion_r162857704
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2103,6 +2103,14 @@ def udf(f=None, returnType=StringType()):
 >>> import random
 >>> random_udf = udf(lambda: int(random.random() * 100), IntegerType()).asNondeterministic()
 
+.. note:: The user-defined functions are considered to be able to return null values by default.
+If your function is not deterministic, call `asNonNullable` on the user defined function.
--- End diff --

`nullable` instead of `deterministic`?


---

[GitHub] spark pull request #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCD...

2018-01-21 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20343#discussion_r162860188
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala ---
@@ -339,6 +340,30 @@ class TPCDSQuerySuite extends BenchmarkQueryTest {
 }
   }
 
+  val tpcdsQueriesV2_7_0 = Seq(
+"q1", "q2", "q3", "q4", "q5", "q5a", "q6", "q7", "q8", "q9", "q10", "q10a", "q11",
+"q12", "q13", "q14_1", "q14_2", "q14a_1", "q14a_2",  "q15", "q16", "q17", "q18", "q18a", "q19",
+"q20", "q21", "q22", "q22a", "q23_1", "q23_2", "q24_1", "q24_2", "q25", "q26", "q27", "q27a",
+"q28", "q29", "q30", "q31", "q32", "q33", "q34", "q35", "q35a", "q36", "q36a", "q37", "q38",
+"q39_1", "q39_2", "q40", "q41", "q42", "q43", "q44", "q45", "q46", "q47", "q48", "q49",
+"q50", "q51", "q51a", "q52", "q53", "q54", "q55", "q56", "q57", "q58", "q59",
+"q60", "q61", "q62", "q63", "q64", "q65", "q66", "q67", "q67a", "q68", "q69",
+"q70", "q70a", "q71", "q72", "q73", "q74", "q75", "q76", "q77", "q77a", "q78", "q79",
+"q80", "q80a", "q81", "q82", "q83", "q84", "q85", "q86", "q86a", "q87", "q88", "q89",
+"q90", "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
+
+  tpcdsQueriesV2_7_0.foreach { name =>
+val queryString = resourceToString(s"tpcds-v2.7.0/$name.sql",
--- End diff --

ok


---

[GitHub] spark pull request #20053: [SPARK-22873] [CORE] Init lastReportTimestamp wit...

2018-01-21 Thread Ngone51

Github user Ngone51 closed the pull request at:

https://github.com/apache/spark/pull/20053


---

[GitHub] spark pull request #20056: [SPARK-22878] [CORE] Count totalDroppedEvents for...

2018-01-21 Thread Ngone51

Github user Ngone51 closed the pull request at:

https://github.com/apache/spark/pull/20056


---

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19285
  
Merged build finished. Test PASSed.


---

[GitHub] spark issue #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20277
  
**[Test build #86459 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86459/testReport)**
 for PR 20277 at commit 
[`55a288e`](https://github.com/apache/spark/commit/55a288e925a71cd48a533d6171926e398f857c2e).


---

[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-01-21 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20146
  
**[Test build #86460 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86460/testReport)**
 for PR 20146 at commit 
[`540c364`](https://github.com/apache/spark/commit/540c364d2a70ecd6ee5b92fadedc5e9b85026d2c).


---

[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162860736
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/VectorizedHashMapGenerator.scala ---
@@ -127,8 +127,10 @@ class VectorizedHashMapGenerator(
 
 def genEqualsForKeys(groupingKeys: Seq[Buffer]): String = {
   groupingKeys.zipWithIndex.map { case (key: Buffer, ordinal: Int) =>
-s"""(${ctx.genEqual(key.dataType, ctx.getValue(s"vectors[$ordinal]", "buckets[idx]",
-  key.dataType), key.name)})"""
+// `ColumnVector.getStruct` is different from `InternalRow.getStruct`, it only takes an
+// `ordinal` parameter.
+val value = ctx.getValue(s"vectors[$ordinal]", key.dataType, "buckets[idx]")
--- End diff --

`getValueFromVector` instead of `getValue`?


---

[GitHub] spark issue #20343: [SPARK-23167][SQL] Add TPCDS queries v2.7 in TPCDSQueryS...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20343
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/89/
Test PASSed.


---

[GitHub] spark issue #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UD...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20348
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/88/
Test PASSed.


---

[GitHub] spark issue #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs for UD...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20348
  
Merged build finished. Test PASSed.


---

[GitHub] spark pull request #20277: [SPARK-23090][SQL] polish ColumnVector

2018-01-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20277#discussion_r162862761
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java ---
@@ -33,18 +33,6 @@
   private final ArrowVectorAccessor accessor;
   private ArrowColumnVector[] childColumns;
 
-  private void ensureAccessible(int index) {
-ensureAccessible(index, 1);
-  }
-
-  private void ensureAccessible(int index, int count) {
--- End diff --

How about we do it later? We need to find a central place to put this 
check, instead of doing it in every implementation.
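
One way to read this suggestion is a template-method layout: a base class performs the bounds check once and delegates to a per-implementation accessor. The sketch below is in Python rather than Java, and every name in it (`BaseVector`, `_ensure_accessible`, `_get_int_unchecked`, `ListVector`) is hypothetical and not part of Spark's API:

```python
from abc import ABC, abstractmethod


class BaseVector(ABC):
    """Hypothetical base class that owns the accessibility check."""

    def __init__(self, length):
        self._length = length

    def _ensure_accessible(self, index, count=1):
        # Single, central bounds check shared by all implementations.
        if index < 0 or index + count > self._length:
            raise IndexError(
                f"index range [{index}, {index + count}) is out of bounds "
                f"for length {self._length}")

    def get_int(self, row_id):
        # Check once here, then delegate to the concrete accessor.
        self._ensure_accessible(row_id)
        return self._get_int_unchecked(row_id)

    @abstractmethod
    def _get_int_unchecked(self, row_id):
        """Implementations read the value without re-checking bounds."""


class ListVector(BaseVector):
    """Hypothetical implementation backed by a Python list."""

    def __init__(self, values):
        super().__init__(len(values))
        self._values = list(values)

    def _get_int_unchecked(self, row_id):
        return self._values[row_id]


vec = ListVector([10, 20, 30])
print(vec.get_int(1))
```

With this layout, a new implementation only supplies the unchecked accessor and inherits the check.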


---

[GitHub] spark issue #20338: [SPARK-23174][BUILD][PYTHON] python code style checker u...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20338
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86454/
Test PASSed.


---

[GitHub] spark issue #20338: [SPARK-23174][BUILD][PYTHON] python code style checker u...

2018-01-21 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20338
  
Merged build finished. Test PASSed.


---

[GitHub] spark pull request #20348: [SPARK-23122][PYSPARK][FOLLOW-UP] Update the docs...

2018-01-21 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20348#discussion_r162856477
  
--- Diff: python/pyspark/sql/udf.py ---
@@ -200,7 +200,7 @@ def __init__(self, sparkSession):
 @since("1.3.1")
 def register(self, name, f, returnType=None):
 """Registers a Python function (including lambda function) or a user-defined function
--- End diff --

`Register` instead of `Registers` to be consistent with other descriptions?


---
