date:20180224

[GitHub] spark issue #20670: [SPARK-23405] Add constranits

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20670: [SPARK-23405] Add constranits

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20670
  
**[Test build #87648 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)**
 for PR 20670 at commit 
[`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20670: [SPARK-23405] Add constranits

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87648/
Test FAILed.


---


[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170444340
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
 alterPartitionsMethod.invoke(hive, tableName, newParts, 
environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
+
+  private lazy val alterPartitionsMethod =
+findMethod(
+  classOf[Hive],
+  "alterPartitions",
+  classOf[String],
+  classOf[JList[Partition]],
+  classOf[EnvironmentContext])
+
+  override def alterPartitions(hive: Hive, tableName: String, newParts: 
JList[Partition]): Unit = {
--- End diff --

If we do not add `alterPartitionsMethod`, which test case will fail?


---


[GitHub] spark issue #20670: add constranits

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20670
  
**[Test build #87648 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)**
 for PR 20670 at commit 
[`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).


---


[GitHub] spark issue #20670: add constranits

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20670: add constranits

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20670
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1035/
Test PASSed.


---


[GitHub] spark pull request #20670: add constranits

2018-02-24 Thread KaiXinXiaoLei

GitHub user KaiXinXiaoLei opened a pull request:

https://github.com/apache/spark/pull/20670

add constranits

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)
I ran this SQL: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`.
The `ls` table is small, with a single row, while the `catalog_sales` table is
large, with 10 billion rows. The task hangs. I found that `cs_order_number` has
many null values in the `catalog_sales` table. I think these null values should
be filtered out in the logical plan.
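
The reasoning can be illustrated with a minimal pure-Python sketch (not Spark code, and not the change in this patch). Under SQL semantics a comparison involving NULL is never true, so a row whose join key is null cannot satisfy an equi-join condition, and filtering such rows before the join leaves the result unchanged. The helper names and the sample rows below are hypothetical; `None` stands in for SQL NULL.

```python
def sql_equal(a, b):
    # Simplified SQL three-valued logic: a comparison involving NULL is never true.
    return a is not None and b is not None and a == b


def left_semi_join(left_keys, right_keys):
    # Keep each left key that has at least one matching key on the right.
    return [l for l in left_keys if any(sql_equal(l, r) for r in right_keys)]


def drop_null_keys(keys):
    # The proposed pre-filter: discard rows whose join key is NULL.
    return [k for k in keys if k is not None]


ls = [42]                                  # small side: one row
catalog_sales = [None, None, 42, None, 7]  # big side: mostly null keys (made-up data)

unfiltered = left_semi_join(ls, catalog_sales)
filtered = left_semi_join(ls, drop_null_keys(catalog_sales))
print(unfiltered == filtered)
```

In this sketch the pre-filter shrinks the right side from five keys to two while the join result stays the same.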

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KaiXinXiaoLei/spark Spark-23405

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20670


commit 705ed462bb307871e65199ce02576f12d60d2176
Author: KaiXinXiaoLei <584620569@...>
Date:   2018-02-25T06:06:39Z

add constranits




---


[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-24 Thread drboyer

Github user drboyer commented on the issue:

https://github.com/apache/spark/pull/20658
  
@HyukjinKwon thanks for the review so far! Sorry for the delay, I somehow 
missed the Python style output in the test logs earlier. How's this look now?

Can you elaborate more on "doctest" if it's still needed? From what I can
tell, the only documentation for the Catalog is
[this simple reference](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=catalog#pyspark.sql.SparkSession.catalog),
which would be unaffected by my change.


---


[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20658
  
**[Test build #87647 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87647/testReport)**
 for PR 20658 at commit 
[`a49ffa0`](https://github.com/apache/spark/commit/a49ffa010a46a0d87de124d8ddf66c8173b756fb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20658
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87647/
Test PASSed.


---


[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20658
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20658: [SPARK-23488][python] Add missing catalog methods to pyt...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20658
  
**[Test build #87647 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87647/testReport)**
 for PR 20658 at commit 
[`a49ffa0`](https://github.com/apache/spark/commit/a49ffa010a46a0d87de124d8ddf66c8173b756fb).


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20668
  
Also need to update `HiveClientVersions.scala`


---


[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170441161
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -125,7 +126,7 @@ class VersionsSuite extends SparkFunSuite with Logging {
   // Hive changed the default of datanucleus.schema.autoCreateAll from 
true to false and
   // hive.metastore.schema.verification from false to true since 2.0
   // For details, see the JIRA HIVE-6113 and HIVE-12463
-  if (version == "2.0" || version == "2.1") {
+  if (version.split("\\.").head.toInt > 1) {
--- End diff --

```Scala
if (version == "2.0" || version == "2.1" || version == "2.2" || version == "2.3") {
```
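
One possible reading of this suggestion, sketched in Python rather than Scala: an explicit list matches only the versions it names, while a major-number check also matches any later major version. The function names and the extra inputs `"1.2"` and `"3.0"` are hypothetical and only serve to show where the two conditions diverge; the thread itself does not state the reviewer's reasoning.

```python
# Versions named in the suggested condition.
EXPLICIT_VERSIONS = ("2.0", "2.1", "2.2", "2.3")


def matches_explicit_list(version):
    # Analogue of: version == "2.0" || version == "2.1" || ...
    return version in EXPLICIT_VERSIONS


def matches_major_check(version):
    # Analogue of: version.split("\\.").head.toInt > 1
    return int(version.split(".")[0]) > 1


# "1.2" and "3.0" are illustrative inputs, not versions discussed in the thread.
for v in ("1.2", "2.0", "2.3", "3.0"):
    print(v, matches_explicit_list(v), matches_major_check(v))
```

The two checks agree on every version in the explicit list and differ only for an input such as `"3.0"`.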


---


[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170440954
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
 alterPartitionsMethod.invoke(hive, tableName, newParts, 
environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
--- End diff --

Please remove `{}`


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1028/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1028/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1027/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1034/
Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread ssuchter

Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20669
  
retest this please


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1027/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1033/
Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1026/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread ssuchter

Github user ssuchter commented on the issue:

https://github.com/apache/spark/pull/20669
  
retest this please


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1032/
Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20669
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/1026/



---


[GitHub] spark issue #20669: [SPARK-22839][K8S] Remove the use of init-container for ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20669
  
Can one of the admins verify this patch?


---


[GitHub] spark pull request #20669: [SPARK-22839][K8S] Remove the use of init-contain...

2018-02-24 Thread ifilonenko

GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/20669

[SPARK-22839][K8S] Remove the use of init-container for downloading remote 
dependencies

## What changes were proposed in this pull request?

Removal of the init-container for downloading remote dependencies. Built 
off of the work done by @vanzin in an attempt to refactor driver/executor 
configuration elaborated in 
[this](https://issues.apache.org/jira/browse/SPARK-22839) ticket. 

## How was this patch tested?

This patch was tested with unit and integration tests. 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark remove-init-container

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20669.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20669


commit 2fefd0edf2f15ba66620fd507bd0cd7ce01bcd1e
Author: Ilan Filonenko 
Date:   2018-02-24T23:25:45Z

Removed the use of init-container for downloading remote dependencies




---


[GitHub] spark issue #18894: [SPARK-21673] Use the correct sandbox environment variab...

2018-02-24 Thread joerg84

Github user joerg84 commented on the issue:

https://github.com/apache/spark/pull/18894
  
LGTM from a Mesos perspective



---


[GitHub] spark issue #20641: [SPARK-23464][MESOS] Fix mesos cluster scheduler options...

2018-02-24 Thread susanxhuynh

Github user susanxhuynh commented on the issue:

https://github.com/apache/spark/pull/20641
  
Thanks for the PR! It seems that the previous attempt to fix this 
(SPARK-18114) was wrong -- I'm not sure why we didn't catch the problem before, 
maybe lack of testing? @krcz My suggestion for this patch is to add a test, in 
order to prevent another regression in the future. I've written a unit test for 
this -- you could do something similar:  
https://github.com/mesosphere/spark/commit/4812ba3d10264f6d22ec654fa16b5810d70c27a9
 I will also do more testing with my own integration tests. cc @skonto 


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87646/
Test PASSed.


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Merged build finished. Test PASSed.


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20668
  
**[Test build #87646 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87646/testReport)**
 for PR 20668 at commit 
[`48343bc`](https://github.com/apache/spark/commit/48343bc8214468b58dcffcc8d968c870ee0189be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87645/
Test PASSed.


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20668
  
**[Test build #87645 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87645/testReport)**
 for PR 20668 at commit 
[`5b1fc01`](https://github.com/apache/spark/commit/5b1fc0145efbdd427e8b49bd0f840f709d4bc801).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---


[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170425193
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -209,13 +209,15 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it 
meets a corrupted \
- record, and puts the malformed string into a field 
configured by \
- ``columnNameOfCorruptRecord``. To keep corrupt records, 
an user can set \
- a string type field named ``columnNameOfCorruptRecord`` 
in an user-defined \
- schema. If a schema does not have the field, it drops 
corrupt records during \
- parsing. When inferring a schema, it implicitly adds a \
- ``columnNameOfCorruptRecord`` field in an output schema.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts 
the malformed string \
+  into a field configured by 
``columnNameOfCorruptRecord``, and sets other \
+  fields to ``null``. To keep corrupt records, an user can 
set a string type \
+  field named ``columnNameOfCorruptRecord`` in an 
user-defined schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+  When inferring a schema, it implicitly adds a 
``columnNameOfCorruptRecord`` \
--- End diff --

I think we should say `it implicitly adds ... if a corrupted record is
found` while we are here. I think it only adds `` `columnNameOfCorruptRecord` ``
when it meets a corrupted record during schema inference.
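
For readers unfamiliar with the mode under discussion, here is a minimal pure-Python sketch of the documented `PERMISSIVE` behaviour, assuming the schema includes the corrupt-record column. It is not Spark's implementation: the function `parse_permissive` is hypothetical, and `_corrupt_record` is used as the column name because that is Spark's default for `columnNameOfCorruptRecord`.

```python
import json


def parse_permissive(lines, fields, corrupt_column="_corrupt_record"):
    """Parse JSON lines; on a malformed line, null the fields and keep the raw text."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError("not a JSON object")
            row = {f: record.get(f) for f in fields}
            row[corrupt_column] = None
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            row = {f: None for f in fields}
            row[corrupt_column] = line
        rows.append(row)
    return rows


rows = parse_permissive(['{"a": 1, "b": "x"}', '{"a": 2, "b"'], ["a", "b"])
for row in rows:
    print(row)
```

The second input line is truncated JSON, so in this sketch its fields become `None` and the raw string lands in the corrupt-record column.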


---


[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170425254
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -209,13 +209,15 @@ def json(self, path, schema=None, 
primitivesAsString=None, prefersDecimal=None,
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it 
meets a corrupted \
- record, and puts the malformed string into a field 
configured by \
- ``columnNameOfCorruptRecord``. To keep corrupt records, 
an user can set \
- a string type field named ``columnNameOfCorruptRecord`` 
in an user-defined \
- schema. If a schema does not have the field, it drops 
corrupt records during \
- parsing. When inferring a schema, it implicitly adds a \
- ``columnNameOfCorruptRecord`` field in an output schema.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts 
the malformed string \
+  into a field configured by 
``columnNameOfCorruptRecord``, and sets other \
+  fields to ``null``. To keep corrupt records, an user can 
set a string type \
+  field named ``columnNameOfCorruptRecord`` in an 
user-defined schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+  When inferring a schema, it implicitly adds a 
``columnNameOfCorruptRecord`` \
+  field in an output schema. It doesn't support partial 
results. Even just one \
--- End diff --

It's trivial, but how about we avoid a contraction like `doesn't`? That's
what I usually do for docs, although I am not sure if it actually matters.


---


[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170425099
  
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -393,13 +395,16 @@ def csv(self, path, schema=None, sep=None, 
encoding=None, quote=None, escape=Non
 :param mode: allows a mode for dealing with corrupt records during 
parsing. If None is
  set, it uses the default value, ``PERMISSIVE``.
 
-* ``PERMISSIVE`` : sets other fields to ``null`` when it 
meets a corrupted \
-  record, and puts the malformed string into a field 
configured by \
-  ``columnNameOfCorruptRecord``. To keep corrupt records, 
an user can set \
-  a string type field named ``columnNameOfCorruptRecord`` 
in an \
-  user-defined schema. If a schema does not have the 
field, it drops corrupt \
-  records during parsing. When a length of parsed CSV 
tokens is shorter than \
-  an expected length of a schema, it sets `null` for extra 
fields.
+* ``PERMISSIVE`` : when it meets a corrupted record, puts 
the malformed string \
+  into a field configured by 
``columnNameOfCorruptRecord``, and sets other \
+  fields to ``null``. To keep corrupt records, an user can 
set a string type \
+  field named ``columnNameOfCorruptRecord`` in an 
user-defined schema. If a \
+  schema does not have the field, it drops corrupt records 
during parsing. \
+  It supports partial result for the records just with 
less or more tokens \
+  than the schema. When it meets a malformed record whose 
parsed tokens is \
--- End diff --

How about ` a malformed record whose parsed tokens is` -> ` a malformed 
record having the length of parsed tokens shorter than the length of a schema`?
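
The partial-result behaviour described in this docstring can be sketched in pure Python. The padding with `null` for short records follows the docstring text quoted above; dropping the surplus tokens of a long record is an assumption of this sketch, since the docstring only says partial results are supported. The function `fit_tokens_to_schema` is hypothetical and is not part of PySpark.

```python
def fit_tokens_to_schema(tokens, schema):
    """Map parsed CSV tokens onto schema fields, tolerating a length mismatch."""
    # Too few tokens: pad the missing fields with None (SQL NULL).
    # Too many tokens: keep only as many as the schema has fields (assumption).
    fitted = tokens[:len(schema)] + [None] * (len(schema) - len(tokens))
    return dict(zip(schema, fitted))


schema = ["a", "b", "c"]
short_row = fit_tokens_to_schema(["1", "x"], schema)
long_row = fit_tokens_to_schema(["1", "x", "y", "z"], schema)
print(short_row)
print(long_row)
```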


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1031/
Test PASSed.


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20668
  
**[Test build #87646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87646/testReport)**
 for PR 20668 at commit 
[`48343bc`](https://github.com/apache/spark/commit/48343bc8214468b58dcffcc8d968c870ee0189be).


---


[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Merged build finished. Test PASSed.


---


[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425667
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -202,7 +202,6 @@ private[spark] object HiveUtils extends Logging {
   ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> 
TimeUnit.MILLISECONDS,
   ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
   ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> 
TimeUnit.MILLISECONDS,
-  ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
--- End diff --

Removed `HIVE_STATS_JDBC_TIMEOUT`; for details, see
https://issues.apache.org/jira/browse/HIVE-12164


---


[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425631
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala 
---
@@ -202,8 +202,6 @@ private[spark] object HiveUtils extends Logging {
   ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> 
TimeUnit.MILLISECONDS,
   ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
   ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> 
TimeUnit.MILLISECONDS,
-  ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
-  ConfVars.HIVE_STATS_RETRIES_WAIT -> TimeUnit.MILLISECONDS,
--- End diff --

Remove `HIVE_STATS_JDBC_TIMEOUT`.
For details, see: https://issues.apache.org/jira/browse/HIVE-12164


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum

Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425408
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
 alterPartitionsMethod.invoke(hive, tableName, newParts, 
environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
--- End diff --

Otherwise it will throw a `NumberFormatException`:
```
[info] Cause: java.lang.NumberFormatException: null
[info] at java.lang.Long.parseLong(Long.java:552)
[info] at java.lang.Long.parseLong(Long.java:631)
[info] at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.isFastStatsSame(MetaStoreUtils.java:315)
[info] at 
org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:605)
[info] at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3837)
```
For details, see: https://issues.apache.org/jira/browse/HIVE-15653
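
The trace above ends in `Long.parseLong` with a `null` argument. Here is a minimal sketch of that failure mode, assuming (the comment does not confirm this) that a numeric stat is read from a parameter map in which the key may be missing. `FastStatsSketch`, `sameStat` and the `numFiles` key are illustrative names, not Hive's API:

```scala
import scala.util.Try

object FastStatsSketch {
  // Hypothetical stand-in for a partition's parameter map; not Hive's real type.
  type Params = Map[String, String]

  // Compares one numeric stat by parsing both sides with Long.parseLong.
  // A missing key yields null, and Long.parseLong(null) throws NumberFormatException.
  def sameStat(oldParams: Params, newParams: Params, key: String): Boolean =
    java.lang.Long.parseLong(oldParams.getOrElse(key, null)) ==
      java.lang.Long.parseLong(newParams.getOrElse(key, null))

  def main(args: Array[String]): Unit = {
    val withStats = Map("numFiles" -> "3")
    val withoutStats = Map.empty[String, String]
    println(sameStat(withStats, withStats, "numFiles"))
    // The second comparison fails because the stat is absent on one side.
    println(Try(sameStat(withStats, withoutStats, "numFiles")).isFailure)
  }
}
```

Skipping the stats comparison, as the `DO_NOT_UPDATE_STATS` property in the diff appears intended to do, would avoid reaching the parse at all.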


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20668
  
**[Test build #87645 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87645/testReport)**
 for PR 20668 at commit 
[`5b1fc01`](https://github.com/apache/spark/commit/5b1fc0145efbdd427e8b49bd0f840f709d4bc801).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1030/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20668
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum

GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20668

[SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metastore

## What changes were proposed in this pull request?

Support Hive 2.2 and Hive 2.3 metastore.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23510

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20668


commit 5b1fc0145efbdd427e8b49bd0f840f709d4bc801
Author: Yuming Wang 
Date:   2018-02-24T16:19:35Z

Support Hive 2.2 and Hive 2.3




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

2018-02-24 Thread Ngone51

Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20667#discussion_r170424196
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala 
---
@@ -132,10 +133,15 @@ private[spark] object BlockManagerId {
 getCachedBlockManagerId(obj)
   }
 
-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()
+  val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, 
BlockManagerId](true)
 
-  def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = {
+  def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean 
= false): BlockManagerId =
+  {
 blockManagerIdCache.putIfAbsent(id, id)
-blockManagerIdCache.get(id)
+val blockManagerId = blockManagerIdCache.get(id)
+if (clearOldValues) {
+  blockManagerIdCache.clearOldValues(System.currentTimeMillis - 
Utils.timeStringAsMs("10d"))
--- End diff --

10 days? I don't think *time* can be the criterion for deciding whether 
we should remove a cached id, even if you set the time threshold far 
lower or higher than '10d'. Think about an extreme case in which a block is 
fetched frequently throughout the app's run. It would certainly be 
removed from the cache because of the time threshold, recached the next time 
we get it, and so on repeatedly.
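
To make the scenario concrete, here is a minimal sketch of the evict-and-recache cycle. It assumes a cache whose timestamps are set only on insertion; a map that refreshes timestamps on reads would behave differently. `TimeEvictionSketch`, `InsertTimeCache` and `simulate` are hypothetical names, not Spark classes:

```scala
import scala.collection.mutable

object TimeEvictionSketch {
  // Hypothetical minimal cache: each entry records only its insertion time.
  final class InsertTimeCache[K] {
    private val insertedAt = mutable.Map.empty[K, Long]
    private var inserts = 0

    // Number of times a key had to be inserted or re-inserted.
    def insertCount: Int = inserts

    // Returns the key, inserting it first if it is not cached.
    def intern(key: K, now: Long): K = {
      if (!insertedAt.contains(key)) {
        insertedAt(key) = now
        inserts += 1
      }
      key
    }

    // Evicts entries inserted before `threshold`, however recently they were read.
    def clearOldValues(threshold: Long): Unit = {
      val stale = insertedAt.iterator.collect { case (k, t) if t < threshold => k }.toList
      insertedAt --= stale
    }
  }

  // Reads the same key on every tick from 0 to lastTick, clearing old values after each read.
  def simulate(ttl: Long, lastTick: Long): Int = {
    val cache = new InsertTimeCache[String]
    for (now <- 0L to lastTick) {
      cache.intern("hot-block-manager", now)
      cache.clearOldValues(now - ttl)
    }
    cache.insertCount
  }

  def main(args: Array[String]): Unit =
    println(simulate(ttl = 10L, lastTick = 30L))
}
```

In this model the key is re-inserted roughly once per `ttl` window even though it is read on every tick.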



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87644/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87644 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87644/testReport)**
 for PR 20666 at commit 
[`4400cf2`](https://github.com/apache/spark/commit/4400cf2eb4d3b1b37c9e299e91db6e4a032e0c3a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87643/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87643/testReport)**
 for PR 20647 at commit 
[`a73370a`](https://github.com/apache/spark/commit/a73370a5bf56f45ce67cd6cdaf86b53a14a67b5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87644 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87644/testReport)**
 for PR 20666 at commit 
[`4400cf2`](https://github.com/apache/spark/commit/4400cf2eb4d3b1b37c9e299e91db6e4a032e0c3a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1029/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20667
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20667
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...

2018-02-24 Thread caneGuy

GitHub user caneGuy opened a pull request:

https://github.com/apache/spark/pull/20667

[SPARK-23508][CORE] Use timeStampedHashMap for BlockmanagerId in case 
blockManagerIdCache…

… cause oom

## What changes were proposed in this pull request?
`blockManagerIdCache` in `BlockManagerId` never removes old values, which may 
cause an OOM:

`val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, 
BlockManagerId]()`

Whenever we apply a new `BlockManagerId`, it is put into this map.

This patch uses a `TimeStampedHashMap` instead for `JsonProtocol`.
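
As a rough illustration of the growth described above, here is a sketch of an interning cache that is only ever added to. `InternCacheSketch`, `InternCache` and `Id` are hypothetical stand-ins (the real `BlockManagerId` is not reproduced here); only `ConcurrentHashMap` and its `putIfAbsent`/`get` calls are the real Java API:

```scala
import java.util.concurrent.ConcurrentHashMap

object InternCacheSketch {
  // Hypothetical stand-in for BlockManagerId.
  final case class Id(executorId: String, host: String, port: Int)

  final class InternCache {
    private val cache = new ConcurrentHashMap[Id, Id]()

    // putIfAbsent followed by get: equal ids resolve to one shared instance.
    def getCached(id: Id): Id = {
      cache.putIfAbsent(id, id)
      cache.get(id)
    }

    def size: Int = cache.size()
  }

  def main(args: Array[String]): Unit = {
    val interner = new InternCache
    // Each distinct id adds one entry, and nothing ever removes it.
    (0 until 1000).foreach(i => interner.getCached(Id(i.toString, "host", 7000 + i)))
    println(interner.size)
  }
}
```

The cache size tracks the number of distinct ids ever seen, which is why a long-lived process that keeps encountering new ids can run out of memory.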

## How was this patch tested?
Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/caneGuy/spark zhoukang/fix-history

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20667


commit fc1b6a0169c123a825a253defb021c73aebf1c98
Author: zhoukang 
Date:   2018-02-24T10:13:01Z

Use timeStampedHashMap for BlockmanagerId in case blockManagerIdCache cause 
oom




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-24 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170418454
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -550,12 +552,14 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* `mode` (default `PERMISSIVE`): allows a mode for dealing with 
corrupt records
*during parsing. It supports the following case-insensitive modes.
*   
-   * `PERMISSIVE` : sets other fields to `null` when it meets a 
corrupted record, and puts
-   * the malformed string into a field configured by 
`columnNameOfCorruptRecord`. To keep
+   * `PERMISSIVE` : when it meets a corrupted record, puts the 
malformed string into a
+   * field configured by `columnNameOfCorruptRecord`, and sets other 
fields to `null`. To keep
* corrupt records, an user can set a string type field named 
`columnNameOfCorruptRecord`
* in an user-defined schema. If a schema does not have the field, 
it drops corrupt records
-   * during parsing. When a length of parsed CSV tokens is shorter 
than an expected length
-   * of a schema, it sets `null` for extra fields.
+   * during parsing. It supports partial result for the records just 
with less or more tokens
--- End diff --

Yes. Will update accordingly.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87642/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87642/testReport)**
 for PR 20666 at commit 
[`4ad330b`](https://github.com/apache/spark/commit/4ad330b1def558e17dfb693d428e1bd69248e5a3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser be...

2018-02-24 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20666#discussion_r170417628
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -550,12 +552,14 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* `mode` (default `PERMISSIVE`): allows a mode for dealing with 
corrupt records
*during parsing. It supports the following case-insensitive modes.
*   
-   * `PERMISSIVE` : sets other fields to `null` when it meets a 
corrupted record, and puts
-   * the malformed string into a field configured by 
`columnNameOfCorruptRecord`. To keep
+   * `PERMISSIVE` : when it meets a corrupted record, puts the 
malformed string into a
+   * field configured by `columnNameOfCorruptRecord`, and sets other 
fields to `null`. To keep
* corrupt records, an user can set a string type field named 
`columnNameOfCorruptRecord`
* in an user-defined schema. If a schema does not have the field, 
it drops corrupt records
-   * during parsing. When a length of parsed CSV tokens is shorter 
than an expected length
-   * of a schema, it sets `null` for extra fields.
+   * during parsing. It supports partial result for the records just 
with less or more tokens
--- End diff --

I think the same text also needs to be updated in `DataStreamReader`, 
`readwriter.py` and `streaming.py`.
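
As a rough illustration of the `PERMISSIVE` behaviour described in the doc comment under review, here is a toy model in plain Scala. It is not Spark's parser: `PermissiveSketch`, `Row` and `parsePermissive` are hypothetical names, `None` stands in for `null`, and the token-count cases that the diff also touches are deliberately out of scope.

```scala
import scala.util.Try

object PermissiveSketch {
  // Hypothetical schema (id: Int, name: String) plus a corrupt-record column.
  final case class Row(id: Option[Int], name: Option[String], corruptRecord: Option[String])

  // Parses a line of the form "id,name". When the id is not an integer, the whole
  // line goes into corruptRecord and the other fields are left empty.
  def parsePermissive(line: String): Row = {
    val tokens = line.split(",", -1)
    require(tokens.length == 2, "token-count handling is out of scope for this sketch")
    Try(tokens(0).trim.toInt).toOption match {
      case Some(id) => Row(Some(id), Some(tokens(1)), None)
      case None     => Row(None, None, Some(line))
    }
  }

  def main(args: Array[String]): Unit =
    Seq("1,Ann", "x,Bob").map(parsePermissive).foreach(println)
}
```

The point of the model is the ordering the reworded doc emphasises: the malformed string is kept in the corrupt-record field, and the remaining fields are nulled.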


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20597: [MINOR][TEST] Update from 2.2.0 to 2.2.1 in HiveE...

2018-02-24 Thread seancxmao

Github user seancxmao closed the pull request at:

https://github.com/apache/spark/pull/20597


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1028/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87643/testReport)**
 for PR 20647 at commit 
[`a73370a`](https://github.com/apache/spark/commit/a73370a5bf56f45ce67cd6cdaf86b53a14a67b5b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87642/testReport)**
 for PR 20666 at commit 
[`4ad330b`](https://github.com/apache/spark/commit/4ad330b1def558e17dfb693d428e1bd69248e5a3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1027/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20647
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20666
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #10942: [SPARK-12850] [SQL] Support Bucket Pruning (Predicate Pu...

2018-02-24 Thread gengliangwang

Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/10942
  
@lonehacker I have just created a jira ticket for the migration project:
https://issues.apache.org/jira/browse/SPARK-23507


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87641/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20666
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20666
  
**[Test build #87641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87641/testReport)**
 for PR 20666 at commit 
[`4ad330b`](https://github.com/apache/spark/commit/4ad330b1def558e17dfb693d428e1bd69248e5a3).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87640/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20647
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...

2018-02-24 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20647
  
**[Test build #87640 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87640/testReport)**
 for PR 20647 at commit 
[`a73370a`](https://github.com/apache/spark/commit/a73370a5bf56f45ce67cd6cdaf86b53a14a67b5b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
