[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23218
  
**[Test build #99679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99671/testReport)** for PR 23215 at commit [`4719765`](https://github.com/apache/spark/commit/4719765bace94f8cda2690ec042ca7079e88443b).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Build finished. Test FAILed.


---




[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99675/
Test FAILed.


---




[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22583
  
**[Test build #99675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99675/testReport)** for PR 22583 at commit [`d8f26d9`](https://github.com/apache/spark/commit/d8f26d9761be5c0649f6dcf67504e366a30e4e0f).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23213#discussion_r238804336
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       }
     }
   }
+
+  private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = {
+    val output = new java.io.ByteArrayOutputStream()
+    Console.withOut(output) {
+      df.explain(extended = true)
+    }
+    val normalizedOutput = output.toString.replaceAll("#\\d+", "#x")
+    for (key <- keywords) {
+      assert(normalizedOutput.contains(key))
+    }
+  }
+
+  test("optimized plan should show the rewritten aggregate expression") {
--- End diff --

+1 for @viirya's comment. We need to update the title and description of the PR and the JIRA.


---




[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23213#discussion_r238803747
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ---
@@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext {
     checkKeywordsExistsInExplain(df,
       keywords = "InMemoryRelation", "StorageLevel(disk, memory, deserialized, 1 replicas)")
   }
+
+  test("optimized plan should show the rewritten aggregate expression") {
+    withTempView("test_agg") {
+      sql(
+        """
+          |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES
+          |  (1, true), (1, false),
+          |  (2, true),
+          |  (3, false), (3, null),
+          |  (4, null), (4, null),
+          |  (5, null), (5, true), (5, false) AS test_agg(k, v)
+        """.stripMargin)
+
+      // simple explain of queries having every/some/any aggregates. Optimized
+      // plan should show the rewritten aggregate expression.
+      val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k")
+      checkKeywordsExistsInExplain(df,
+        "Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " +
--- End diff --

The other two failures fail for the same reason.


---




[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23213#discussion_r238803424
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ---
@@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext {
     checkKeywordsExistsInExplain(df,
       keywords = "InMemoryRelation", "StorageLevel(disk, memory, deserialized, 1 replicas)")
   }
+
+  test("optimized plan should show the rewritten aggregate expression") {
+    withTempView("test_agg") {
+      sql(
+        """
+          |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES
+          |  (1, true), (1, false),
+          |  (2, true),
+          |  (3, false), (3, null),
+          |  (4, null), (4, null),
+          |  (5, null), (5, true), (5, false) AS test_agg(k, v)
+        """.stripMargin)
+
+      // simple explain of queries having every/some/any aggregates. Optimized
+      // plan should show the rewritten aggregate expression.
+      val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k")
+      checkKeywordsExistsInExplain(df,
+        "Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " +
--- End diff --

Since `extended=false` in [line 33](https://github.com/apache/spark/pull/23213/files#diff-d61681740c66c4c0ea311a76c98f80adR33), the test suite only compares against the physical plan. Did you perhaps change line 33 in your codebase?
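
For concreteness, `Dataset.explain()` prints only the physical plan, while `explain(extended = true)` also prints the parsed, analyzed and optimized plans. A minimal Scala sketch of the difference (the column expression is illustrative):

    // Physical plan only -- what the suite compares against when extended=false.
    val df = spark.range(10).selectExpr("id % 2 AS k")
    df.explain()

    // Parsed, analyzed, optimized and physical plans -- required to check a
    // rewritten expression in the *optimized* plan.
    df.explain(extended = true)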


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238798656
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+            --keytab <KEYTAB_FILE> \
+            --principal <PRINCIPAL> \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
--- End diff --

"The Kafka delegation..."


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238798169
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
--- End diff --

"...when `blah` is set, Spark considers the following log in options, in 
order of preference:"


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238799671
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+            --keytab <KEYTAB_FILE> \
+            --principal <PRINCIPAL> \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+  Spark can be configured to use the following authentication protocols to obtain token:
+  - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption.
+  - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates
+    clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers.
+  - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but
+    because there is no encryption it's only for testing purposes.
+
+  After obtaining delegation token successfully, Spark distributes it across nodes and renews it accordingly.
+  Delegation token uses `SCRAM` login module for authentication and because of that the appropriate
+  `sasl.mechanism` has to be configured on source/sink.
--- End diff --

Still not clear to me. Does this mean the user has to change their code, so that when they, e.g., run a Kafka query, they have to say `.option("sasl.mechanism", "SCRAM")`?

This needs to tell the user exactly what to do to get this to work.
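
For illustration, the kind of per-query change being asked about would look roughly like the sketch below. The option key and mechanism value are assumptions for the sake of the example, not something this thread confirms (options prefixed with `kafka.` are passed through to the underlying Kafka consumer):

    // Hypothetical sketch only; the exact option key/value is an assumption.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("kafka.sasl.mechanism", "SCRAM-SHA-512") // assumed value
      .option("subscribe", "topic1")
      .load()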


---




[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23169
  
**[Test build #99683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99683/testReport)** for PR 23169 at commit [`1b692a0`](https://github.com/apache/spark/commit/1b692a0444a1c0f1fc24a08241f24dd35e4c428b).


---




[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5736/
Test PASSed.


---




[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...

2018-12-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23088


---




[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-04 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merging to master / 2.4.


---




[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #4450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4450/testReport)** for PR 22683 at commit [`57ecbf9`](https://github.com/apache/spark/commit/57ecbf964a9c2681d990f64cc00368190f288926).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22957
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23098
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99667/
Test FAILed.


---




[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23098
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22957
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5735/
Test PASSed.


---




[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22957
  
**[Test build #99682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99682/testReport)** for PR 22957 at commit [`bf1d04a`](https://github.com/apache/spark/commit/bf1d04a819855737d1096b61b1c3d46010f50dee).


---




[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23098
  
**[Test build #99667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99667/testReport)** for PR 23098 at commit [`96f9c41`](https://github.com/apache/spark/commit/96f9c419961df2f6be1c137bbcd5a2ff27225fad).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...

2018-12-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23210


---




[GitHub] spark issue #23210: [SPARK-26233][SQL] CheckOverflow when encoding a decimal...

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23210
  
@mgaido91, this needs to land on `branch-2.4`/`branch-2.3`/`branch-2.2`, but it fails at `branch-2.4` due to conflicts in the test case file. Could you make separate backport PRs for each branch?
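
A typical way to prepare such a backport, sketched under the assumption of the usual cherry-pick workflow (the commit hash and remote names are placeholders):

    # Sketch of a manual backport; <commit> and remote names are placeholders.
    git fetch apache
    git checkout -b SPARK-26233-branch-2.4 apache/branch-2.4
    git cherry-pick -x <commit>   # resolve the test-file conflicts by hand
    git push origin SPARK-26233-branch-2.4
    # then open the PR against branch-2.4, and repeat for branch-2.3/branch-2.2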


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99668/
Test PASSed.


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23217
  
**[Test build #99668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99668/testReport)** for PR 23217 at commit [`724db5c`](https://github.com/apache/spark/commit/724db5cd752d2c79032a887e8ae2806d9a5acc65).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5734/
Test PASSed.


---




[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99681/testReport)** for PR 23215 at commit [`4060c30`](https://github.com/apache/spark/commit/4060c30be00f0026c5c8e7304244bab2b70537f9).


---




[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238782502
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

When I was testing this again today, I was getting an error from just doing a plain `get`, because for some reason the code was taking the first branch even though `kubeContext` was `None`. So I changed to an explicit `isDefined` instead of negating `isEmpty`, and added `getOrElse` with a fallback just to avoid that.
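
A self-contained sketch of the bug and the fix being described (plain `Option` values standing in for the conf lookup):

    val kubeContext: Option[String] = None

    // Buggy form from the earlier commit: the condition tested isEmpty where
    // it meant nonEmpty, so with None the first branch ran and .get threw.
    //   if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"

    // Fixed form: explicit isDefined, plus getOrElse as a fallback so a bare
    // .get can never throw.
    val msg =
      if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}"
      else "current context"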


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238780494
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -20,20 +20,22 @@ import java.io.File
 
 import com.google.common.base.Charsets
 import com.google.common.io.Files
+import io.fabric8.kubernetes.client.Config.autoConfigure
 import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient, KubernetesClient}
 import io.fabric8.kubernetes.client.utils.HttpClientUtils
 import okhttp3.Dispatcher
-
+import org.apache.commons.lang3.StringUtils
--- End diff --

Now unused.


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238780744
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

I saw your commit about the NPE, but I'm not sure how you could get one here?

`sparkConf.get` will never return `Some(null)` as far as I know (it would return `None` instead).
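
A quick sketch of the `Option` semantics behind that claim (illustrating `Option` itself rather than `SparkConf`):

    // Option.apply never wraps null in Some, so a lookup built on Option(...)
    // yields None rather than Some(null).
    assert(Option(null) == None)
    assert(Option("ctx") == Some("ctx"))
    // And filter(_.nonEmpty) maps an empty string to None as well.
    assert(Some("").filter(_.nonEmpty) == None)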


---




[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...

2018-12-04 Thread rvesse
Github user rvesse commented on a diff in the pull request:

https://github.com/apache/spark/pull/22904#discussion_r238779965
  
--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
 
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+      s" from users K8S config file")
+
+    // Start from an auto-configured config with the desired context
+    // Fabric 8 uses null to indicate that the users current context should be used so if no
+    // explicit setting pass null
+    val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

Yes, and I was referring to the K8S config file :) And yes, the fact that we would propagate `spark.kubernetes.context` into the pod shouldn't be an issue, because there won't be any K8S config file for it to interact with inside the pod; in-pod K8S config should come from the service account token that gets injected into the pod.


---




[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...

2018-12-04 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/23195#discussion_r238776692
  
--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 
 See [Application Submission Guide](submitting-applications.html) for more details about submitting
 applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+            --keytab <KEYTAB_FILE> \
+            --principal <PRINCIPAL> \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+            --conf spark.kafka.bootstrap.servers=<KAFKA_SERVERS> \
+            ...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

Keeping the list of supported protocols without the explanation, saying it must match the Kafka broker config, sounds better to me.
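
To make "must match the Kafka broker config" concrete: the protocol the client selects has to be one the brokers actually expose. A hypothetical broker `server.properties` fragment (host names and values are placeholders):

    # The client-side protocol must match one of the broker listeners.
    listeners=SASL_SSL://kafka-broker:9093
    security.inter.broker.protocol=SASL_SSL
    sasl.enabled.mechanisms=GSSAPI,SCRAM-SHA-512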


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5733/
Test PASSed.


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #23206: [SPARK-26249][SQL] Add ability to inject a rule i...

2018-12-04 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23206#discussion_r238776051
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -235,10 +235,127 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
    */
   def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
 
+  /**
+   * Seq of Optimizer rule to be added after or before a rule in a specific batch
+   */
+  def optimizerRulesInOrder: Seq[RuleInOrder] = Nil
+
+  /**
+   * Batches to add to the optimizer in a specific order with respect to a existing batch
+   * Seq of Tuple(existing batch name, order, Batch to add).
+   */
+  def optimizerBatches: Seq[(String, Order.Value, Batch)] = Nil
+
+  /**
+   * Return the batch after removing rules that need to be excluded
+   */
+  private def handleExcludedRules(batch: Batch, excludedRules: Seq[String]): Seq[Batch] = {
+    // Excluded rules
+    val filteredRules = batch.rules.filter { rule =>
+      val exclude = excludedRules.contains(rule.ruleName)
+      if (exclude) {
+        logInfo(s"Optimization rule '${rule.ruleName}' is excluded from the optimizer.")
+      }
+      !exclude
+    }
+    if (batch.rules == filteredRules) {
+      Seq(batch)
+    } else if (filteredRules.nonEmpty) {
+      Seq(Batch(batch.name, batch.strategy, filteredRules: _*))
+    } else {
+      logInfo(s"Optimization batch '${batch.name}' is excluded from the optimizer " +
+        s"as all enclosed rules have been excluded.")
+      Seq.empty
+    }
+  }
+
+  /**
+   * Add the customized rules and batch in order to the optimizer batches.
+   * excludedRules - rules that will be excluded
--- End diff --

nit: `* @param excludedRules ...`
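
I.e. the standard Scaladoc form, as a sketch:

    /**
     * Add the customized rules and batch in order to the optimizer batches.
     *
     * @param excludedRules rules that will be excluded
     */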


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23217
  
**[Test build #99680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99680/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/23217
  
retest this please


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99670/
Test FAILed.


---




[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...

2018-12-04 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/22911#discussion_r238770191
  
--- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStepSuite.scala ---
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.features
+
+import java.io.File
+import java.nio.charset.StandardCharsets.UTF_8
+import java.security.PrivilegedExceptionAction
+
+import scala.collection.JavaConverters._
+
+import com.google.common.io.Files
+import io.fabric8.kubernetes.api.model.{ConfigMap, Secret}
+import org.apache.commons.codec.binary.Base64
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.security.{Credentials, UserGroupInformation}
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.deploy.k8s._
+import org.apache.spark.deploy.k8s.Config._
+import org.apache.spark.deploy.k8s.Constants._
+import org.apache.spark.deploy.k8s.submit.JavaMainAppResource
+import org.apache.spark.internal.config._
+import org.apache.spark.util.Utils
+
+class KerberosConfDriverFeatureStepSuite extends SparkFunSuite {
+
+  import KubernetesFeaturesTestUtils._
+  import SecretVolumeUtils._
+
+  private val tmpDir = Utils.createTempDir()
+
+  test("mount krb5 config map if defined") {
+    val configMap = "testConfigMap"
+    val step = createStep(
+      new SparkConf(false).set(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP, configMap))
+
+    checkPodForKrbConf(step.configurePod(SparkPod.initialPod()), configMap)
+    assert(step.getAdditionalPodSystemProperties().isEmpty)
+    assert(filter[ConfigMap](step.getAdditionalKubernetesResources()).isEmpty)
+  }
+
+  test("create krb5.conf config map if local config provided") {
+    val krbConf = File.createTempFile("krb5", ".conf", tmpDir)
+    Files.write("some data", krbConf, UTF_8)
+
+    val sparkConf = new SparkConf(false)
+      .set(KUBERNETES_KERBEROS_KRB5_FILE, krbConf.getAbsolutePath())
+    val step = createStep(sparkConf)
+
+    val confMap = filter[ConfigMap](step.getAdditionalKubernetesResources()).head
+    assert(confMap.getData().keySet().asScala === Set(krbConf.getName()))
+
+    checkPodForKrbConf(step.configurePod(SparkPod.initialPod()), confMap.getMetadata().getName())
+    assert(step.getAdditionalPodSystemProperties().isEmpty)
+  }
+
+  test("create keytab secret if client keytab file used") {
+    val keytab = File.createTempFile("keytab", ".bin", tmpDir)
+    Files.write("some data", keytab, UTF_8)
+
+    val sparkConf = new SparkConf(false)
+      .set(KEYTAB, keytab.getAbsolutePath())
+      .set(PRINCIPAL, "alice")
+    val step = createStep(sparkConf)
+
+    val pod = step.configurePod(SparkPod.initialPod())
+    assert(podHasVolume(pod.pod, KERBEROS_KEYTAB_VOLUME))
+    assert(containerHasVolume(pod.container, KERBEROS_KEYTAB_VOLUME, KERBEROS_KEYTAB_MOUNT_POINT))
+
+    assert(step.getAdditionalPodSystemProperties().keys === Set(KEYTAB.key))
+
+    val secret = filter[Secret](step.getAdditionalKubernetesResources()).head
+    assert(secret.getData().keySet().asScala === Set(keytab.getName()))
+  }
+
+  test("do nothing if container-local keytab used") {
+    val sparkConf = new SparkConf(false)
+      .set(KEYTAB, "local:/my.keytab")
+      .set(PRINCIPAL, "alice")
+    val step = createStep(sparkConf)
+
+    val initial = SparkPod.initialPod()
+    assert(step.configurePod(initial) === initial)
+    assert(step.getAdditionalPodSystemProperties().isEmpty)
+    assert(step.getAdditionalKubernetesResources().isEmpty)
+  }
+
+  test("mount delegation tokens if provided") {
+    val dtSecret = "tokenSecret"
+    val sparkConf = new SparkConf(false)
+ 

[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23217
  
**[Test build #99670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99670/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23218
  
**[Test build #99679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23218
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5732/
Test PASSed.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23218
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-04 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23159
  
Rather than change every single call to this method, if this should generally be the value of the argument, then why not make it the default value or something?
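
For example, a default parameter value keeps every existing call site unchanged; only callers that need a different limit have to change. The names below are illustrative, not the PR's actual signature:

    // Illustrative only -- names are hypothetical.
    val defaultMaxFields = 25

    def truncatedString(seq: Seq[String], sep: String,
        maxFields: Int = defaultMaxFields): String =
      seq.take(maxFields).mkString(sep)

    truncatedString(Seq("a", "b", "c"), ", ")                // existing callers
    truncatedString(Seq("a", "b", "c"), ", ", maxFields = 2) // opt-in override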


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23218
  
Retest this please.


---




[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23218
  
Now, it's released. Let's try again.
- http://central.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.8/0.11/


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23159
  
**[Test build #99678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99678/testReport)** for PR 23159 at commit [`b6fa959`](https://github.com/apache/spark/commit/b6fa95981970788a09657b0a29712b53c01831db).


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23159
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5731/
Test PASSed.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23159
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...

2018-12-04 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23159
  
@HyukjinKwon @dongjoon-hyun @srowen @zsxwing Do you have any objections to this PR?


---




[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99669/
Test FAILed.


---




[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23219
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23219
  
**[Test build #99669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99669/testReport)** for PR 23219 at commit [`94f76e5`](https://github.com/apache/spark/commit/94f76e543c1b146d4d25d3e15b6efd4777af7652).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22514
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99665/
Test FAILed.


---




[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22514
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22514
  
**[Test build #99665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99665/testReport)** for PR 22514 at commit [`3c07d74`](https://github.com/apache/spark/commit/3c07d74ec94cf85f9aad79cde6c42ea667cb2090).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class OptimizedCreateHiveTableAsSelectCommand(`


---




[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...

2018-12-04 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22612#discussion_r238739905
  
--- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala ---
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.util.Try
+
+import org.apache.spark.{SparkEnv, SparkException}
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.util.Utils
+
+
+private[spark] case class ProcfsMetrics(
+    jvmVmemTotal: Long,
+    jvmRSSTotal: Long,
+    pythonVmemTotal: Long,
+    pythonRSSTotal: Long,
+    otherVmemTotal: Long,
+    otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop
+// project.
+private[spark] class ProcfsMetricsGetter(val procfsDir: String = "/proc/") extends Logging {
+  val procfsStatFile = "stat"
+  val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
+  val pageSize = computePageSize()
+  var isAvailable: Boolean = isProcfsAvailable
+  private val pid = computePid()
+
+  private lazy val isProcfsAvailable: Boolean = {
+    if (testing) {
+      true
+    }
+    else {
+      val procDirExists = Try(Files.exists(Paths.get(procfsDir))).recover {
+        case ioe: IOException =>
+          logWarning("Exception checking for procfs dir", ioe)
+          false
+      }
+      val shouldLogStageExecutorMetrics =
+        SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS)
+      val shouldLogStageExecutorProcessTreeMetrics =
+        SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS)
+      procDirExists.get && shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics
+    }
+  }
+
+  private def computePid(): Int = {
+    if (!isAvailable || testing) {
+      return -1;
+    }
+    try {
+      // This can be simplified in java9:
+      // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html
+      val cmd = Array("bash", "-c", "echo $PPID")
+      val out = Utils.executeAndGetOutput(cmd)
+      val pid = Integer.parseInt(out.split("\n")(0))
+      return pid;
+    }
+    catch {
+      case e: SparkException =>
+        logWarning("Exception when trying to compute process tree." +
+          " As a result reporting of ProcessTree metrics is stopped", e)
+        isAvailable = false
+        -1
+    }
+  }
+
+  private def computePageSize(): Long = {
+    if (testing) {
+      return 4096;
+    }
+    try {
+      val cmd = Array("getconf", "PAGESIZE")
+      val out = Utils.executeAndGetOutput(cmd)
+      Integer.parseInt(out.split("\n")(0))
+    } catch {
+      case e: Exception =>
+        logWarning("Exception when trying to compute pagesize, as a" +
+          " result reporting of ProcessTree metrics is stopped")
+        isAvailable = false
+        0
+    }
+  }
+
+  private def computeProcessTree(): Set[Int] = {
+    if (!isAvailable || testing) {
+      return Set()
+    }
+    var ptree: Set[Int] = Set()
+    ptree += pid
+    val queue = mutable.Queue.empty[Int]
+    queue += pid
+    while ( !queue.isEmpty ) {
+      val p = queue.dequeue()
+      val c = getChildPids(p)
+      if (!c.isEmpty) {
+        queue ++= c
+        ptree ++= c.toSet
+      }
+    }
+    ptree
+  }
+
+  private def getChildPids(pid: Int): ArrayBuffer[Int] = {
+    try {
+      val builder = new ProcessBuilder("pgrep", "-P", pid.toString)
+      val process = b

[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99677/testReport)** for PR 23207 at commit [`fcd62b3`](https://github.com/apache/spark/commit/fcd62b390ba4b5e2b1b9c6138026ac6da1b78d1f).


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5730/
Test PASSed.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23207
  
**[Test build #99676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99676/testReport)** for PR 23207 at commit [`ca6c407`](https://github.com/apache/spark/commit/ca6c407929e62492a2c5233504efaeaf731f8cc9).


---




[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23207
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5729/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-04 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/23207#discussion_r238732441
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask(
   threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
 } else 0L
 
+// Register the shuffle write metrics reporter to shuffleWriteMetrics.
+if (dep.shuffleWriteMetricsReporter.isDefined) {
+  context.taskMetrics().shuffleWriteMetrics.registerExternalShuffleWriteReporter(
--- End diff --

Cool! That's a cleaner implementation that keeps the read and write metrics 
reporters consistent; the read metrics can also extend 
`ShuffleReadMetricsReporter` directly. Done in ca6c407
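
For orientation, a simplified sketch of the delegation pattern under discussion; apart from `ShuffleWriteMetricsReporter`, which the diff references, the names below are illustrative assumptions, not Spark's actual API:

// Hypothetical sketch: a task-level metrics object that also forwards
// every update to an optional external (e.g. SQL) reporter.
trait ShuffleWriteMetricsReporter {
  def incBytesWritten(v: Long): Unit
  def incRecordsWritten(v: Long): Unit
}

class TaskShuffleWriteMetrics extends ShuffleWriteMetricsReporter {
  private var external: Option[ShuffleWriteMetricsReporter] = None
  private var bytesWritten = 0L
  private var recordsWritten = 0L

  // Called once per task when an external reporter is defined.
  def registerExternalReporter(r: ShuffleWriteMetricsReporter): Unit = {
    external = Some(r)
  }

  override def incBytesWritten(v: Long): Unit = {
    bytesWritten += v
    external.foreach(_.incBytesWritten(v))
  }

  override def incRecordsWritten(v: Long): Unit = {
    recordsWritten += v
    external.foreach(_.incRecordsWritten(v))
  }
}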


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...

2018-12-04 Thread rezasafi
Github user rezasafi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22612#discussion_r238731915
  
--- Diff: 
core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala ---
@@ -0,0 +1,223 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.executor
+
+import java.io._
+import java.nio.charset.Charset
+import java.nio.file.{Files, Paths}
+import java.util.Locale
+
+import scala.collection.mutable
+import scala.collection.mutable.ArrayBuffer
+import scala.util.Try
+
+import org.apache.spark.{SparkEnv, SparkException}
+import org.apache.spark.internal.{config, Logging}
+import org.apache.spark.util.Utils
+
+
+private[spark] case class ProcfsMetrics(
+jvmVmemTotal: Long,
+jvmRSSTotal: Long,
+pythonVmemTotal: Long,
+pythonRSSTotal: Long,
+otherVmemTotal: Long,
+otherRSSTotal: Long)
+
+// Some of the ideas here are taken from the ProcfsBasedProcessTree class
+// in the hadoop project.
+private[spark] class ProcfsMetricsGetter(val procfsDir: String = "/proc/")
+  extends Logging {
--- End diff --

Oh ok. I didn't know that you don't need to write `val` to get a val 
parameter. Not related here, but I think you can also have a var parameter 
if you write `var` in the parameter list.
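
A minimal sketch of the Scala rule in question, using hypothetical class names:

class A(x: Int)        // plain parameter: visible inside A, not a public field
class B(val x: Int)    // val parameter: public immutable field
class C(var x: Int)    // var parameter: public mutable field
case class D(x: Int)   // case class parameters are vals by default

val c = new C(1)
c.x = 2                // compiles, because x is a var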


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23216
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23216
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5726/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23216
  
**[Test build #99674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99674/testReport)**
 for PR 23216 at commit 
[`b3ede8b`](https://github.com/apache/spark/commit/b3ede8be1a9073f057cc46fb82eacd7fa3ec36c6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22583
  
**[Test build #99675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99675/testReport)**
 for PR 22583 at commit 
[`d8f26d9`](https://github.com/apache/spark/commit/d8f26d9761be5c0649f6dcf67504e366a30e4e0f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5727/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5728/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...

2018-12-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23216
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22535
  
**[Test build #99673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99673/testReport)**
 for PR 22535 at commit 
[`c95706f`](https://github.com/apache/spark/commit/c95706f60e4d576caca78a32000d4a7bbb12c141).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99672 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99672/testReport)**
 for PR 23215 at commit 
[`272bb1d`](https://github.com/apache/spark/commit/272bb1da8317883c8256e0484738b029bea9f9bb).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23219
  
@srowen Sorry.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/23219


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23219
  
@wangyum I already opened https://github.com/apache/spark/pull/23218


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23215
  
**[Test build #99671 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99671/testReport)**
 for PR 23215 at commit 
[`4719765`](https://github.com/apache/spark/commit/4719765bace94f8cda2690ec042ca7079e88443b).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r238707304
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -95,9 +77,127 @@ case class CreateHiveTableAsSelectCommand(
 Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
 s"[Database:${tableDesc.database}, " +
 s"TableName: ${tableDesc.identifier.table}, " +
 s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be inserted into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+tableDesc: CatalogTable,
+query: LogicalPlan,
+outputColumnNames: Seq[String],
+mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+InsertIntoHiveTable(
+  tableDesc,
+  Map.empty,
+  query,
+  overwrite = false,
+  ifPartitionNotExists = false,
+  outputColumnNames = outputColumnNames)
+  }
+
+  override def writingCommandForNewTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+// For CTAS, there are no static partition values to insert.
+val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+InsertIntoHiveTable(
+  tableDesc,
+  partition,
+  query,
+  overwrite = true,
+  ifPartitionNotExists = false,
+  outputColumnNames = outputColumnNames)
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates a Hive table
+ * but inserts the query result into it using a data source.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be inserted into the new relation
+ * @param mode SaveMode
+ */
+case class OptimizedCreateHiveTableAsSelectCommand(
+tableDesc: CatalogTable,
+query: LogicalPlan,
+outputColumnNames: Seq[String],
+mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  private def getHadoopRelation(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): HadoopFsRelation = {
+val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog
+val hiveTable = DDLUtils.readHiveTable(tableDesc)
+
+metastoreCatalog.convert(hiveTable) match {
+  case LogicalRelation(t: HadoopFsRelation, _, _, _) => t
+  case _ => throw new AnalysisException(s"$tableIdentifier should be converted to " +
+"HadoopFsRelation.")
+}
+  }
+
+  override def writingCommandForExistingTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+InsertIntoHadoopFsRelationCommand(
+  hadoopRelation.location.rootPaths.head,
+  Map.empty, // We don't support converting partitioned tables.
+  false,
+  Seq.empty, // We don't support converting partitioned tables.
+  hadoopRelation.bucketSpec,
+  hadoopRelation.fileFormat,
+  hadoopRelation.options,
+  query,
+  mode,
+  Some(tableDesc),
+  Some(hadoopRelation.location),
+  query.output.map(_.name))
+  }
+
+  override def writingCommandForNewTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+InsertIntoHadoopFsRelationCommand(
+  hadoopRelation.location.rootPaths.head,
+  Map.empty, // We don't support converting partitioned tables.
+  false,
+  Seq.empty, // We don't support converting partitioned tables.
+  hadoopRelation.bucketSpec,
+  hadoopRelation.fileFormat,
+  hadoopRelation.options,
+  query,
+  SaveMode.Overwrite,
--- End diff --

ok :)

[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23215
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5725/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23213
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23213
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99663/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23213
  
**[Test build #99662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99662/testReport)**
 for PR 23213 at commit 
[`0305a05`](https://github.com/apache/spark/commit/0305a05ee53f45d6a7f922ab1b6a6a5ec44cc607).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23213
  
**[Test build #99663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99663/testReport)**
 for PR 23213 at commit 
[`57eec69`](https://github.com/apache/spark/commit/57eec69affadb91efec2b4c1f9086062433c86fa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23213
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99662/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23213
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22514#discussion_r238706120
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -95,9 +77,127 @@ case class CreateHiveTableAsSelectCommand(
 Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+catalog: SessionCatalog,
+tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
 s"[Database:${tableDesc.database}, " +
 s"TableName: ${tableDesc.identifier.table}, " +
 s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be inserted into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+tableDesc: CatalogTable,
+query: LogicalPlan,
+outputColumnNames: Seq[String],
+mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+InsertIntoHiveTable(
+  tableDesc,
+  Map.empty,
+  query,
+  overwrite = false,
+  ifPartitionNotExists = false,
+  outputColumnNames = outputColumnNames)
+  }
+
+  override def writingCommandForNewTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+// For CTAS, there are no static partition values to insert.
+val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+InsertIntoHiveTable(
+  tableDesc,
+  partition,
+  query,
+  overwrite = true,
+  ifPartitionNotExists = false,
+  outputColumnNames = outputColumnNames)
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates a Hive table
+ * but inserts the query result into it using a data source.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be inserted into the new relation
+ * @param mode SaveMode
+ */
+case class OptimizedCreateHiveTableAsSelectCommand(
+tableDesc: CatalogTable,
+query: LogicalPlan,
+outputColumnNames: Seq[String],
+mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  private def getHadoopRelation(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): HadoopFsRelation = {
+val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog
+val hiveTable = DDLUtils.readHiveTable(tableDesc)
+
+metastoreCatalog.convert(hiveTable) match {
+  case LogicalRelation(t: HadoopFsRelation, _, _, _) => t
+  case _ => throw new AnalysisException(s"$tableIdentifier should be converted to " +
+"HadoopFsRelation.")
+}
+  }
+
+  override def writingCommandForExistingTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+InsertIntoHadoopFsRelationCommand(
+  hadoopRelation.location.rootPaths.head,
+  Map.empty, // We don't support converting partitioned tables.
+  false,
+  Seq.empty, // We don't support converting partitioned tables.
+  hadoopRelation.bucketSpec,
+  hadoopRelation.fileFormat,
+  hadoopRelation.options,
+  query,
+  mode,
+  Some(tableDesc),
+  Some(hadoopRelation.location),
+  query.output.map(_.name))
+  }
+
+  override def writingCommandForNewTable(
+  catalog: SessionCatalog,
+  tableDesc: CatalogTable): DataWritingCommand = {
+val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+InsertIntoHadoopFsRelationCommand(
+  hadoopRelation.location.rootPaths.head,
+  Map.empty, // We don't support converting partitioned tables.
+  false,
+  Seq.empty, // We don't support converting partitioned tables.
+  hadoopRelation.bucketSpec,
+  hadoopRelation.fileFormat,
+  hadoopRelation.options,
+  query,
+  SaveMode.Overwrite,
--- End diff --

if 

[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23217
  
**[Test build #99670 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99670/testReport)**
 for PR 23217 at commit 
[`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23219
  
**[Test build #99669 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99669/testReport)**
 for PR 23219 at commit 
[`94f76e5`](https://github.com/apache/spark/commit/94f76e543c1b146d4d25d3e15b6efd4777af7652).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5724/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23217
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8

2018-12-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23219
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


