[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218

**[Test build #99679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215

**[Test build #99671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99671/testReport)** for PR 23215 at commit [`4719765`](https://github.com/apache/spark/commit/4719765bace94f8cda2690ec042ca7079e88443b).

* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22583 Build finished. Test FAILed.
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22583 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99675/ Test FAILed.
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22583

**[Test build #99675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99675/testReport)** for PR 22583 at commit [`d8f26d9`](https://github.com/apache/spark/commit/d8f26d9761be5c0649f6dcf67504e366a30e4e0f).

* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238804336

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
      }
    }
  }
+
+  private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = {
+    val output = new java.io.ByteArrayOutputStream()
+    Console.withOut(output) {
+      df.explain(extended = true)
+    }
+    val normalizedOutput = output.toString.replaceAll("#\\d+", "#x")
+    for (key <- keywords) {
+      assert(normalizedOutput.contains(key))
+    }
+  }
+
+  test("optimized plan should show the rewritten aggregate expression") {
--- End diff --

+1 for @viirya's comment. We need to update the title and description of the PR and JIRA.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238803747

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ---
@@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext {
     checkKeywordsExistsInExplain(df, keywords = "InMemoryRelation",
       "StorageLevel(disk, memory, deserialized, 1 replicas)")
   }
+
+  test("optimized plan should show the rewritten aggregate expression") {
+    withTempView("test_agg") {
+      sql(
+        """
+          |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES
+          |  (1, true), (1, false),
+          |  (2, true),
+          |  (3, false), (3, null),
+          |  (4, null), (4, null),
+          |  (5, null), (5, true), (5, false) AS test_agg(k, v)
+        """.stripMargin)
+
+      // simple explain of queries having every/some/any aggregates. Optimized
+      // plan should show the rewritten aggregate expression.
+      val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k")
+      checkKeywordsExistsInExplain(df,
+        "Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " +
--- End diff --

The other two failures fail for the same reason.
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238803424

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala ---
@@ -53,6 +55,133 @@ class ExplainSuite extends QueryTest with SharedSQLContext {
     checkKeywordsExistsInExplain(df, keywords = "InMemoryRelation",
       "StorageLevel(disk, memory, deserialized, 1 replicas)")
   }
+
+  test("optimized plan should show the rewritten aggregate expression") {
+    withTempView("test_agg") {
+      sql(
+        """
+          |CREATE TEMPORARY VIEW test_agg AS SELECT * FROM VALUES
+          |  (1, true), (1, false),
+          |  (2, true),
+          |  (3, false), (3, null),
+          |  (4, null), (4, null),
+          |  (5, null), (5, true), (5, false) AS test_agg(k, v)
+        """.stripMargin)
+
+      // simple explain of queries having every/some/any aggregates. Optimized
+      // plan should show the rewritten aggregate expression.
+      val df = sql("SELECT k, every(v), some(v), any(v) FROM test_agg GROUP BY k")
+      checkKeywordsExistsInExplain(df,
+        "Aggregate [k#x], [k#x, min(v#x) AS every(v)#x, max(v#x) AS some(v)#x, " +
--- End diff --

Since `extended=false` in [line 33](https://github.com/apache/spark/pull/23213/files#diff-d61681740c66c4c0ea311a76c98f80adR33), the test suite only compares against the physical plan. Did you perhaps change line 33 in your codebase?
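[Editor's note] The `extended` distinction raised in the review above can be sketched as follows. This is a minimal illustration, not code from the PR; it assumes a local `SparkSession` named `spark`, and the DataFrame is a placeholder:

```scala
import java.io.ByteArrayOutputStream

// With extended = false, df.explain() prints only the physical plan; with
// extended = true it also prints the parsed, analyzed and optimized logical
// plans. A keyword taken from the optimized plan (e.g. a rewritten
// "Aggregate [...]" expression) can therefore only match in the extended output.
val df = spark.range(10).groupBy("id").count()

def capture(extended: Boolean): String = {
  val out = new ByteArrayOutputStream()
  Console.withOut(out) { df.explain(extended) }
  out.toString
}

val physicalOnly = capture(extended = false)
val allPlans = capture(extended = true)

// Only the extended output contains the optimized logical plan section.
assert(allPlans.contains("== Optimized Logical Plan =="))
```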
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238798656

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+          --keytab \
+          --principal \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
--- End diff --

"The Kafka delegation..."
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238798169

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
--- End diff --

"...when `blah` is set, Spark considers the following log in options, in order of preference:"
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238799671

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,57 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. When `spark.kafka.bootstrap.servers`
+  set Spark looks for authentication information in the following order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+          --keytab \
+          --principal \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  Kafka delegation token provider can be turned off by setting `spark.security.credentials.kafka.enabled` to `false` (default: `true`).
+
+  Spark can be configured to use the following authentication protocols to obtain token:
+  - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption.
+  - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates
+    clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers.
+  - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but
+    because there is no encryption it's only for testing purposes.
+
+  After obtaining delegation token successfully, Spark distributes it across nodes and renews it accordingly.
+  Delegation token uses `SCRAM` login module for authentication and because of that the appropriate
+  `sasl.mechanism` has to be configured on source/sink.
--- End diff --

Still not clear to me. Does this mean the user has to change their code so that when they, e.g., run a Kafka query, they have to say `.option("sasl.mechanism", "SCRAM")`? This needs to tell the user exactly what to do to get this to work.
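[Editor's note] For readers following this thread, the kind of configuration the reviewer is asking the doc to spell out would presumably look something like the sketch below. This is a hypothetical illustration, not the PR's documented answer; broker address, topic, and the exact SCRAM mechanism value are assumptions, and Kafka client properties are passed to the Structured Streaming source with a `kafka.` prefix:

```scala
// Hedged sketch: a Structured Streaming Kafka source configured for SASL/SCRAM.
// The "kafka."-prefixed options are forwarded to the underlying Kafka consumer;
// the sasl.mechanism value must match what the brokers have enabled
// (assumption here: SCRAM-SHA-512).
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")        // placeholder address
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.sasl.mechanism", "SCRAM-SHA-512")           // assumed mechanism
  .option("subscribe", "topic1")                             // placeholder topic
  .load()
```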
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Merged build finished. Test PASSed.
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23169 **[Test build #99683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99683/testReport)** for PR 23169 at commit [`1b692a0`](https://github.com/apache/spark/commit/1b692a0444a1c0f1fc24a08241f24dd35e4c428b).
[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23169 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5736/ Test PASSed.
[GitHub] spark pull request #23088: [SPARK-26119][CORE][WEBUI]Task summary table shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23088
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/23088 Merging to master / 2.4.
[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22683

**[Test build #4450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4450/testReport)** for PR 22683 at commit [`57ecbf9`](https://github.com/apache/spark/commit/57ecbf964a9c2681d990f64cc00368190f288926).

* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Merged build finished. Test PASSed.
[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23098 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99667/ Test FAILed.
[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23098 Merged build finished. Test FAILed.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22957 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5735/ Test PASSed.
[GitHub] spark issue #22957: [SPARK-25951][SQL] Ignore aliases for distributions and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22957 **[Test build #99682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99682/testReport)** for PR 22957 at commit [`bf1d04a`](https://github.com/apache/spark/commit/bf1d04a819855737d1096b61b1c3d46010f50dee).
[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23098

**[Test build #99667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99667/testReport)** for PR 23098 at commit [`96f9c41`](https://github.com/apache/spark/commit/96f9c419961df2f6be1c137bbcd5a2ff27225fad).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #23210: [SPARK-26233][SQL] CheckOverflow when encoding a ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23210
[GitHub] spark issue #23210: [SPARK-26233][SQL] CheckOverflow when encoding a decimal...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23210 @mgaido91 . This needs to land `branch-2.4/branch-2.3/branch-2.2`, but it fails at `branch-2.4` due to the conflicts in the test case file. Could you make separate backport PRs for each branch?
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Merged build finished. Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99668/ Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217

**[Test build #99668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99668/testReport)** for PR 23217 at commit [`724db5c`](https://github.com/apache/spark/commit/724db5cd752d2c79032a887e8ae2806d9a5acc65).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5734/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99681/testReport)** for PR 23215 at commit [`4060c30`](https://github.com/apache/spark/commit/4060c30be00f0026c5c8e7304244bab2b70537f9).
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238782502

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

When I was testing this again today, I was getting an error from a plain `get`: for some reason the code was taking the first branch even though `kubeContext` was `None`. So I changed to an explicit `isDefined` instead of negating `isEmpty`, and added a `getOrElse` fallback just to avoid that.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238780494

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -20,20 +20,22 @@ import java.io.File
 import com.google.common.base.Charsets
 import com.google.common.io.Files
+import io.fabric8.kubernetes.client.Config.autoConfigure
 import io.fabric8.kubernetes.client.{ConfigBuilder, DefaultKubernetesClient, KubernetesClient}
 import io.fabric8.kubernetes.client.utils.HttpClientUtils
 import okhttp3.Dispatcher
-
+import org.apache.commons.lang3.StringUtils
--- End diff --

Now unused.
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238780744

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(_.nonEmpty)
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isDefined) s"context ${kubeContext.getOrElse("?")}" else "current context"}" +
--- End diff --

I saw your commit about the NPE, but I'm not sure how you could get one here? `sparkConf.get` will never return `Some(null)` as far as I know (it would return `None` instead).
[GitHub] spark pull request #22904: [SPARK-25887][K8S] Configurable K8S context suppo...
Github user rvesse commented on a diff in the pull request: https://github.com/apache/spark/pull/22904#discussion_r238779965

--- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/SparkKubernetesClientFactory.scala ---
@@ -67,8 +66,16 @@ private[spark] object SparkKubernetesClientFactory {
     val dispatcher = new Dispatcher(
       ThreadUtils.newDaemonCachedThreadPool("kubernetes-dispatcher"))
-    // TODO [SPARK-25887] Create builder in a way that respects configurable context
-    val config = new ConfigBuilder()
+    // Allow for specifying a context used to auto-configure from the users K8S config file
+    val kubeContext = sparkConf.get(KUBERNETES_CONTEXT).filter(c => StringUtils.isNotBlank(c))
+    logInfo(s"Auto-configuring K8S client using " +
+      s"${if (kubeContext.isEmpty) s"context ${kubeContext.get}" else "current context"}" +
+      s" from users K8S config file")
+
+    // Start from an auto-configured config with the desired context
+    // Fabric 8 uses null to indicate that the users current context should be used so if no
+    // explicit setting pass null
+    val config = new ConfigBuilder(autoConfigure(kubeContext.getOrElse(null)))
--- End diff --

Yes, and I was referring to the K8S config file :) And yes, the fact that we would propagate `spark.kubernetes.context` into the pod shouldn't be an issue, because there won't be any K8S config file for it to interact with inside the pod; in-pod K8S config should come from the service account token that gets injected into the pod.
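[Editor's note] The `Option` pitfall this thread circles around can be sketched standalone (hypothetical helper name; the real code interpolates into a log message):

```scala
// The patch under review tested `kubeContext.isEmpty` where it meant
// `isDefined`, so the string interpolation called `.get` on a None.
// This is the corrected branch logic in isolation:
def describeContext(kubeContext: Option[String]): String =
  if (kubeContext.isDefined) s"context ${kubeContext.get}" else "current context"

assert(describeContext(Some("dev-cluster")) == "context dev-cluster")
assert(describeContext(None) == "current context")

// For handing fabric8's autoConfigure a null meaning "use the current
// context", `orNull` is the idiomatic equivalent of `getOrElse(null)`:
assert(Some("dev-cluster").orNull == "dev-cluster")
assert(Option.empty[String].orNull == null)
```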
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238776692

--- Diff: docs/structured-streaming-kafka-integration.md ---
@@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark-
 See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies.
+
+## Security
+
+Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed
+description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security).
+
+It's worth noting that security is optional and turned off by default.
+
+Spark supports the following ways to authenticate against Kafka cluster:
+- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured
+  via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS
+  configuration feature). For further information about delegation tokens, see
+  [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token).
+
+  The process is initiated by Spark's Kafka delegation token provider. This is enabled by default
+  but can be turned off with `spark.security.credentials.kafka.enabled`. When
+  `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following
+  order and choose the first available to log in:
+  - **JAAS login configuration**
+  - **Keytab file**, such as,
+
+        ./bin/spark-submit \
+          --keytab \
+          --principal \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  - **Kerberos credential cache**, such as,
+
+        ./bin/spark-submit \
+          --conf spark.kafka.bootstrap.servers= \
+          ...
+
+  Spark supports the following authentication protocols to obtain token:
--- End diff --

Keeping the list of supported protocols without the explanation, saying it must match the Kafka broker config, sounds better to me.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5733/ Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Merged build finished. Test PASSed.
[GitHub] spark pull request #23206: [SPARK-26249][SQL] Add ability to inject a rule i...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/23206#discussion_r238776051

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -235,10 +235,127 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
    */
   def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
+
+  /**
+   * Seq of Optimizer rule to be added after or before a rule in a specific batch
+   */
+  def optimizerRulesInOrder: Seq[RuleInOrder] = Nil
+
+  /**
+   * Batches to add to the optimizer in a specific order with respect to a existing batch
+   * Seq of Tuple(existing batch name, order, Batch to add).
+   */
+  def optimizerBatches: Seq[(String, Order.Value, Batch)] = Nil
+
+  /**
+   * Return the batch after removing rules that need to be excluded
+   */
+  private def handleExcludedRules(batch: Batch, excludedRules: Seq[String]): Seq[Batch] = {
+    // Excluded rules
+    val filteredRules = batch.rules.filter { rule =>
+      val exclude = excludedRules.contains(rule.ruleName)
+      if (exclude) {
+        logInfo(s"Optimization rule '${rule.ruleName}' is excluded from the optimizer.")
+      }
+      !exclude
+    }
+    if (batch.rules == filteredRules) {
+      Seq(batch)
+    } else if (filteredRules.nonEmpty) {
+      Seq(Batch(batch.name, batch.strategy, filteredRules: _*))
+    } else {
+      logInfo(s"Optimization batch '${batch.name}' is excluded from the optimizer " +
+        s"as all enclosed rules have been excluded.")
+      Seq.empty
+    }
+  }
+
+  /**
+   * Add the customized rules and batch in order to the optimizer batches.
+   * excludedRules - rules that will be excluded
--- End diff --

nit: `* @param excludedRules ...`
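[Editor's note] The three-way outcome that `handleExcludedRules` implements in the diff above can be sketched standalone. This is a simplified model with hypothetical types (a rule is reduced to its name; the real `Batch` carries a strategy and `Rule[LogicalPlan]` objects):

```scala
// Simplified model of the optimizer's exclusion logic.
case class Batch(name: String, rules: Seq[String])

def handleExcludedRules(batch: Batch, excludedRules: Seq[String]): Seq[Batch] = {
  val filtered = batch.rules.filterNot(excludedRules.contains)
  if (filtered == batch.rules) Seq(batch)                       // nothing excluded: keep batch as-is
  else if (filtered.nonEmpty) Seq(batch.copy(rules = filtered)) // rebuild with surviving rules
  else Seq.empty                                                // all rules excluded: drop the batch
}

val batch = Batch("Operator Optimization", Seq("PushDownPredicate", "CollapseProject"))
assert(handleExcludedRules(batch, Nil) == Seq(batch))
assert(handleExcludedRules(batch, Seq("CollapseProject")).head.rules == Seq("PushDownPredicate"))
assert(handleExcludedRules(batch, batch.rules).isEmpty)
```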
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99680/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23217 retest this please
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99670/ Test FAILed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Merged build finished. Test FAILed.
[GitHub] spark pull request #22911: [SPARK-25815][k8s] Support kerberos in client mod...
Github user ifilonenko commented on a diff in the pull request: https://github.com/apache/spark/pull/22911#discussion_r238770191 --- Diff: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/KerberosConfDriverFeatureStepSuite.scala --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.deploy.k8s.features + +import java.io.File +import java.nio.charset.StandardCharsets.UTF_8 +import java.security.PrivilegedExceptionAction + +import scala.collection.JavaConverters._ + +import com.google.common.io.Files +import io.fabric8.kubernetes.api.model.{ConfigMap, Secret} +import org.apache.commons.codec.binary.Base64 +import org.apache.hadoop.io.Text +import org.apache.hadoop.security.{Credentials, UserGroupInformation} + +import org.apache.spark.{SparkConf, SparkFunSuite} +import org.apache.spark.deploy.SparkHadoopUtil +import org.apache.spark.deploy.k8s._ +import org.apache.spark.deploy.k8s.Config._ +import org.apache.spark.deploy.k8s.Constants._ +import org.apache.spark.deploy.k8s.submit.JavaMainAppResource +import org.apache.spark.internal.config._ +import org.apache.spark.util.Utils + +class KerberosConfDriverFeatureStepSuite extends SparkFunSuite { + + import KubernetesFeaturesTestUtils._ + import SecretVolumeUtils._ + + private val tmpDir = Utils.createTempDir() + + test("mount krb5 config map if defined") { +val configMap = "testConfigMap" +val step = createStep( + new SparkConf(false).set(KUBERNETES_KERBEROS_KRB5_CONFIG_MAP, configMap)) + +checkPodForKrbConf(step.configurePod(SparkPod.initialPod()), configMap) +assert(step.getAdditionalPodSystemProperties().isEmpty) + assert(filter[ConfigMap](step.getAdditionalKubernetesResources()).isEmpty) + } + + test("create krb5.conf config map if local config provided") { +val krbConf = File.createTempFile("krb5", ".conf", tmpDir) +Files.write("some data", krbConf, UTF_8) + +val sparkConf = new SparkConf(false) + .set(KUBERNETES_KERBEROS_KRB5_FILE, krbConf.getAbsolutePath()) +val step = createStep(sparkConf) + +val confMap = filter[ConfigMap](step.getAdditionalKubernetesResources()).head +assert(confMap.getData().keySet().asScala === Set(krbConf.getName())) + +checkPodForKrbConf(step.configurePod(SparkPod.initialPod()), confMap.getMetadata().getName()) 
+assert(step.getAdditionalPodSystemProperties().isEmpty) + } + + test("create keytab secret if client keytab file used") { +val keytab = File.createTempFile("keytab", ".bin", tmpDir) +Files.write("some data", keytab, UTF_8) + +val sparkConf = new SparkConf(false) + .set(KEYTAB, keytab.getAbsolutePath()) + .set(PRINCIPAL, "alice") +val step = createStep(sparkConf) + +val pod = step.configurePod(SparkPod.initialPod()) +assert(podHasVolume(pod.pod, KERBEROS_KEYTAB_VOLUME)) +assert(containerHasVolume(pod.container, KERBEROS_KEYTAB_VOLUME, KERBEROS_KEYTAB_MOUNT_POINT)) + +assert(step.getAdditionalPodSystemProperties().keys === Set(KEYTAB.key)) + +val secret = filter[Secret](step.getAdditionalKubernetesResources()).head +assert(secret.getData().keySet().asScala === Set(keytab.getName())) + } + + test("do nothing if container-local keytab used") { +val sparkConf = new SparkConf(false) + .set(KEYTAB, "local:/my.keytab") + .set(PRINCIPAL, "alice") +val step = createStep(sparkConf) + +val initial = SparkPod.initialPod() +assert(step.configurePod(initial) === initial) +assert(step.getAdditionalPodSystemProperties().isEmpty) +assert(step.getAdditionalKubernetesResources().isEmpty) + } + + test("mount delegation tokens if provided") { +val dtSecret = "tokenSecret" +val sparkConf = new SparkConf(false) +
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99670/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218 **[Test build #99679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf).
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5732/ Test PASSed.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Merged build finished. Test PASSed.
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23159 Rather than change every single call to this method, if this should generally be the value of the argument, then why not make it the default value or something?
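srowen's suggestion is ordinary Scala: give the parameter a default value so existing call sites stay unchanged and only callers that need a different limit pass one explicitly. A minimal hypothetical sketch (the real Spark utility under discussion truncates plan strings; its exact signature is not shown in this thread):

```scala
object TruncateSketch {
  // maxFields defaults to 25, so most call sites pass nothing;
  // only callers that want a different limit name the argument.
  def truncatedString(fields: Seq[String], maxFields: Int = 25): String =
    if (fields.length > maxFields) fields.take(maxFields).mkString(", ") + ", ..."
    else fields.mkString(", ")
}
```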
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23218 Retest this please.
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23218 Now, it's released. Let's try again. - http://central.maven.org/maven2/com/typesafe/genjavadoc/genjavadoc-plugin_2.12.8/0.11/
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23159 **[Test build #99678 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99678/testReport)** for PR 23159 at commit [`b6fa959`](https://github.com/apache/spark/commit/b6fa95981970788a09657b0a29712b53c01831db).
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5731/ Test PASSed.
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23159 Merged build finished. Test PASSed.
[GitHub] spark issue #23159: [SPARK-26191][SQL] Control truncation of Spark plans via...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/23159 @HyukjinKwon @dongjoon-hyun @srowen @zsxwing Do you have any objections to this PR?
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99669/ Test FAILed.
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23219 Merged build finished. Test FAILed.
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23219 **[Test build #99669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99669/testReport)** for PR 23219 at commit [`94f76e5`](https://github.com/apache/spark/commit/94f76e543c1b146d4d25d3e15b6efd4777af7652).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99665/ Test FAILed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22514 Merged build finished. Test FAILed.
[GitHub] spark issue #22514: [SPARK-25271][SQL] Hive ctas commands should use data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22514 **[Test build #99665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99665/testReport)** for PR 22514 at commit [`3c07d74`](https://github.com/apache/spark/commit/3c07d74ec94cf85f9aad79cde6c42ea667cb2090).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class OptimizedCreateHiveTableAsSelectCommand(`
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r238739905 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala --- @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer +import scala.util.Try + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + + +private[spark] case class ProcfsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. 
+private[spark] class ProcfsMetricsGetter(val procfsDir: String = "/proc/") extends Logging { + val procfsStatFile = "stat" + val testing = sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing") + val pageSize = computePageSize() + var isAvailable: Boolean = isProcfsAvailable + private val pid = computePid() + + private lazy val isProcfsAvailable: Boolean = { +if (testing) { + true +} +else { + val procDirExists = Try(Files.exists(Paths.get(procfsDir))).recover { +case ioe: IOException => + logWarning("Exception checking for procfs dir", ioe) + false + } + val shouldLogStageExecutorMetrics = +SparkEnv.get.conf.get(config.EVENT_LOG_STAGE_EXECUTOR_METRICS) + val shouldLogStageExecutorProcessTreeMetrics = +SparkEnv.get.conf.get(config.EVENT_LOG_PROCESS_TREE_METRICS) + procDirExists.get && shouldLogStageExecutorProcessTreeMetrics && shouldLogStageExecutorMetrics +} + } + + private def computePid(): Int = { +if (!isAvailable || testing) { + return -1; +} +try { + // This can be simplified in java9: + // https://docs.oracle.com/javase/9/docs/api/java/lang/ProcessHandle.html + val cmd = Array("bash", "-c", "echo $PPID") + val out = Utils.executeAndGetOutput(cmd) + val pid = Integer.parseInt(out.split("\n")(0)) + return pid; +} +catch { + case e: SparkException => +logWarning("Exception when trying to compute process tree." 
+ + " As a result reporting of ProcessTree metrics is stopped", e) +isAvailable = false +-1 +} + } + + private def computePageSize(): Long = { +if (testing) { + return 4096; +} +try { + val cmd = Array("getconf", "PAGESIZE") + val out = Utils.executeAndGetOutput(cmd) + Integer.parseInt(out.split("\n")(0)) +} catch { + case e: Exception => +logWarning("Exception when trying to compute pagesize, as a" + + " result reporting of ProcessTree metrics is stopped") +isAvailable = false +0 +} + } + + private def computeProcessTree(): Set[Int] = { +if (!isAvailable || testing) { + return Set() +} +var ptree: Set[Int] = Set() +ptree += pid +val queue = mutable.Queue.empty[Int] +queue += pid +while ( !queue.isEmpty ) { + val p = queue.dequeue() + val c = getChildPids(p) + if (!c.isEmpty) { +queue ++= c +ptree ++= c.toSet + } +} +ptree + } + + private def getChildPids(pid: Int): ArrayBuffer[Int] = { +try { + val builder = new ProcessBuilder("pgrep", "-P", pid.toString) + val process = b
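The `computeProcessTree` method quoted in the diff above does a breadth-first walk of the process tree, starting from the executor's own pid. A standalone sketch of that traversal, with `childrenOf` as a stand-in for the real code's shell-out to `pgrep -P <pid>`:

```scala
import scala.collection.mutable

object ProcessTreeSketch {
  // Breadth-first traversal: start from the root pid, repeatedly enqueue
  // each dequeued pid's children, and accumulate every pid seen.
  def computeProcessTree(rootPid: Int, childrenOf: Int => Seq[Int]): Set[Int] = {
    var ptree = Set(rootPid)
    val queue = mutable.Queue(rootPid)
    while (queue.nonEmpty) {
      val children = childrenOf(queue.dequeue())
      queue ++= children
      ptree ++= children
    }
    ptree
  }
}
```

Separating the child lookup as a function parameter is an assumption made here for testability; the PR's version queries the process table directly.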
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99677/testReport)** for PR 23207 at commit [`fcd62b3`](https://github.com/apache/spark/commit/fcd62b390ba4b5e2b1b9c6138026ac6da1b78d1f).
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5730/ Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Merged build finished. Test PASSed.
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23207 **[Test build #99676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99676/testReport)** for PR 23207 at commit [`ca6c407`](https://github.com/apache/spark/commit/ca6c407929e62492a2c5233504efaeaf731f8cc9).
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23207 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5729/ Test PASSed.
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238732441

--- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala ---
@@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask(
       threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime
     } else 0L

+    // Register the shuffle write metrics reporter to shuffleWriteMetrics.
+    if (dep.shuffleWriteMetricsReporter.isDefined) {
+      context.taskMetrics().shuffleWriteMetrics.registerExternalShuffleWriteReporter(
--- End diff --

Cool! That's a cleaner implementation, consistent for both the read and write metrics reporters, and the read metrics can extend `ShuffleReadMetricsReporter` directly. Done in ca6c407.
[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r238731915 --- Diff: core/src/main/scala/org/apache/spark/executor/ProcfsMetricsGetter.scala --- @@ -0,0 +1,223 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.executor + +import java.io._ +import java.nio.charset.Charset +import java.nio.file.{Files, Paths} +import java.util.Locale + +import scala.collection.mutable +import scala.collection.mutable.ArrayBuffer +import scala.util.Try + +import org.apache.spark.{SparkEnv, SparkException} +import org.apache.spark.internal.{config, Logging} +import org.apache.spark.util.Utils + + +private[spark] case class ProcfsMetrics( +jvmVmemTotal: Long, +jvmRSSTotal: Long, +pythonVmemTotal: Long, +pythonRSSTotal: Long, +otherVmemTotal: Long, +otherRSSTotal: Long) + +// Some of the ideas here are taken from the ProcfsBasedProcessTree class in hadoop +// project. +private[spark] class ProcfsMetricsGetter(val procfsDir: String = "/proc/") extends Logging { --- End diff -- Oh ok. I didn't know that you don't need to mention val if you want a val parameter. Not related here but I think you can have a var parameter if you mention var in. 
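The preceding review exchange is about Scala constructor parameters: marking a class parameter `val` makes it a public immutable field, `var` makes it a public mutable one, and an unmarked parameter of a non-case class is not a field at all. A minimal illustration with a hypothetical class (not the actual `ProcfsMetricsGetter` signature):

```scala
// `procfsDir` becomes an immutable public field; `pageSize` a mutable one.
class ProcfsLike(val procfsDir: String, var pageSize: Long)
```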
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Merged build finished. Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5726/ Test PASSed.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23216 **[Test build #99674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99674/testReport)** for PR 23216 at commit [`b3ede8b`](https://github.com/apache/spark/commit/b3ede8be1a9073f057cc46fb82eacd7fa3ec36c6).
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22583 **[Test build #99675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99675/testReport)** for PR 22583 at commit [`d8f26d9`](https://github.com/apache/spark/commit/d8f26d9761be5c0649f6dcf67504e366a30e4e0f).
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5727/ Test PASSed.
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22583 Build finished. Test PASSed.
[GitHub] spark issue #22583: [SPARK-10816][SS] SessionWindow support for Structure St...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22583 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5728/ Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23216 retest this please
[GitHub] spark issue #22535: [SPARK-17636][SQL][WIP] Parquet predicate pushdown in ne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22535 **[Test build #99673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99673/testReport)** for PR 22535 at commit [`c95706f`](https://github.com/apache/spark/commit/c95706f60e4d576caca78a32000d4a7bbb12c141).
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99672/testReport)** for PR 23215 at commit [`272bb1d`](https://github.com/apache/spark/commit/272bb1da8317883c8256e0484738b029bea9f9bb).
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/23219 @srowen Sorry.
[GitHub] spark pull request #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/23219
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23219 @wangyum I already opened https://github.com/apache/spark/pull/23218
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99671/testReport)** for PR 23215 at commit [`4719765`](https://github.com/apache/spark/commit/4719765bace94f8cda2690ec042ca7079e88443b).
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Build finished. Test PASSed.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238707304

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---

@@ -95,9 +77,127 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    InsertIntoHiveTable(
+      tableDesc,
+      Map.empty,
+      query,
+      overwrite = false,
+      ifPartitionNotExists = false,
+      outputColumnNames = outputColumnNames)
+  }
+
+  override def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    // For CTAS, there is no static partition values to insert.
+    val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+    InsertIntoHiveTable(
+      tableDesc,
+      partition,
+      query,
+      overwrite = true,
+      ifPartitionNotExists = false,
+      outputColumnNames = outputColumnNames)
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class OptimizedCreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  private def getHadoopRelation(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): HadoopFsRelation = {
+    val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog
+    val hiveTable = DDLUtils.readHiveTable(tableDesc)
+
+    metastoreCatalog.convert(hiveTable) match {
+      case LogicalRelation(t: HadoopFsRelation, _, _, _) => t
+      case _ => throw new AnalysisException(s"$tableIdentifier should be converted to " +
+        "HadoopFsRelation.")
+    }
+  }
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+    InsertIntoHadoopFsRelationCommand(
+      hadoopRelation.location.rootPaths.head,
+      Map.empty, // We don't support to convert partitioned table.
+      false,
+      Seq.empty, // We don't support to convert partitioned table.
+      hadoopRelation.bucketSpec,
+      hadoopRelation.fileFormat,
+      hadoopRelation.options,
+      query,
+      mode,
+      Some(tableDesc),
+      Some(hadoopRelation.location),
+      query.output.map(_.name))
+  }
+
+  override def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+    InsertIntoHadoopFsRelationCommand(
+      hadoopRelation.location.rootPaths.head,
+      Map.empty, // We don't support to convert partitioned table.
+      false,
+      Seq.empty, // We don't support to convert partitioned table.
+      hadoopRelation.bucketSpec,
+      hadoopRelation.fileFormat,
+      hadoopRelation.options,
+      query,
+      SaveMode.Overwrite,
--- End diff --

ok :)
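The `writingCommandForNewTable` override in the diff above builds its partition spec with `partitionColumnNames.map(_ -> None).toMap`. A minimal, self-contained sketch of that idiom (stand-alone names, not Spark's actual classes): mapping every partition column to `None` yields a fully dynamic partition spec, since a CTAS has no static partition values.

```scala
// Illustrative sketch only -- `dynamicSpec` is a hypothetical helper, not a
// Spark API. It mirrors the one-liner in the diff: each partition column name
// is paired with None, producing a Map[String, Option[String]] where no
// column carries a static value.
object PartitionSpecSketch {
  def dynamicSpec(partitionColumnNames: Seq[String]): Map[String, Option[String]] =
    partitionColumnNames.map(_ -> None).toMap

  def main(args: Array[String]): Unit =
    // prints: Map(year -> None, month -> None)
    println(dynamicSpec(Seq("year", "month")))
}
```

The `_ -> None` arrow builds a `(String, None.type)` tuple, and `toMap` widens the value type to `Option[String]` against the declared result type, so no explicit annotation is needed on the lambda.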
[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5725/ Test PASSed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test FAILed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99663/ Test FAILed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213

**[Test build #99662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99662/testReport)** for PR 23213 at commit [`0305a05`](https://github.com/apache/spark/commit/0305a05ee53f45d6a7f922ab1b6a6a5ec44cc607).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213

**[Test build #99663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99663/testReport)** for PR 23213 at commit [`57eec69`](https://github.com/apache/spark/commit/57eec69affadb91efec2b4c1f9086062433c86fa).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99662/ Test FAILed.
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23213 Merged build finished. Test FAILed.
[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r238706120

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala ---

@@ -95,9 +77,127 @@ case class CreateHiveTableAsSelectCommand(
     Seq.empty[Row]
   }
 
+  // Returns `DataWritingCommand` used to write data when the table exists.
+  def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
+  // Returns `DataWritingCommand` used to write data when the table doesn't exist.
+  def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand
+
   override def argString: String = {
     s"[Database:${tableDesc.database}, " +
     s"TableName: ${tableDesc.identifier.table}, " +
     s"InsertIntoHiveTable]"
   }
 }
+
+/**
+ * Create table and insert the query result into it.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class CreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    InsertIntoHiveTable(
+      tableDesc,
+      Map.empty,
+      query,
+      overwrite = false,
+      ifPartitionNotExists = false,
+      outputColumnNames = outputColumnNames)
+  }
+
+  override def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    // For CTAS, there is no static partition values to insert.
+    val partition = tableDesc.partitionColumnNames.map(_ -> None).toMap
+    InsertIntoHiveTable(
+      tableDesc,
+      partition,
+      query,
+      overwrite = true,
+      ifPartitionNotExists = false,
+      outputColumnNames = outputColumnNames)
+  }
+}
+
+/**
+ * Create table and insert the query result into it. This creates Hive table but inserts
+ * the query result into it by using data source.
+ *
+ * @param tableDesc the table description, which may contain serde, storage handler etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param mode SaveMode
+ */
+case class OptimizedCreateHiveTableAsSelectCommand(
+    tableDesc: CatalogTable,
+    query: LogicalPlan,
+    outputColumnNames: Seq[String],
+    mode: SaveMode)
+  extends CreateHiveTableAsSelectBase {
+
+  private def getHadoopRelation(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): HadoopFsRelation = {
+    val metastoreCatalog = catalog.asInstanceOf[HiveSessionCatalog].metastoreCatalog
+    val hiveTable = DDLUtils.readHiveTable(tableDesc)
+
+    metastoreCatalog.convert(hiveTable) match {
+      case LogicalRelation(t: HadoopFsRelation, _, _, _) => t
+      case _ => throw new AnalysisException(s"$tableIdentifier should be converted to " +
+        "HadoopFsRelation.")
+    }
+  }
+
+  override def writingCommandForExistingTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+    InsertIntoHadoopFsRelationCommand(
+      hadoopRelation.location.rootPaths.head,
+      Map.empty, // We don't support to convert partitioned table.
+      false,
+      Seq.empty, // We don't support to convert partitioned table.
+      hadoopRelation.bucketSpec,
+      hadoopRelation.fileFormat,
+      hadoopRelation.options,
+      query,
+      mode,
+      Some(tableDesc),
+      Some(hadoopRelation.location),
+      query.output.map(_.name))
+  }
+
+  override def writingCommandForNewTable(
+      catalog: SessionCatalog,
+      tableDesc: CatalogTable): DataWritingCommand = {
+    val hadoopRelation = getHadoopRelation(catalog, tableDesc)
+    InsertIntoHadoopFsRelationCommand(
+      hadoopRelation.location.rootPaths.head,
+      Map.empty, // We don't support to convert partitioned table.
+      false,
+      Seq.empty, // We don't support to convert partitioned table.
+      hadoopRelation.bucketSpec,
+      hadoopRelation.fileFormat,
+      hadoopRelation.options,
+      query,
+      SaveMode.Overwrite,
--- End diff --

if
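The `getHadoopRelation` helper in the diff above converts the Hive table and then pattern-matches the expected relation type out of the result, throwing a clear error for anything else. A stand-alone sketch of that convert-then-match shape (simplified stand-in types, not Spark's `LogicalRelation`/`HadoopFsRelation`):

```scala
// Illustrative sketch only -- Plan, Relation and extractRelation are made-up
// stand-ins. The point is the shape: match the one case you can handle,
// bind it with `r @ ...`, and fail loudly with context for everything else,
// just as getHadoopRelation does for non-HadoopFsRelation conversions.
sealed trait Plan
case class Relation(kind: String) extends Plan
case object Unconvertible extends Plan

object ConvertSketch {
  def extractRelation(plan: Plan): Relation = plan match {
    case r @ Relation("hadoopFs") => r
    case other =>
      throw new IllegalArgumentException(
        s"$other should be converted to HadoopFsRelation.")
  }
}
```

Failing fast here, rather than returning an `Option`, matches the original code's intent: a conversion miss is a planner bug, not a recoverable condition.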
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99670/testReport)** for PR 23217 at commit [`38f3bfa`](https://github.com/apache/spark/commit/38f3bfa237570a3204c355774bb323973f962d67).
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23219 **[Test build #99669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99669/testReport)** for PR 23219 at commit [`94f76e5`](https://github.com/apache/spark/commit/94f76e543c1b146d4d25d3e15b6efd4777af7652).
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5724/ Test PASSed.
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Merged build finished. Test PASSed.
[GitHub] spark issue #23219: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23219 Merged build finished. Test PASSed.