[GitHub] spark pull request #23214: [SPARK-26155] Optimizing the performance of LongT...
Github user LuciferYang commented on a diff in the pull request: https://github.com/apache/spark/pull/23214#discussion_r238558889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala --- @@ -398,8 +399,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap private var numKeys = 0L // Tracking average number of probes per key lookup. - private var numKeyLookups = 0L - private var numProbes = 0L + private var numKeyLookups = new LongAdder + private var numProbes = new LongAdder --- End diff -- Initially, I thought these two class-scope variables would affect the JIT's SIMD optimization (after Java 8). We tried adding `-XX:-UseSuperWord` to the executor Java opts to verify this view, but it had no effect with Spark 2.1, although this patch can improve performance. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
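For context on what the diff changes: the patch swaps plain `long` fields for `java.util.concurrent.atomic.LongAdder`, which tolerates concurrent increments without locking by keeping per-thread cells that are folded together on read. A minimal standalone illustration (not Spark code):

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder numProbes = new LongAdder();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    numProbes.increment(); // lock-free; each thread hits its own cell
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // sum() folds the per-thread cells into a single total; no increments are lost
        System.out.println(numProbes.sum()); // 400000
    }
}
```

With a plain `long` field and `numProbes += 1`, the same four threads could lose updates, which is the metrics-correctness concern discussed below.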
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23214 **[Test build #99651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99651/testReport)** for PR 23214 at commit [`a267e6b`](https://github.com/apache/spark/commit/a267e6bbf874038573c598e4c411274c8b459701). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #99650 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99650/testReport)** for PR 22468 at commit [`fbfbbff`](https://github.com/apache/spark/commit/fbfbbff55d900ae1101ceb4f7823a9298464cb07). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #23216: [SPARK-26264][CORE]It is better to add @transient...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/23216 [SPARK-26264][CORE] It is better to add @transient to field 'locs' for class `ResultTask`. ## What changes were proposed in this pull request? The field 'locs' is only used on the driver side for class `ResultTask`, so it does not need to be serialized when sending the `ResultTask` to an executor. Although it's not very big, it's serialized very frequently, so we can add `transient` to it like `ShuffleMapTask`. ## How was this patch tested? Existing unit tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark locs_transient Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23216.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23216 commit b3ede8be1a9073f057cc46fb82eacd7fa3ec36c6 Author: liuxian Date: 2018-12-04T08:55:40Z fix
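Scala's `@transient` annotation maps to Java's `transient` keyword: the marked field is skipped by Java serialization, so it never travels to the executor and comes back as its default value (`null`/`0`) after deserialization. A minimal sketch with a hypothetical `Task` class (not the real `ResultTask`):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class Task implements Serializable {
        int partitionId = 7;
        // driver-side only: excluded from the serialized form, like ResultTask.locs
        transient String[] locs = {"host1", "host2"};

        static Task roundTrip(Task t) throws IOException, ClassNotFoundException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(t);
            oos.flush();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (Task) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Task copy = Task.roundTrip(new Task());
        System.out.println(copy.partitionId); // 7
        System.out.println(copy.locs);        // null: transient fields are dropped
    }
}
```

The payload saving per task is small, but as the PR notes, it is paid once per task launch, which is frequent.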
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/23213 cc: @cloud-fan @mgaido91
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user shahidki31 commented on the issue: https://github.com/apache/spark/pull/23088 retest this please
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 ``` For broadcast hash join, we will copy the broadcasted hash relation to avoid multi-thread problem, via HashedRelation.asReadOnlyCopy. However, this is a shallow copy, the LongToUnsafeRowMap is not copied and likely shared by multiple HashedRelations. ``` Were there no data correctness problems in the past, when the non-thread-safe `Long` type was used?
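The shallow-copy concern quoted above can be reproduced outside Spark: when a copy shares a mutable member, updates through any handle are visible (and potentially racy) through all of them. A hypothetical sketch, reusing the `asReadOnlyCopy` name from the quote but not Spark's actual classes:

```java
import java.util.concurrent.atomic.LongAdder;

public class ShallowCopyDemo {
    // Stand-in for HashedRelation: the "read-only copy" shares the underlying state.
    static class Relation {
        final LongAdder numProbes; // shared counter, like LongToUnsafeRowMap's metrics

        Relation(LongAdder probes) { this.numProbes = probes; }

        // Shallow copy: a new wrapper, but the SAME counter object underneath.
        Relation asReadOnlyCopy() { return new Relation(numProbes); }
    }

    public static void main(String[] args) {
        Relation original = new Relation(new LongAdder());
        Relation copy = original.asReadOnlyCopy();
        copy.numProbes.increment();                   // update through the copy...
        System.out.println(original.numProbes.sum()); // ...is visible in the original: 1
    }
}
```

With a plain `long` instead of `LongAdder`, concurrent increments through several such copies could interleave and lose updates, which is exactly why only the metrics (not the row data) were affected.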
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99656 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99656/testReport)** for PR 23194 at commit [`ac2b004`](https://github.com/apache/spark/commit/ac2b0048f13acbce8f9b82f2b1c5f5cd268c63d4).
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user heary-cao commented on the issue: https://github.com/apache/spark/pull/23194 retest this please
[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23215 **[Test build #99657 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99657/testReport)** for PR 23215 at commit [`7060e12`](https://github.com/apache/spark/commit/7060e127de339de42be12ed382ef0a4363ae325d).
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23195 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99659/ Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23216 **[Test build #99660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99660/testReport)** for PR 23216 at commit [`b3ede8b`](https://github.com/apache/spark/commit/b3ede8be1a9073f057cc46fb82eacd7fa3ec36c6).
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test FAILed.
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Merged build finished. Test FAILed.
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23194 **[Test build #99652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99652/testReport)** for PR 23194 at commit [`ac2b004`](https://github.com/apache/spark/commit/ac2b0048f13acbce8f9b82f2b1c5f5cd268c63d4). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/23214 Thanks for doing this. I think we are getting closer to the root cause.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test FAILed.
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99652/ Test FAILed.
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23214 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99651/ Test FAILed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99653/testReport)** for PR 23088 at commit [`41cfe80`](https://github.com/apache/spark/commit/41cfe8084a73e13336ab753a46fdc4c950583478). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #99647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99647/testReport)** for PR 22468 at commit [`7ef5f86`](https://github.com/apache/spark/commit/7ef5f866eb02f6638a5be00a602de6c6810ae2a3). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99650/ Test FAILed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test FAILed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test FAILed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99653/ Test FAILed.
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23214 I think there is a problem, but no one noticed because it only affects metrics.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Merged build finished. Test PASSed.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238596372 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: --- End diff -- > "Spark supports" Maybe `Spark can be configured to use` is better phrase. 
> explaining each option here is not really that helpful * I think the list must be kept (maybe without explanation), because the fact that Kafka offers an authentication protocol doesn't mean Spark is prepared to use it. * With the explanation I wanted to give a high-level feeling of what each protocol roughly does; Kafka's docs are there for a deeper look. I'm neutral on removing them. Should we?
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238596541 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. 
+ - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After obtaining delegation token successfully, Spark spreads it across nodes and renews it accordingly. --- End diff -- Fixed.
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238599018 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default + but can be turned off with `spark.security.credentials.kafka.enabled`. When + `spark.kafka.bootstrap.servers` set Spark looks for authentication information in the following + order and choose the first available to log in: + - **JAAS login configuration** + - **Keytab file**, such as, + +./bin/spark-submit \ +--keytab \ +--principal \ +--conf spark.kafka.bootstrap.servers= \ +... + + - **Kerberos credential cache**, such as, + +./bin/spark-submit \ +--conf spark.kafka.bootstrap.servers= \ +... + + Spark supports the following authentication protocols to obtain token: + - **SASL SSL (default)**: With `GSSAPI` mechanism Kerberos used for authentication and SSL for encryption. 
+ - **SSL**: It's leveraging a capability from SSL called 2-way authentication. The server authenticates +clients through certificates. Please note 2-way authentication must be enabled on Kafka brokers. + - **SASL PLAINTEXT (for testing)**: With `GSSAPI` mechanism Kerberos used for authentication but +because there is no encryption it's only for testing purposes. + + After obtaining delegation token successfully, Spark spreads it across nodes and renews it accordingly. + Delegation token uses `SCRAM` login module for authentication. --- End diff -- The `SCRAM` module supports only a couple of `sasl.mechanism` values, like `SCRAM-SHA-256`, `SCRAM-SHA-512`, etc., which have to be configured on the source/sink. I've updated the description to reflect this.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5714/ Test PASSed.
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Merged build finished. Test PASSed.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23195 **[Test build #99659 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99659/testReport)** for PR 23195 at commit [`b97492a`](https://github.com/apache/spark/commit/b97492adb0bbfdbc372c437767e9db6e5d4585ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23195 Merged build finished. Test PASSed.
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Merged build finished. Test PASSed.
[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20433 cc: @gatorsmile
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625915 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { val (comments, code) = input.split("\n").partition(_.startsWith("--")) // Runs all the tests on both codegen-only and interpreter modes -val codegenConfigSets = Array(CODEGEN_ONLY, NO_CODEGEN).map { - case codegenFactoryMode => -Array(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenFactoryMode.toString) +val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map { --- End diff -- ok
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625581 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -144,9 +144,10 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext { val (comments, code) = input.split("\n").partition(_.startsWith("--")) // Runs all the tests on both codegen-only and interpreter modes -val codegenConfigSets = Array(CODEGEN_ONLY, NO_CODEGEN).map { - case codegenFactoryMode => -Array(SQLConf.CODEGEN_FACTORY_MODE.key -> codegenFactoryMode.toString) +val codegenConfigSets = Array(("false", "NO_CODEGEN"), ("true", "CODEGEN_ONLY")).map { --- End diff -- shall we test all the combinations? e.g. `wholeStage=on, codegen=off`
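Testing "all the combinations" of the two flags is a small cross product (2 × 2 = 4 config sets rather than the 2 in the diff). A language-neutral sketch of the enumeration, with hypothetical flag names:

```java
import java.util.ArrayList;
import java.util.List;

public class ConfigCombos {
    // Cross product of the whole-stage flag and the codegen factory mode;
    // each pair would become one config set the test suite runs with.
    public static List<String[]> combos() {
        List<String[]> out = new ArrayList<>();
        for (String wholeStage : new String[]{"true", "false"}) {
            for (String factoryMode : new String[]{"CODEGEN_ONLY", "NO_CODEGEN"}) {
                out.add(new String[]{wholeStage, factoryMode});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] c : combos()) {
            System.out.println(String.join(", ", c)); // prints 4 combinations
        }
    }
}
```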
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5710/ Test PASSed.
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23088 Merged build finished. Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5711/ Test PASSed.
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22468 retest this please
[GitHub] spark issue #23214: [SPARK-26155] Optimizing the performance of LongToUnsafe...
Github user LuciferYang commented on the issue: https://github.com/apache/spark/pull/23214 On the other hand, if it is only a multi-thread problem, it may not affect performance, because there is no synchronized code path ...
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238583330 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { assert(requiredChildDistributions.length == children.length) assert(requiredChildOrderings.length == children.length) +val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect { + case a: Alias => (a.toAttribute, a) +})) + // Ensure that the operator's children satisfy their output distribution requirements. children = children.zip(requiredChildDistributions).map { - case (child, distribution) if child.outputPartitioning.satisfies(distribution) => + case (child, distribution) if child.outputPartitioning.satisfies( + distribution.mapExpressions(replaceAlias(_, aliasMap))) => --- End diff -- But `ProjectExec.outputPartitioning` cannot contain a reference to the aliases in its project list, as its output partitioning is the one of the child, where that alias doesn't exist.
[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5713/ Test PASSed.
[GitHub] spark issue #23195: [SPARK-26236][SS] Add kafka delegation token support doc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23195 **[Test build #99659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99659/testReport)** for PR 23195 at commit [`b97492a`](https://github.com/apache/spark/commit/b97492adb0bbfdbc372c437767e9db6e5d4585ce).
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238626367 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { assert(requiredChildDistributions.length == children.length) assert(requiredChildOrderings.length == children.length) +val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect { + case a: Alias => (a.toAttribute, a) +})) + // Ensure that the operator's children satisfy their output distribution requirements. children = children.zip(requiredChildDistributions).map { - case (child, distribution) if child.outputPartitioning.satisfies(distribution) => + case (child, distribution) if child.outputPartitioning.satisfies( + distribution.mapExpressions(replaceAlias(_, aliasMap))) => --- End diff -- for example, `relation[a, b]`'s output partitioning is `[hash partition a, hash partition b]`, and `Project(a as c, b, relation)`'s output partitioning should be `[hash partition c, hash partition b]`. What do you mean by `But ProjectExec.outputPartitioning cannot contain a reference to the aliases`?
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99647/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207 Can you share some ideas about it? IMO shuffle write metrics are hard, as an RDD can have shuffle dependencies on multiple upstream RDDs. That said, in general the shuffle write metrics should belong to the upstream RDDs. In Spark SQL, it's a little simpler, as the `ShuffledRowRDD` always has only one child, so it's reasonable to say that shuffle write metrics belong to `ShuffledRowRDD`. That said, we need to design a not-so-general shuffle write metrics API in Spark core, which will only be used in Spark SQL. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/23207 Thanks for your reply Wenchen, there's a sketch doc linked in JIRA: https://docs.google.com/document/d/1DX0gLkpk_NCE5MwI1_m4gnA2rLdjDkynZ02u2VWDR-8/edit ``` IMO shuffle write metrics is hard, as an RDD can have shuffle dependencies with multiple upstream RDDs. That said, in general the shuffle write metrics should belong to the upstream RDDs. ``` That's right, and that's also what I tried to do at first: logically the upstream operator triggers the shuffle write, and the first attempt also changed the SparkPlan base class to achieve this. ``` In Spark SQL, it's a little simpler, as the ShuffledRowRDD always have only one child, so it's reasonable to say that shuffle write metrics belong to ShuffledRowRDD. ``` Yes, maybe this is also the suggestion from Reynold: since ShuffleExchangeExec has only one child, we can simplify the implementation. But as the shuffle write metrics are updated inside the task, the core module still needs some changes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21919 **[Test build #99658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99658/testReport)** for PR 21919 at commit [`43fae6a`](https://github.com/apache/spark/commit/43fae6a83e3b8e1be310da77641f7fb889691c81). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22450 @cloud-fan this has been stuck for a while now. Is there something blocking this? Is there something I can do? Thanks. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23215 I think this new behavior makes more sense, but we need to add a migration guide. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23194: [MINOR][SQL] Combine the same codes in test cases
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23194 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5712/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23215: [SPARK-26263][SQL] Throw exception when Partition...
GitHub user gengliangwang opened a pull request: https://github.com/apache/spark/pull/23215 [SPARK-26263][SQL] Throw exception when Partition column value can't be converted to user specified type

## What changes were proposed in this pull request?

Currently if the user provides a data schema, partition column values are converted according to it. But if the conversion fails, e.g. converting string to int, the column value becomes null. We should throw an exception in such cases. For the following directory

```
/tmp/testDir
├── p=bar
└── p=foo
```

If we run:

```
val schema = StructType(Seq(StructField("p", IntegerType, false)))
spark.read.schema(schema).csv("/tmp/testDir/").show()
```

We will get:

```
+----+
|   p|
+----+
|null|
|null|
+----+
```

This PR proposes to throw an exception in such cases, instead of silently converting to a null value:
1. These null partition column values don't make sense to users in most cases. It is better to surface the conversion failure, so users can adjust the schema or ETL jobs to fix it.
2. Such conversion failures always throw exceptions for non-partition data columns. Partition columns should have the same behavior.

## How was this patch tested?

Unit test

You can merge this pull request into a Git repository by running: $ git pull https://github.com/gengliangwang/spark SPARK-26263 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23215.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23215 commit 7060e127de339de42be12ed382ef0a4363ae325d Author: Gengliang Wang Date: 2018-12-04T09:43:03Z SPARK-26263: Throw exception when partition value can't be converted to specific type --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
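The proposed behavior change amounts to a strict cast of the partition directory value. A minimal sketch (Python stand-in; `cast_partition_value` and its type names are hypothetical helpers, not Spark code):

```python
def cast_partition_value(raw, data_type):
    """Strict cast of a partition directory value: raise on failure
    instead of silently producing null."""
    casts = {"int": int, "double": float, "string": str}
    try:
        return casts[data_type](raw)
    except ValueError:
        raise ValueError(
            f"Failed to cast partition value '{raw}' to {data_type}")

# Directory names p=foo / p=bar with a user-specified IntegerType schema:
assert cast_partition_value("42", "int") == 42
try:
    cast_partition_value("foo", "int")
    assert False, "expected a cast failure"
except ValueError:
    pass  # old behavior: silent null; proposed behavior: exception
```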
[GitHub] spark issue #23215: [SPARK-26263][SQL] Throw exception when Partition column...
Github user gengliangwang commented on the issue: https://github.com/apache/spark/pull/23215 @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238627633 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- all the explain related tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22468 **[Test build #99655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99655/testReport)** for PR 22468 at commit [`fbfbbff`](https://github.com/apache/spark/commit/fbfbbff55d900ae1101ceb4f7823a9298464cb07). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99654 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99654/testReport)** for PR 23088 at commit [`41cfe80`](https://github.com/apache/spark/commit/41cfe8084a73e13336ab753a46fdc4c950583478). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23203: [SPARK-26252][PYTHON] Add support to run specific unitte...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/23203 Yea, will update it as well after this one gets merged. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23195: [SPARK-26236][SS] Add kafka delegation token supp...
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/23195#discussion_r238593413 --- Diff: docs/structured-streaming-kafka-integration.md --- @@ -624,3 +624,56 @@ For experimenting on `spark-shell`, you can also use `--packages` to add `spark- See [Application Submission Guide](submitting-applications.html) for more details about submitting applications with external dependencies. + +## Security + +Kafka 0.9.0.0 introduced several features that increases security in a cluster. For detailed +description about these possibilities, see [Kafka security docs](http://kafka.apache.org/documentation.html#security). + +It's worth noting that security is optional and turned off by default. + +Spark supports the following ways to authenticate against Kafka cluster: +- **Delegation token (introduced in Kafka broker 1.1.0)**: This way the application can be configured + via Spark parameters and may not need JAAS login configuration (Spark can use Kafka's dynamic JAAS + configuration feature). For further information about delegation tokens, see + [Kafka delegation token docs](http://kafka.apache.org/documentation/#security_delegation_token). + + The process is initiated by Spark's Kafka delegation token provider. This is enabled by default --- End diff -- Fixed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22822: [SPARK-25678] Requesting feedback regarding a pro...
Github user UtkarshMe closed the pull request at: https://github.com/apache/spark/pull/22822 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23213 just a question, why didn't we introduce something like what was done in SPARK-24562? I see that these are configs which are valid for all queries, so using what was done in SPARK-24562 is not a good idea, but something similar (e.g. a file with all the config sets to use)? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625268 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- can we move them to `ExplainSuite`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHO...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/23213#discussion_r238625477 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2899,6 +2899,144 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + private def checkKeywordsExistsInExplain(df: DataFrame, keywords: String*): Unit = { +val output = new java.io.ByteArrayOutputStream() +Console.withOut(output) { + df.explain(extended = true) +} +val normalizedOutput = output.toString.replaceAll("#\\d+", "#x") +for (key <- keywords) { + assert(normalizedOutput.contains(key)) +} + } + + test("optimized plan should show the rewritten aggregate expression") { --- End diff -- all the tests? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/23218 Surprisingly, all three failures are due to consistent JVM crashes. It seems that Scala 2.12.8 or Spark has some unstable code somewhere.

- https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99679/consoleFull

```
[info] - SPARK-17641: collect functions should not collect null values (231 milliseconds)
10:51:04.251 WARN org.apache.spark.sql.execution.window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
10:51:04.262 WARN org.apache.spark.sql.execution.window.WindowExec: No Partition Defined for Window operation! Moving all data to a single partition, this can cause serious performance degradation.
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fa843744e44, pid=116353, tid=140360030242560
```

- https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4451/consoleFull

```
[info] - read from textfile (508 milliseconds)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f60ec641e44, pid=40380, tid=140053491689216
#
```

- https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4452/consoleFull

```
[info] - SPARK-21996 read from text files generated by file sink -- file name has space (532 milliseconds)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7f399e84ee44, pid=106264, tid=139883238606592
#
```

--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23213 > I personally think its orthogonal to SPARK-24562. yes I agree. I am just asking if it makes sense to create a framework like that. Now it is only about codegen, but in the future we may want to add more configs. What do you think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
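The "framework" idea floated above can be sketched as a tiny harness that re-runs each test body under every config set and checks the answers agree. This is purely illustrative (the config keys are real Spark SQL config names, but `run_with_config_sets` and the file-of-config-sets layout are assumptions, not anything in the PR):

```python
# Hypothetical harness: each test query is re-run once per config set,
# mirroring the "file with all the config sets to use" suggestion.

CONFIG_SETS = [
    {"spark.sql.codegen.wholeStage": "true"},
    {"spark.sql.codegen.wholeStage": "false"},
]

def run_with_config_sets(test_body):
    """Run test_body under every config set; all runs must agree."""
    results = [test_body(conf) for conf in CONFIG_SETS]
    assert len(set(results)) == 1, f"results diverged: {results}"
    return results[0]

# A trivial stand-in for a query whose result must not depend on codegen:
assert run_with_config_sets(lambda conf: 1 + 1) == 2
```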
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22468 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/22952 @HeartSaVioR I've taken a look at the possibilities:
* [GlobExpander](https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/GlobExpander.java#L63) is private
* Calling `globStatus` recursively is not an option because of its poor performance
* `globStatus` with a limited scope can be an option, but there are cases where it might take significant execution time
* Printing warnings and not moving files is an option which seems feasible
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
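To make the trade-off concrete: expanding a glob's alternations up front yields a small set of concrete prefixes that can be matched cheaply, instead of a recursive `globStatus` walk. A toy sketch (Python; `expand_braces` is a simplified stand-in for Hadoop's private `GlobExpander`, and real glob syntax is much richer):

```python
import fnmatch

def expand_braces(pattern):
    """Expand {a,b} alternations into concrete patterns (toy version)."""
    start, end = pattern.find("{"), pattern.find("}")
    if start == -1:
        return [pattern]
    head, alts, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    return [p for alt in alts.split(",") for p in expand_braces(head + alt + tail)]

assert expand_braces("/data/{2018,2019}/part-*") == [
    "/data/2018/part-*", "/data/2019/part-*"]

# After expansion, candidate files are matched against a few concrete
# patterns rather than globbed recursively over the whole tree.
files = ["/data/2018/part-0", "/data/2020/part-0"]
matched = [f for f in files
           for p in expand_braces("/data/{2018,2019}/part-*")
           if fnmatch.fnmatch(f, p)]
assert matched == ["/data/2018/part-0"]
```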
[GitHub] spark pull request #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat i...
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/23217 [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order to check properly the limit size

## What changes were proposed in this pull request?

The PR starts from the [comment](https://github.com/apache/spark/pull/23124#discussion_r236112390) in the main one and it aims at:
- simplifying the code for `MapConcat`;
- being more precise in checking the limit size.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running: $ git pull https://github.com/mgaido91/spark SPARK-25829_followup Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23217.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23217 commit 54f0f31aaa14de7c44c336580c7ed18e8ffb4b54 Author: Marco Gaido Date: 2018-12-04T12:35:09Z [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order to check properly the limit size --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/23217 cc @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99664/testReport)** for PR 23217 at commit [`54f0f31`](https://github.com/apache/spark/commit/54f0f31aaa14de7c44c336580c7ed18e8ffb4b54). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22600: [SPARK-25578][BUILD] Update to Scala 2.12.7
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22600 2.12.8 is out. Do we need to upgrade to 2.12.8? 2.12.8 fixes two regressions that appeared in 2.12.7:

```
Don't reject views with result types which are TypeVars (#7295)
Don't emit static forwarders (which simplify the use of methods in top-level objects from Java) for bridge methods (#7469)
```

More details: https://github.com/scala/scala/releases/tag/v2.12.8 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99660/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23216: [SPARK-26264][CORE]It is better to add @transient to fie...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23216 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99664/testReport)** for PR 23217 at commit [`54f0f31`](https://github.com/apache/spark/commit/54f0f31aaa14de7c44c336580c7ed18e8ffb4b54). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99664/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/23098 Note I'm holding on to this PR for a while as I understand it might be disruptive to downstream builds to remove 2.11 support just now. Will look at merging it in weeks. Right now it's an FYI. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23098: [WIP][SPARK-26132][BUILD][CORE] Remove support for Scala...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23098 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23217 **[Test build #99668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99668/testReport)** for PR 23217 at commit [`724db5c`](https://github.com/apache/spark/commit/724db5cd752d2c79032a887e8ae2806d9a5acc65). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238630981 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask( threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime } else 0L +// Register the shuffle write metrics reporter to shuffleWriteMetrics. +if (dep.shuffleWriteMetricsReporter.isDefined) { + context.taskMetrics().shuffleWriteMetrics.registerExternalShuffleWriteReporter( --- End diff -- This happens per-task, I think `ShuffleWriteMetrics.externalReporters` can be option instead of array. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r238633725 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala --- @@ -92,6 +92,12 @@ private[spark] class ShuffleMapTask( threadMXBean.getCurrentThreadCpuTime - deserializeStartCpuTime } else 0L +// Register the shuffle write metrics reporter to shuffleWriteMetrics. +if (dep.shuffleWriteMetricsReporter.isDefined) { + context.taskMetrics().shuffleWriteMetrics.registerExternalShuffleWriteReporter( --- End diff -- a simpler idea: 1. create a `class GroupedShuffleWriteMetricsReporter(reporters: Seq[ShuffleWriteMetricsReporter]) extends ShuffleWriteMetricsReporter`, which proxies all the metric updates to the input reporters. 2. create a `GroupedShuffleWriteMetricsReporter` instance here: `new GroupedShuffleWriteMetricsReporter(Seq(dep.shuffleWriteMetricsReporter.get, context.taskMetrics().shuffleWriteMetrics))`, and pass it to `manager.getWriter` I think we can use the same approach for read metrics as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
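The grouped-reporter idea above is a plain fan-out decorator. A minimal sketch (Python stand-in for the proposed Scala classes; the class and method names follow the comment but are illustrative, not Spark's actual API):

```python
# Sketch of the proposed GroupedShuffleWriteMetricsReporter: one reporter
# that forwards every metric update to several underlying reporters.

class ShuffleWriteMetrics:
    def __init__(self):
        self.bytes_written = 0
        self.records_written = 0

    def inc_bytes_written(self, v):
        self.bytes_written += v

    def inc_records_written(self, v):
        self.records_written += v

class GroupedShuffleWriteMetricsReporter:
    def __init__(self, reporters):
        self.reporters = reporters

    def inc_bytes_written(self, v):
        for r in self.reporters:
            r.inc_bytes_written(v)

    def inc_records_written(self, v):
        for r in self.reporters:
            r.inc_records_written(v)

# The SQL-level reporter and the task's own metrics see the same updates:
sql_metrics, task_metrics = ShuffleWriteMetrics(), ShuffleWriteMetrics()
writer_reporter = GroupedShuffleWriteMetricsReporter([sql_metrics, task_metrics])
writer_reporter.inc_records_written(3)
writer_reporter.inc_bytes_written(128)
assert sql_metrics.records_written == task_metrics.records_written == 3
assert sql_metrics.bytes_written == task_metrics.bytes_written == 128
```

The shuffle writer only ever sees one reporter object, which keeps the core API narrow while still feeding both the task metrics and the SQL metrics.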
[GitHub] spark issue #23213: [SPARK-26262][SQL] Run SQLQueryTestSuite with WHOLESTAGE...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23213 **[Test build #99661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99661/testReport)** for PR 23213 at commit [`3ef5e3e`](https://github.com/apache/spark/commit/3ef5e3ea937843c65d0ddc11810083d9e3d772b7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23088 **[Test build #99654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99654/testReport)** for PR 23088 at commit [`41cfe80`](https://github.com/apache/spark/commit/41cfe8084a73e13336ab753a46fdc4c950583478). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238642801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala --- @@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { assert(requiredChildDistributions.length == children.length) assert(requiredChildOrderings.length == children.length) +val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect { + case a: Alias => (a.toAttribute, a) +})) + // Ensure that the operator's children satisfy their output distribution requirements. children = children.zip(requiredChildDistributions).map { - case (child, distribution) if child.outputPartitioning.satisfies(distribution) => + case (child, distribution) if child.outputPartitioning.satisfies( + distribution.mapExpressions(replaceAlias(_, aliasMap))) => --- End diff -- I don't think that is right: that would cause the shuffle to happen for every plan which is hashed by both `[hash part c, hash part b]` and `[hash part d, hash part b]` (and also `[hash part a, hash part b]`). I think that if we want to go that way, we'd need a set of equivalent outputPatitioning --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23217: [SPARK-25829][SQL][FOLLOWUP] Refactor MapConcat in order...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23217 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5718/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #23194: [MINOR][SQL] Combine the same codes in test cases
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23194 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21919 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99658/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21919 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/23218 **[Test build #99666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99666/testReport)** for PR 23218 at commit [`b667d37`](https://github.com/apache/spark/commit/b667d37e9ee2d8cdce459806925cdc0fe725b7bf). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23218: [SPARK-26266][BUILD] Update to Scala 2.12.8
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/23218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99666/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22468: [SPARK-25374][SQL] SafeProjection supports fallba...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22468#discussion_r238683833 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala ---
@@ -535,4 +535,98 @@ class UnsafeRowConverterSuite extends SparkFunSuite with Matchers with PlanTestB
     assert(unsafeRow.getSizeInBytes ==
       8 + 8 * 2 + roundedSize(field1.getSizeInBytes) + roundedSize(field2.getSizeInBytes))
   }
+
+  testBothCodegenAndInterpreted("SPARK-25374 converts back into safe representation") {
+    def convertBackToInternalRow(inputRow: InternalRow, fields: Array[DataType]): InternalRow = {
+      val unsafeProj = UnsafeProjection.create(fields)
+      val unsafeRow = unsafeProj(inputRow)
+      val safeProj = SafeProjection.create(fields)
+      safeProj(unsafeRow)
+    }
+
+    // Simple tests
+    val inputRow = InternalRow.fromSeq(Seq(
+      false, 3.toByte, 15.toShort, -83, 129L, 1.0f, 8.0, UTF8String.fromString("test"),
+      Decimal(255), CalendarInterval.fromString("interval 1 day"), Array[Byte](1, 2)
+    ))
+    val fields1 = Array(
+      BooleanType, ByteType, ShortType, IntegerType, LongType, FloatType,
+      DoubleType, StringType, DecimalType.defaultConcreteType, CalendarIntervalType,
+      BinaryType)
+
+    assert(convertBackToInternalRow(inputRow, fields1) === inputRow)
+
+    // Array tests
+    val arrayRow = InternalRow.fromSeq(Seq(
+      createArray(1, 2, 3),
+      createArray(
+        createArray(Seq("a", "b", "c").map(UTF8String.fromString): _*),
+        createArray(Seq("d").map(UTF8String.fromString): _*))
+    ))
+    val fields2 = Array[DataType](
+      ArrayType(IntegerType),
+      ArrayType(ArrayType(StringType)))
+
+    assert(convertBackToInternalRow(arrayRow, fields2) === arrayRow)
+
+    // Struct tests
+    val structRow = InternalRow.fromSeq(Seq(
+      InternalRow.fromSeq(Seq[Any](1, 4.0)),
+      InternalRow.fromSeq(Seq(
+        UTF8String.fromString("test"),
+        InternalRow.fromSeq(Seq(
+          1,
+          createArray(Seq("2", "3").map(UTF8String.fromString): _*)
+        ))
+      ))
+    ))
+    val fields3 = Array[DataType](
+      StructType(
+        StructField("c0", IntegerType) ::
+        StructField("c1", DoubleType) ::
+        Nil),
+      StructType(
+        StructField("c2", StringType) ::
+        StructField("c3", StructType(
+          StructField("c4", IntegerType) ::
+          StructField("c5", ArrayType(StringType)) ::
+          Nil)) ::
+        Nil))
+
+    assert(convertBackToInternalRow(structRow, fields3) === structRow)
+
+    // Map tests
+    val mapRow = InternalRow.fromSeq(Seq(
+      createMap(Seq("k1", "k2").map(UTF8String.fromString): _*)(1, 2),
+      createMap(
+        createMap(3, 5)(Seq("v1", "v2").map(UTF8String.fromString): _*),
+        createMap(7, 9)(Seq("v3", "v4").map(UTF8String.fromString): _*)
+      )(
+        createMap(Seq("k3", "k4").map(UTF8String.fromString): _*)(3.toShort, 4.toShort),
+        createMap(Seq("k5", "k6").map(UTF8String.fromString): _*)(5.toShort, 6.toShort)
+      )))
+    val fields4 = Array[DataType](
+      MapType(StringType, IntegerType),
+      MapType(MapType(IntegerType, StringType), MapType(StringType, ShortType)))
+
+    val mapResultRow = convertBackToInternalRow(mapRow, fields4).toSeq(fields4)
+    val mapExpectedRow = mapRow.toSeq(fields4)
+    // Since `ArrayBasedMapData` does not override `equals` and `hashCode`,
--- End diff -- `ArrayBasedMapData`/`UnsafeMapData` do not have `equals()` or `hashCode()` implemented because we do not have a good story around map equality. Implementing equals/hashCode for maps is only half of the solution; we would also need a comparable binary format. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
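The "comparable binary format" problem hvanhovell raises can be illustrated outside Spark. A minimal Java sketch (not Spark code; `encode` is a hypothetical order-sensitive serializer): two maps that are logically equal can still produce different byte sequences, so byte-wise comparison of an encoded map is not a valid equality check unless the format canonicalizes entry order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapEqualityDemo {
    // Naive "binary format": concatenate entries in iteration order.
    // This is order-sensitive, which is exactly what makes it non-comparable.
    static byte[] encode(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : m.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same entries, inserted in different orders.
        Map<String, Integer> m1 = new LinkedHashMap<>();
        m1.put("k1", 1);
        m1.put("k2", 2);
        Map<String, Integer> m2 = new LinkedHashMap<>();
        m2.put("k2", 2);
        m2.put("k1", 1);

        // Logically equal per Map.equals (order-insensitive)...
        System.out.println(m1.equals(m2));                         // true
        // ...but their order-sensitive encodings differ byte-wise.
        System.out.println(Arrays.equals(encode(m1), encode(m2))); // false
    }
}
```

This is why the test above falls back to comparing converted `Seq` representations rather than the map objects themselves.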
[GitHub] spark pull request #22249: [SPARK-16281][SQL][FOLLOW-UP] Add parse_url to fu...
Github user TomaszGaweda closed the pull request at: https://github.com/apache/spark/pull/22249 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #23171: [SPARK-26205][SQL] Optimize In for bytes, shorts, ints
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/23171 As @rxin said, if we introduce a separate expression for the switch-based approach, then we will need to modify other places, e.g. `DataSourceStrategy$translateFilter`. So, integrating it into `In` or `InSet` seems safer. I think we can move the switch-based logic to `InSet` and make `InSet` responsible for picking the optimal execution path. We might need to modify the condition under which we convert `In` to `InSet`, as this will most likely depend on the underlying data type. I saw noticeable improvements starting from 5 elements when comparing the if-else approach to the switch-based one. Right now, the conversion happens only for more than 10 elements. ---
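The if-else vs. switch trade-off discussed here can be sketched in plain Java (a standalone illustration, not Spark's actual generated code): a chain of comparisons costs O(n) per lookup, while a single `switch` over the same constants compiles to a `tableswitch`/`lookupswitch` bytecode instruction, giving roughly O(1)/O(log n) dispatch.

```java
public class InSwitchDemo {
    static final int[] VALUES = {2, 5, 7, 11, 13};

    // Linear chain of comparisons, analogous to the if-else code path:
    // every lookup may test all n candidates.
    static boolean inLinear(int v) {
        for (int candidate : VALUES) {
            if (v == candidate) return true;
        }
        return false;
    }

    // A single switch over the same constants: javac emits a
    // tableswitch/lookupswitch instruction, so the JVM dispatches
    // without scanning the whole candidate list.
    static boolean inSwitch(int v) {
        switch (v) {
            case 2: case 5: case 7: case 11: case 13:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inLinear(7) + " " + inSwitch(7)); // true true
        System.out.println(inLinear(4) + " " + inSwitch(4)); // false false
    }
}
```

The 5-element threshold aokolnychyi observed is plausible on this model: the constant factor of switch dispatch is slightly higher, so the win only appears once the linear chain gets long enough.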
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Merged build finished. Test PASSed. ---
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238634730 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
@@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
     assert(requiredChildDistributions.length == children.length)
     assert(requiredChildOrderings.length == children.length)
 
+    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
+      case a: Alias => (a.toAttribute, a)
+    }))
+
     // Ensure that the operator's children satisfy their output distribution requirements.
     children = children.zip(requiredChildDistributions).map {
-      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
+      case (child, distribution) if child.outputPartitioning.satisfies(
+        distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
--- End diff -- For `Project(a as c, a as d, b, relation)`, I think the `outputPartitioning` should be `[hash part c, hash part d, hash part b]`. The point is, we should not report an output partitioning whose attribute is not even in the current plan's output. ---
[GitHub] spark issue #22468: [SPARK-25374][SQL] SafeProjection supports fallback to a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99655/ ---
[GitHub] spark pull request #22957: [SPARK-25951][SQL] Ignore aliases for distributio...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22957#discussion_r238650207 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala ---
@@ -145,9 +145,14 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] {
     assert(requiredChildDistributions.length == children.length)
     assert(requiredChildOrderings.length == children.length)
 
+    val aliasMap = AttributeMap[Expression](children.flatMap(_.expressions.collect {
+      case a: Alias => (a.toAttribute, a)
+    }))
+
     // Ensure that the operator's children satisfy their output distribution requirements.
     children = children.zip(requiredChildDistributions).map {
-      case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
+      case (child, distribution) if child.outputPartitioning.satisfies(
+        distribution.mapExpressions(replaceAlias(_, aliasMap))) =>
--- End diff -- Seems we are not on the same page... Let's make the example clearer. Assuming a `relation[a, b]`'s partitioning is `hash(a, b)`, then `Project(a as c, a as d, b, relation)`'s partitioning should be `[hash(c, b), hash(d, b)]`. It's like a flatMap. ---
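The "flatMap" expansion cloud-fan describes can be modeled concretely. A standalone Java sketch (hypothetical names, not Spark's planner code): for each key of the child's hash partitioning, collect the output attributes it remains visible as after the projection, then take the cross product. A key that does not survive the projection invalidates the whole partitioning, which is the "attribute not in the current plan's output" case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class AliasPartitioningDemo {
    // keys: the child's hash-partitioning keys, e.g. [a, b].
    // outputsPerKey: which output attributes each key maps to after the
    // projection, e.g. Project(a as c, a as d, b) gives a -> [c, d], b -> [b].
    // Returns every valid output partitioning key list, e.g. [[c, b], [d, b]].
    static List<List<String>> expand(List<String> keys,
                                     Map<String, List<String>> outputsPerKey) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (String key : keys) {
            List<String> outputs = outputsPerKey.get(key);
            if (outputs == null || outputs.isEmpty()) {
                // The key is gone from the output: no valid partitioning
                // can mention it, so report none at all.
                return List.of();
            }
            // Cross product: extend every partial key list with each alias.
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String out : outputs) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(out);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Child partitioned by hash(a, b); projection is (a as c, a as d, b).
        Map<String, List<String>> visible =
            Map.of("a", List.of("c", "d"), "b", List.of("b"));
        System.out.println(expand(List.of("a", "b"), visible)); // [[c, b], [d, b]]
    }
}
```

Note this only sketches the semantics under discussion; the PR itself takes the cheaper route of rewriting the required distribution through an alias map rather than enumerating all expanded partitionings.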