[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836216634 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836192784 **[Test build #138321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138321/testReport)** for PR 32399 at commit [`a1724ab`](https://github.com/apache/spark/commit/a1724ab3c4bb852dcb227bced236fdcbd3f3b93f).
[GitHub] [spark] SparkQA removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue
SparkQA removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836190123 **[Test build #138320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138320/testReport)** for PR 32399 at commit [`a6874e5`](https://github.com/apache/spark/commit/a6874e5fc05c1f418500670c59c36bc799977761).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still co
AmplabJenkins removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836190600 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138320/
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836190579 **[Test build #138320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138320/testReport)** for PR 32399 at commit [`a6874e5`](https://github.com/apache/spark/commit/a6874e5fc05c1f418500670c59c36bc799977761).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue r
AmplabJenkins commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836190600 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138320/
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836190123 **[Test build #138320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138320/testReport)** for PR 32399 at commit [`a6874e5`](https://github.com/apache/spark/commit/a6874e5fc05c1f418500670c59c36bc799977761).
[GitHub] [spark] SparkQA removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue
SparkQA removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836187589 **[Test build #138319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138319/testReport)** for PR 32399 at commit [`45c64ea`](https://github.com/apache/spark/commit/45c64ead77bfd897b53e383efa67e4ba35c2).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still co
AmplabJenkins removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836188026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138319/
[GitHub] [spark] AmplabJenkins commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue r
AmplabJenkins commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836188026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138319/
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836188006 **[Test build #138319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138319/testReport)** for PR 32399 at commit [`45c64ea`](https://github.com/apache/spark/commit/45c64ead77bfd897b53e383efa67e4ba35c2).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836187589 **[Test build #138319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138319/testReport)** for PR 32399 at commit [`45c64ea`](https://github.com/apache/spark/commit/45c64ead77bfd897b53e383efa67e4ba35c2).
[GitHub] [spark] SparkQA commented on pull request #32031: [WIP] Initial work of Remote Shuffle Service on Kubernetes
SparkQA commented on pull request #32031: URL: https://github.com/apache/spark/pull/32031#issuecomment-836185216 **[Test build #138318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138318/testReport)** for PR 32031 at commit [`506149f`](https://github.com/apache/spark/commit/506149f3fa92b27bdf09da6748e91516b6dd5aea).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32475: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty
AmplabJenkins removed a comment on pull request #32475: URL: https://github.com/apache/spark/pull/32475#issuecomment-836183250 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42837/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins removed a comment on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836183252 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42836/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
AmplabJenkins removed a comment on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-836183249 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42838/
[GitHub] [spark] AmplabJenkins commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
AmplabJenkins commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-836183249 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42838/
[GitHub] [spark] AmplabJenkins commented on pull request #32475: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty
AmplabJenkins commented on pull request #32475: URL: https://github.com/apache/spark/pull/32475#issuecomment-836183250 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42837/
[GitHub] [spark] AmplabJenkins commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836183252 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42836/
[GitHub] [spark] SparkQA commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
SparkQA commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836177985
[GitHub] [spark] SparkQA commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
SparkQA commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-836176561
[GitHub] [spark] SparkQA commented on pull request #32475: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty
SparkQA commented on pull request #32475: URL: https://github.com/apache/spark/pull/32475#issuecomment-836174973 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42837/
[GitHub] [spark] SparkQA commented on pull request #32475: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty
SparkQA commented on pull request #32475: URL: https://github.com/apache/spark/pull/32475#issuecomment-836171886 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42837/
[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836153120 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138312/
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some background threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836152449 **[Test build #138317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138317/testReport)** for PR 32399 at commit [`e8c86db`](https://github.com/apache/spark/commit/e8c86db1753a097e7ed442fd26d064693e0803e8).
[GitHub] [spark] SparkQA removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835964738 **[Test build #138312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138312/testReport)** for PR 32473 at commit [`21cc2ac`](https://github.com/apache/spark/commit/21cc2ac907ffe9256942d818663ce225d1a1b992).
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836151733 **[Test build #138312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138312/testReport)** for PR 32473 at commit [`21cc2ac`](https://github.com/apache/spark/commit/21cc2ac907ffe9256942d818663ce225d1a1b992).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] srowen commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
srowen commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836150808 Getting pretty big! but OK if needed.
[GitHub] [spark] ulysses-you commented on a change in pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
ulysses-you commented on a change in pull request #32482: URL: https://github.com/apache/spark/pull/32482#discussion_r629039347

## File path: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala

@@ -1175,7 +1175,7 @@ class CachedTableSuite extends QueryTest with SQLTestUtils
   }

   test("cache supports for intervals") {
-    withTable("interval_cache") {
+    withTable("interval_cache", "t1") {

Review comment: Not related to this PR, but it affects the newly added test that uses `t1`.
[GitHub] [spark] SparkQA commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
SparkQA commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-836147815 **[Test build #138316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138316/testReport)** for PR 32482 at commit [`7625677`](https://github.com/apache/spark/commit/76256774c52b78b9f6011f82063004bf18734f01).
[GitHub] [spark] ulysses-you commented on pull request #32482: [SPARK-35332][SQL] Make cache plan disable configs configurable
ulysses-you commented on pull request #32482: URL: https://github.com/apache/spark/pull/32482#issuecomment-836147309 Thank you @maropu @c21 @dongjoon-hyun. Agreed, the current config seems like overkill for users; it's better to just make it an `enabled` flag. Refactored this PR to address:
 * make the new config simple and improve the doc.
 * improve the tests in two ways: 1) more patterns in the AQE test, 2) a bucketed test.
[GitHub] [spark] SparkQA commented on pull request #32475: [SPARK-34775][SQL] Push down limit through window when partitionSpec is not empty
SparkQA commented on pull request #32475: URL: https://github.com/apache/spark/pull/32475#issuecomment-83614 **[Test build #138315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138315/testReport)** for PR 32475 at commit [`bf9d041`](https://github.com/apache/spark/commit/bf9d04140d596ba9d4cfe33b0f497a5a9045ba37).
[GitHub] [spark] SparkQA commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
SparkQA commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836145492 **[Test build #138314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138314/testReport)** for PR 32487 at commit [`4098407`](https://github.com/apache/spark/commit/4098407bf6b74f2045ca27c3851da249a2a6ec7e).
[GitHub] [spark] AmplabJenkins commented on pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate
AmplabJenkins commented on pull request #32488: URL: https://github.com/apache/spark/pull/32488#issuecomment-836144135 Can one of the admins verify this patch?
[GitHub] [spark] cfmcgrady opened a new pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate
cfmcgrady opened a new pull request #32488: URL: https://github.com/apache/spark/pull/32488

### What changes were proposed in this pull request?

This PR adds In/InSet predicate support for `UnwrapCastInBinaryComparison`. The current implementation doesn't push down filters for `In`/`InSet` predicates that contain a `Cast`. For instance:

```scala
spark.range(50).selectExpr("cast(id as int) as id").write.mode("overwrite").parquet("/tmp/parquet/t1")
spark.read.parquet("/tmp/parquet/t1").where("id in (1L, 2L, 4L)").explain
```

Before this PR:

```
== Physical Plan ==
*(1) Filter cast(id#5 as bigint) IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#5] Batched: true, DataFilters: [cast(id#5 as bigint) IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
```

After this PR:

```
== Physical Plan ==
*(1) Filter id#95 IN (1,2,4)
+- *(1) ColumnarToRow
   +- FileScan parquet [id#95] Batched: true, DataFilters: [id#95 IN (1,2,4)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/tmp/parquet/t1], PartitionFilters: [], PushedFilters: [In(id, [1,2,4])], ReadSchema: struct
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.
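The rewrite the PR describes can be illustrated outside Spark with a minimal sketch (the function below is hypothetical, not Spark's `UnwrapCastInBinaryComparison` code): when an int32 column is cast to bigint and compared against an IN list, the cast can be removed by narrowing the literals, and any literal not representable as int32 can simply be dropped because it can never match.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def unwrap_cast_in(literals):
    """Model rewriting `cast(int_col AS bigint) IN (<literals>)`
    as `int_col IN (<narrowed literals>)`.

    Literals outside the int32 range can never equal an int32 column
    value, so they are removed from the rewritten IN list.
    """
    return [v for v in literals if INT32_MIN <= v <= INT32_MAX]

# `cast(id as bigint) IN (1, 2, 4)` -> `id IN (1, 2, 4)`
print(unwrap_cast_in([1, 2, 4]))    # [1, 2, 4]
# A bigint literal beyond int32 range is unreachable and is dropped.
print(unwrap_cast_in([1, 2**40]))   # [1]
```

Once the comparison is expressed directly on the int column, the data source can receive a `PushedFilters: [In(id, ...)]` entry instead of scanning everything, which is the effect shown in the plans above.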
[GitHub] [spark] c21 commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
c21 commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r629027318 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +443,140 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) + +// Handle the case when streamed rows has any NULL keys. +val handleStreamedAnyNull = joinType match { + case _: InnerLike => +// Skip streamed row. +s""" + |$streamedRow = null; + |continue; + """.stripMargin + case LeftOuter | RightOuter => +// Eagerly return streamed row. +s""" + |if (!$matches.isEmpty()) { + | $matches.clear(); + |} + |return false; + """.stripMargin + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} -ctx.addNewFunction("findNextInnerJoinRows", +// Handle the case when streamed keys less than buffered keys. +val handleStreamedLessThanBuffered = joinType match { + case _: InnerLike => +// Skip streamed row. +s"$streamedRow = null;" + case LeftOuter | RightOuter => +// Eagerly return with streamed row. +"return false;" + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} + +ctx.addNewFunction("findNextJoinRows", s""" - |private boolean findNextInnerJoinRows( - |scala.collection.Iterator leftIter, - |scala.collection.Iterator rightIter) { - | $leftRow = null; + |private boolean findNextJoinRows( Review comment: @maropu - No I think we need buffer anyway. 
The buffered rows have the same join keys as the current streamed row. But multiple subsequent streamed rows can have the same join keys as the buffered rows. Even though the buffered rows may not satisfy the join condition with the current streamed row, they may satisfy it with subsequent streamed rows. I think this is how the current sort merge join (code-gen & iterator) is designed.
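The buffering behavior described above can be modeled outside of Spark. Below is a minimal, hypothetical Python sketch (illustrative names only, not Spark's actual generated code) showing why all buffered-side rows with a given key must be kept: consecutive streamed rows sharing that key each re-scan the same buffered matches.

```python
def sort_merge_join(streamed, buffered):
    """Inner sort-merge join sketch. Both inputs: lists of (key, value), sorted by key."""
    out = []
    j = 0
    matches = []          # buffered rows sharing the current match key
    match_key = None
    for skey, sval in streamed:
        if skey != match_key:
            matches = []
            # Advance the buffered side past smaller keys.
            while j < len(buffered) and buffered[j][0] < skey:
                j += 1
            # Buffer ALL buffered rows with this key: later streamed rows
            # with the same key will reuse them.
            while j < len(buffered) and buffered[j][0] == skey:
                matches.append(buffered[j][1])
                j += 1
            match_key = skey
        # Every streamed row with this key scans the same buffered matches.
        for bval in matches:
            out.append((skey, sval, bval))
    return out
```

With two streamed rows on key 1 and two buffered rows on key 1, all four combinations are produced, which is only possible because the buffered rows are retained after the first streamed row is consumed.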
[GitHub] [spark] SparkQA removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue
SparkQA removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836035623 **[Test build #138313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138313/testReport)** for PR 32399 at commit [`c6aa4c4`](https://github.com/apache/spark/commit/c6aa4c4ccc8b9103314d5efea148b71e19a560d4).
[GitHub] [spark] viirya commented on a change in pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
viirya commented on a change in pull request #32487: URL: https://github.com/apache/spark/pull/32487#discussion_r629025675

## File path: dev/create-release/release-build.sh

```
@@ -210,6 +210,8 @@ if [[ "$1" == "package" ]]; then
     PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"`
     echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py
+    export MAVEN_OPTS="-Xmx12000m"
```

Review comment: ok.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still co
AmplabJenkins removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836109653 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138313/
[GitHub] [spark] AmplabJenkins commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue r
AmplabJenkins commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836109653 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138313/
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836108663 **[Test build #138313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138313/testReport)** for PR 32399 at commit [`c6aa4c4`](https://github.com/apache/spark/commit/c6aa4c4ccc8b9103314d5efea148b71e19a560d4).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836106608 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138311/
[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836106608 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138311/
[GitHub] [spark] wangyum commented on a change in pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState
wangyum commented on a change in pull request #32410: URL: https://github.com/apache/spark/pull/32410#discussion_r629020979

## File path: sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java

```
@@ -141,7 +141,7 @@ public void open(Map sessionConfMap) throws HiveSQLException {
     sessionState = new SessionState(hiveConf, username);
     sessionState.setUserIpAddress(ipAddress);
     sessionState.setIsHiveServerQuery(true);
-    SessionState.start(sessionState);
+    SessionState.setCurrentSessionState(sessionState);
```

Review comment: Yes. It is safe when using `ADD JARS`. We have disabled creating these directories for more than a year with the following change (`HiveConf.ConfVars.WITHSCRATCHDIR=false`):

![image](https://user-images.githubusercontent.com/5399861/116785447-312cc500-aacc-11eb-8dff-6ae75fbbc4d7.png)
[GitHub] [spark] SparkQA removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835906957 **[Test build #138311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138311/testReport)** for PR 32473 at commit [`34d0511`](https://github.com/apache/spark/commit/34d05113d307395bd1c1449651e09a8285fd0c6e).
[GitHub] [spark] HeartSaVioR commented on pull request #25911: [SPARK-29223][SQL][SS] Enable global timestamp per topic while specifying offset by timestamp in Kafka source
HeartSaVioR commented on pull request #25911: URL: https://github.com/apache/spark/pull/25911#issuecomment-836089685

I see actual customer demand for this: a topic has 100+ partitions, and it's awkward to make users craft JSON containing 100+ partition entries for the same timestamp. Flink already does this; it uses a global value across partitions for earliest/latest/timestamp, while still allowing exact offsets to be set per partition. https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-consumers-start-position-configuration

```
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

FlinkKafkaConsumer myConsumer = new FlinkKafkaConsumer<>(...);
myConsumer.setStartFromEarliest();     // start from the earliest record possible
myConsumer.setStartFromLatest();       // start from the latest record
myConsumer.setStartFromTimestamp(...); // start from specified epoch timestamp (milliseconds)
myConsumer.setStartFromGroupOffsets(); // the default behaviour
```

```
Map specificStartOffsets = new HashMap<>();
specificStartOffsets.put(new KafkaTopicPartition("myTopic", 0), 23L);
specificStartOffsets.put(new KafkaTopicPartition("myTopic", 1), 31L);
specificStartOffsets.put(new KafkaTopicPartition("myTopic", 2), 43L);

myConsumer.setStartFromSpecificOffsets(specificStartOffsets);
```

Given this PR is stale, I'll rebase it on master and raise the PR again.
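The pain point above can be made concrete with Spark's existing `startingOffsetsByTimestamp` option, which requires one entry per partition even when the timestamp is identical for all of them. A hedged Python sketch (the option name exists in Spark's Kafka source; the 100-partition `myTopic` is hypothetical):

```python
import json

# Today: the same timestamp must be repeated for every partition of the topic.
topic = "myTopic"
num_partitions = 100
ts = 1620000000000  # epoch millis

# Build the per-partition JSON the option currently expects,
# e.g. {"myTopic": {"0": 1620000000000, "1": 1620000000000, ...}}
starting_offsets_by_timestamp = json.dumps(
    {topic: {str(p): ts for p in range(num_partitions)}})

# This string would then be passed to the Kafka source, e.g.
#   .option("startingOffsetsByTimestamp", starting_offsets_by_timestamp)
# The proposal is to accept a single global timestamp instead of
# 100 identical per-partition entries.
```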
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836088555 **[Test build #138311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138311/testReport)** for PR 32473 at commit [`34d0511`](https://github.com/apache/spark/commit/34d05113d307395bd1c1449651e09a8285fd0c6e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] c21 commented on pull request #32480: [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin
c21 commented on pull request #32480: URL: https://github.com/apache/spark/pull/32480#issuecomment-836086921 Thank you @maropu for review!
[GitHub] [spark] beliefer commented on a change in pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
beliefer commented on a change in pull request #32442: URL: https://github.com/apache/spark/pull/32442#discussion_r629016341

## File path: sql/core/src/test/resources/sql-tests/inputs/cte-ddl.sql

```
@@ -0,0 +1,65 @@
+-- Test data.
+CREATE NAMESPACE IF NOT EXISTS query_ddl_namespace;
+USE NAMESPACE query_ddl_namespace;
+CREATE TABLE test_show_tables(a INT, b STRING, c INT) using parquet;
+CREATE TABLE test_show_table_properties (a INT, b STRING, c INT) USING parquet TBLPROPERTIES('p1'='v1', 'p2'='v2');
+CREATE TABLE test_show_partitions(a String, b Int, c String, d String) USING parquet PARTITIONED BY (c, d);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=1);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=2);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Cn', d=1);
+CREATE VIEW view_1 AS SELECT * FROM test_show_tables;
+CREATE VIEW view_2 AS SELECT * FROM test_show_tables WHERE c=1;
+CREATE TEMPORARY VIEW test_show_views(e int) USING parquet;
+CREATE GLOBAL TEMP VIEW test_global_show_views AS SELECT 1 as col1;
+
+-- SHOW NAMESPACES
+SHOW NAMESPACES;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s WHERE namespace = 'query_ddl_namespace';
+WITH s(n) AS (SHOW NAMESPACES) SELECT * FROM s WHERE n = 'query_ddl_namespace';
+
+-- SHOW TABLES
+SHOW TABLES;
+WITH s AS (SHOW TABLES) SELECT * FROM s;
+WITH s AS (SHOW TABLES) SELECT * FROM s WHERE tableName = 'test_show_tables';
+WITH s(ns, tn, t) AS (SHOW TABLES) SELECT * FROM s WHERE tn = 'test_show_tables';
```

Review comment: OK
[GitHub] [spark] LuciferYang closed pull request #32374: [WIP][SPARK-35253][BUILD][SQL] Upgrade Janino from 3.0.16 to 3.1.3
LuciferYang closed pull request #32374: URL: https://github.com/apache/spark/pull/32374
[GitHub] [spark] LuciferYang commented on pull request #32374: [WIP][SPARK-35253][BUILD][SQL] Upgrade Janino from 3.0.16 to 3.1.3
LuciferYang commented on pull request #32374: URL: https://github.com/apache/spark/pull/32374#issuecomment-836082137 Closing this because SPARK-35253 is now being addressed in #32455.
[GitHub] [spark] LuciferYang commented on a change in pull request #32455: [SPARK-35253][SQL][BUILD] Bump up the janino version to v3.1.4
LuciferYang commented on a change in pull request #32455: URL: https://github.com/apache/spark/pull/32455#discussion_r629014929

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

```
@@ -1434,9 +1435,10 @@ object CodeGenerator extends Logging {
   private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): ByteCodeStats = {
     // First retrieve the generated classes.
     val classes = {
-      val resultField = classOf[SimpleCompiler].getDeclaredField("result")
-      resultField.setAccessible(true)
-      val loader = resultField.get(evaluator).asInstanceOf[ByteArrayClassLoader]
+      val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc")
```

Review comment: @maropu Can we directly use `evaluator.getBytecodes.asScala` instead of lines 1438 ~ 1445?
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still co
AmplabJenkins removed a comment on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836069987 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42835/
[GitHub] [spark] AmplabJenkins commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue r
AmplabJenkins commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836069987 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42835/
[GitHub] [spark] zhengruifeng commented on pull request #32350: [SPARK-35231][SQL] logical.Range override maxRowsPerPartition
zhengruifeng commented on pull request #32350: URL: https://github.com/apache/spark/pull/32350#issuecomment-836067509 Thank you so much!
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836058502 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/42835/
[GitHub] [spark] maropu commented on a change in pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
maropu commented on a change in pull request #32487: URL: https://github.com/apache/spark/pull/32487#discussion_r629004607

## File path: dev/create-release/release-build.sh

```
@@ -210,6 +210,8 @@ if [[ "$1" == "package" ]]; then
     PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"`
     echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py
+    export MAVEN_OPTS="-Xmx12000m"
```

Review comment: nit: we can say `-Xmx12g`?
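Worth noting that the two spellings are close but not identical: JVM `-Xmx` suffixes are binary, so `-Xmx12g` means 12 × 1024 MB = 12288 MB, slightly more than `-Xmx12000m`. A quick check of that arithmetic:

```python
# JVM -Xmx suffixes: m = mebibytes, g = gibibytes (1g = 1024m)
xmx_12000m_mb = 12000
xmx_12g_mb = 12 * 1024  # 12288

# -Xmx12g grants 288 MB more heap than -Xmx12000m
extra_mb = xmx_12g_mb - xmx_12000m_mb
```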
[GitHub] [spark] huaxingao commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
huaxingao commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836051980

@dongjoon-hyun
> Shall we change the grouping in order see the trend according to the block size?

Sorry, I just saw your comment. I guess it might be a little better to pair up the results of `Without bloom filter` and `With bloom filter`, so it's easier to see the improvement from the bloom filter?
[GitHub] [spark] huaxingao commented on a change in pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
huaxingao commented on a change in pull request #32473: URL: https://github.com/apache/spark/pull/32473#discussion_r629004056

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BloomFilterBenchmark.scala

```
@@ -81,8 +80,57 @@ object BloomFilterBenchmark extends SqlBasedBenchmark {
     }
   }

+  private def writeParquetBenchmark(): Unit = {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+
+      runBenchmark(s"Parquet Write") {
+        val benchmark = new Benchmark(s"Write ${scaleFactor}M rows", N, output = output)
+        benchmark.addCase("Without bloom filter") { _ =>
+          df.write.mode("overwrite").parquet(path + "/withoutBF")
+        }
+        benchmark.addCase("With bloom filter") { _ =>
+          df.write.mode("overwrite")
+            .option(ParquetOutputFormat.BLOOM_FILTER_ENABLED + "#value", true)
+            .parquet(path + "/withBF")
+        }
+        benchmark.run()
+      }
+    }
+  }
+
+  private def readParquetBenchmark(): Unit = {
+    val blockSizes = Seq(512 * 1024, 1024 * 1024, 2 * 1024 * 1024, 3 * 1024 * 1024,
+      4 * 1024 * 1024, 5 * 1024 * 1024, 6 * 1024 * 1024, 7 * 1024 * 1024,
+      8 * 1024 * 1024, 9 * 1024 * 1024, 10 * 1024 * 1024)
+    for (blocksize <- blockSizes) {
+      withTempPath { dir =>
+        val path = dir.getCanonicalPath
+
+        df.write.option("parquet.block.size", blocksize).parquet(path + "/withoutBF")
```

Review comment: @wangyum Sorry, I am new to Parquet. Somehow I didn't see that Parquet has a compression size; it seems only ORC has `orc.compress.size`?
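For reference, the `blockSizes` sequence swept by the benchmark above (512 KB, then 1 MB through 10 MB) can be generated rather than enumerated; a small Python sketch mirroring the Scala values:

```python
KB = 1024
MB = 1024 * KB

# 512 KB followed by 1 MB .. 10 MB, matching the Scala Seq in the benchmark
block_sizes = [512 * KB] + [m * MB for m in range(1, 11)]
```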
[GitHub] [spark] SparkQA commented on pull request #32399: [SPARK-35271][ML][PYSPARK] Fix: After CrossValidator/TrainValidationSplit fit raised error, some backgroud threads may still continue run or
SparkQA commented on pull request #32399: URL: https://github.com/apache/spark/pull/32399#issuecomment-836035623 **[Test build #138313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138313/testReport)** for PR 32399 at commit [`c6aa4c4`](https://github.com/apache/spark/commit/c6aa4c4ccc8b9103314d5efea148b71e19a560d4).
[GitHub] [spark] AmplabJenkins commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836035119 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138310/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836035114
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins removed a comment on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-836035119 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138310/
[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-836035114
[GitHub] [spark] maropu commented on pull request #32480: [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin
maropu commented on pull request #32480: URL: https://github.com/apache/spark/pull/32480#issuecomment-836019661 Thank you, @c21. Merged to master.
[GitHub] [spark] maropu closed pull request #32480: [SPARK-35354][SQL] Replace BaseJoinExec with ShuffledJoin in CoalesceBucketsInJoin
maropu closed pull request #32480: URL: https://github.com/apache/spark/pull/32480
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835996955
[GitHub] [spark] wangyum commented on a change in pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
wangyum commented on a change in pull request #32442: URL: https://github.com/apache/spark/pull/32442#discussion_r628987144

## File path: sql/core/src/test/resources/sql-tests/inputs/cte-ddl.sql ##

@@ -0,0 +1,65 @@
+-- Test data.
+CREATE NAMESPACE IF NOT EXISTS query_ddl_namespace;
+USE NAMESPACE query_ddl_namespace;
+CREATE TABLE test_show_tables(a INT, b STRING, c INT) using parquet;
+CREATE TABLE test_show_table_properties (a INT, b STRING, c INT) USING parquet TBLPROPERTIES('p1'='v1', 'p2'='v2');
+CREATE TABLE test_show_partitions(a String, b Int, c String, d String) USING parquet PARTITIONED BY (c, d);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=1);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=2);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Cn', d=1);
+CREATE VIEW view_1 AS SELECT * FROM test_show_tables;
+CREATE VIEW view_2 AS SELECT * FROM test_show_tables WHERE c=1;
+CREATE TEMPORARY VIEW test_show_views(e int) USING parquet;
+CREATE GLOBAL TEMP VIEW test_global_show_views AS SELECT 1 as col1;
+
+-- SHOW NAMESPACES
+SHOW NAMESPACES;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s WHERE namespace = 'query_ddl_namespace';
+WITH s(n) AS (SHOW NAMESPACES) SELECT * FROM s WHERE n = 'query_ddl_namespace';
+
+-- SHOW TABLES
+SHOW TABLES;
+WITH s AS (SHOW TABLES) SELECT * FROM s;
+WITH s AS (SHOW TABLES) SELECT * FROM s WHERE tableName = 'test_show_tables';
+WITH s(ns, tn, t) AS (SHOW TABLES) SELECT * FROM s WHERE tn = 'test_show_tables';

Review comment: Could we add more tests? For example:
```sql
WITH s(ns, tn, t) AS (SHOW TABLES) SELECT tn FROM s;
WITH s(ns, tn, t) AS (SHOW TABLES) SELECT tn FROM s ORDER BY rn;
```
[GitHub] [spark] maropu commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628986405

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ##

@@ -418,115 +443,140 @@ case class SortMergeJoinExec(
     // Inline mutable state since not many join operations in a task
     val matches = ctx.addMutableState(clsName, "matches",
       v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true)
-    // Copy the left keys as class members so they could be used in next function call.
-    val matchedKeyVars = copyKeys(ctx, leftKeyVars)
+    // Copy the streamed keys as class members so they could be used in next function call.
+    val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
+
+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;
+         """.stripMargin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }

-    ctx.addNewFunction("findNextInnerJoinRows",
+    // Handle the case when streamed keys less than buffered keys.
+    val handleStreamedLessThanBuffered = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"$streamedRow = null;"
+      case LeftOuter | RightOuter =>
+        // Eagerly return with streamed row.
+        "return false;"
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    ctx.addNewFunction("findNextJoinRows",
       s"""
-         |private boolean findNextInnerJoinRows(
-         |    scala.collection.Iterator leftIter,
-         |    scala.collection.Iterator rightIter) {
-         |  $leftRow = null;
+         |private boolean findNextJoinRows(

Review comment: btw, in the current generated code, it seems `conditionCheck` is evaluated outside `findNextJoinRows`. Can't we evaluate it inside `findNextJoinRows` to avoid putting unmatched rows in `matches`?
[GitHub] [spark] wangyum commented on a change in pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
wangyum commented on a change in pull request #32442: URL: https://github.com/apache/spark/pull/32442#discussion_r628980328

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ##

@@ -375,8 +363,18 @@ ctes
     : WITH namedQuery (',' namedQuery)*
     ;
+informationQuery
+    : SHOW (DATABASES | NAMESPACES) ((FROM | IN) multipartIdentifier)? (LIKE? pattern=STRING)?    #showNamespaces
+    | SHOW TABLES ((FROM | IN) multipartIdentifier)? (LIKE? pattern=STRING)?                      #showTables

Review comment: Why don't we support `SHOW TABLE EXTENDED`?
[GitHub] [spark] c21 commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
c21 commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628977186 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +443,140 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) + +// Handle the case when streamed rows has any NULL keys. +val handleStreamedAnyNull = joinType match { + case _: InnerLike => +// Skip streamed row. +s""" + |$streamedRow = null; + |continue; + """.stripMargin + case LeftOuter | RightOuter => +// Eagerly return streamed row. +s""" + |if (!$matches.isEmpty()) { + | $matches.clear(); + |} + |return false; + """.stripMargin + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} -ctx.addNewFunction("findNextInnerJoinRows", +// Handle the case when streamed keys less than buffered keys. +val handleStreamedLessThanBuffered = joinType match { + case _: InnerLike => +// Skip streamed row. +s"$streamedRow = null;" + case LeftOuter | RightOuter => +// Eagerly return with streamed row. 
+"return false;" + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} + +ctx.addNewFunction("findNextJoinRows", s""" - |private boolean findNextInnerJoinRows( - |scala.collection.Iterator leftIter, - |scala.collection.Iterator rightIter) { - | $leftRow = null; + |private boolean findNextJoinRows( + |scala.collection.Iterator streamedIter, + |scala.collection.Iterator bufferedIter) { + | $streamedRow = null; | int comp = 0; - | while ($leftRow == null) { - |if (!leftIter.hasNext()) return false; - |$leftRow = (InternalRow) leftIter.next(); - |${leftKeyVars.map(_.code).mkString("\n")} - |if ($leftAnyNull) { - | $leftRow = null; - | continue; + | while ($streamedRow == null) { + |if (!streamedIter.hasNext()) return false; + |$streamedRow = (InternalRow) streamedIter.next(); + |${streamedKeyVars.map(_.code).mkString("\n")} + |if ($streamedAnyNull) { + | $handleStreamedAnyNull |} |if (!$matches.isEmpty()) { - | ${genComparison(ctx, leftKeyVars, matchedKeyVars)} + | ${genComparison(ctx, streamedKeyVars, matchedKeyVars)} | if (comp == 0) { |return true; | } | $matches.clear(); |} | |do { - | if ($rightRow == null) { - |if (!rightIter.hasNext()) { + | if ($bufferedRow == null) { + |if (!bufferedIter.hasNext()) { | ${matchedKeyVars.map(_.code).mkString("\n")} | return !$matches.isEmpty(); |} - |$rightRow = (InternalRow) rightIter.next(); - |${rightKeyTmpVars.map(_.code).mkString("\n")} - |if ($rightAnyNull) { - | $rightRow = null; + |$bufferedRow = (InternalRow) bufferedIter.next(); + |${bufferedKeyTmpVars.map(_.code).mkString("\n")} + |if ($bufferedAnyNull) { + | $bufferedRow = null; | continue; |} - |${rightKeyVars.map(_.code).mkString("\n")} + |${bufferedKeyVars.map(_.code).mkString("\n")} | } - | ${genComparison(ctx, leftKeyVars, rightKeyVars)} + | ${genComparison(ctx, streamedKeyVars, bufferedKeyVars)} | if (comp > 0) { - |$rightRow = null; + |$bufferedRow = null; | } else if (comp < 0) { |if 
(!$matches.isEmpty()) { | ${matchedKeyVars.map(_.code).mkString("\n")} | return true; + |} else { + | $handleStreamedLessThanBuffered |} - |$leftRow = null; | } else { - |
[GitHub] [spark] c21 commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
c21 commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628976694 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -554,67 +604,118 @@ case class SortMergeJoinExec( override def doProduce(ctx: CodegenContext): String = { // Inline mutable state since not many join operations in a task -val leftInput = ctx.addMutableState("scala.collection.Iterator", "leftInput", +val streamedInput = ctx.addMutableState("scala.collection.Iterator", "streamedInput", v => s"$v = inputs[0];", forceInline = true) -val rightInput = ctx.addMutableState("scala.collection.Iterator", "rightInput", +val bufferedInput = ctx.addMutableState("scala.collection.Iterator", "bufferedInput", v => s"$v = inputs[1];", forceInline = true) -val (leftRow, matches) = genScanner(ctx) +val (streamedRow, matches) = genScanner(ctx) // Create variables for row from both sides. -val (leftVars, leftVarDecl) = createLeftVars(ctx, leftRow) -val rightRow = ctx.freshName("rightRow") -val rightVars = createRightVar(ctx, rightRow) +val (streamedVars, streamedVarDecl) = createStreamedVars(ctx, streamedRow) +val bufferedRow = ctx.freshName("bufferedRow") +val bufferedVars = genBuildSideVars(ctx, bufferedRow, bufferedPlan) val iterator = ctx.freshName("iterator") val numOutput = metricTerm(ctx, "numOutputRows") -val (beforeLoop, condCheck) = if (condition.isDefined) { +val resultVars = joinType match { + case _: InnerLike | LeftOuter => +streamedVars ++ bufferedVars + case RightOuter => +bufferedVars ++ streamedVars + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.doProduce should not take $x as the JoinType") +} + +val (beforeLoop, conditionCheck) = if (condition.isDefined) { // Split the code of creating variables based on whether it's used by condition or not. 
val loaded = ctx.freshName("loaded") - val (leftBefore, leftAfter) = splitVarsByCondition(left.output, leftVars) - val (rightBefore, rightAfter) = splitVarsByCondition(right.output, rightVars) + val (streamedBefore, streamedAfter) = splitVarsByCondition(streamedOutput, streamedVars) + val (bufferedBefore, bufferedAfter) = splitVarsByCondition(bufferedOutput, bufferedVars) // Generate code for condition - ctx.currentVars = leftVars ++ rightVars + ctx.currentVars = resultVars val cond = BindReferences.bindReference(condition.get, output).genCode(ctx) // evaluate the columns those used by condition before loop - val before = s""" + val before = +s""" |boolean $loaded = false; - |$leftBefore + |$streamedBefore """.stripMargin - val checking = s""" - |$rightBefore - |${cond.code} - |if (${cond.isNull} || !${cond.value}) continue; - |if (!$loaded) { - | $loaded = true; - | $leftAfter - |} - |$rightAfter - """.stripMargin + val checking = +s""" + |$bufferedBefore + |if ($bufferedRow != null) { + | ${cond.code} + | if (${cond.isNull} || !${cond.value}) { + |continue; + | } + |} + |if (!$loaded) { + | $loaded = true; + | $streamedAfter + |} + |$bufferedAfter + """.stripMargin (before, checking) } else { - (evaluateVariables(leftVars), "") + (evaluateVariables(streamedVars), "") } val thisPlan = ctx.addReferenceObj("plan", this) val eagerCleanup = s"$thisPlan.cleanupResources();" -s""" - |while (findNextInnerJoinRows($leftInput, $rightInput)) { - | ${leftVarDecl.mkString("\n")} - | ${beforeLoop.trim} - | scala.collection.Iterator $iterator = $matches.generateIterator(); - | while ($iterator.hasNext()) { - |InternalRow $rightRow = (InternalRow) $iterator.next(); - |${condCheck.trim} - |$numOutput.add(1); - |${consume(ctx, leftVars ++ rightVars)} - | } - | if (shouldStop()) return; - |} - |$eagerCleanup +lazy val innerJoin = + s""" + |while (findNextJoinRows($streamedInput, $bufferedInput)) { + | ${streamedVarDecl.mkString("\n")} + | ${beforeLoop.trim} + | 
scala.collection.Iterator $iterator = $matches.generateIterator(); + | while ($iterator.hasNext()) { + |InternalRow $bufferedRow = (InternalRow) $iterator.next(); + |${conditionCheck.trim} + |$numOutput.add(1); + |${consume(ctx, resultVars)} + | } + | if (shouldStop()) return; + |} + |$eagerCleanup """.stripMargin + +lazy
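The `splitVarsByCondition` trick in the quoted diff can be illustrated with a small Python sketch. The function name and shape below are invented for illustration; only the idea comes from the code above: materialize just the columns the join condition references before evaluating it, and defer the remaining columns until a row actually passes.

```python
# Sketch of splitting column evaluation around a join condition
# (illustrative; this is not Spark's API). Columns the condition needs are
# computed eagerly; all other columns are computed only for surviving rows.
def evaluate_row(row, cond_cols, other_cols, condition):
    # Materialize only what the condition needs.
    needed = {c: row[c] for c in cond_cols}
    if not condition(needed):
        # Cheap rejection: other_cols are never computed for this row.
        return None
    needed.update({c: row[c] for c in other_cols})
    return needed
```

In the generated code this corresponds to `streamedBefore`/`bufferedBefore` being emitted before the condition check and `streamedAfter`/`bufferedAfter` after it, guarded by the `loaded` flag so the deferred columns are computed at most once per streamed row.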
[GitHub] [spark] wangyum commented on a change in pull request #32442: [SPARK-35283][SQL] Support query some DDL with CTES
wangyum commented on a change in pull request #32442: URL: https://github.com/apache/spark/pull/32442#discussion_r628976181

## File path: sql/core/src/test/resources/sql-tests/inputs/cte-ddl.sql ##

@@ -0,0 +1,65 @@
+-- Test data.
+CREATE NAMESPACE IF NOT EXISTS query_ddl_namespace;
+USE NAMESPACE query_ddl_namespace;
+CREATE TABLE test_show_tables(a INT, b STRING, c INT) using parquet;
+CREATE TABLE test_show_table_properties (a INT, b STRING, c INT) USING parquet TBLPROPERTIES('p1'='v1', 'p2'='v2');
+CREATE TABLE test_show_partitions(a String, b Int, c String, d String) USING parquet PARTITIONED BY (c, d);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=1);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Us', d=2);
+ALTER TABLE test_show_partitions ADD PARTITION (c='Cn', d=1);
+CREATE VIEW view_1 AS SELECT * FROM test_show_tables;
+CREATE VIEW view_2 AS SELECT * FROM test_show_tables WHERE c=1;
+CREATE TEMPORARY VIEW test_show_views(e int) USING parquet;
+CREATE GLOBAL TEMP VIEW test_global_show_views AS SELECT 1 as col1;
+
+-- SHOW NAMESPACES
+SHOW NAMESPACES;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s;
+WITH s AS (SHOW NAMESPACES) SELECT * FROM s WHERE namespace = 'query_ddl_namespace';
+WITH s(n) AS (SHOW NAMESPACES) SELECT * FROM s WHERE n = 'query_ddl_namespace';
+
+-- SHOW TABLES
+SHOW TABLES;
+WITH s AS (SHOW TABLES) SELECT * FROM s;
+WITH s AS (SHOW TABLES) SELECT * FROM s WHERE tableName = 'test_show_tables';
+WITH s(ns, tn, t) AS (SHOW TABLES) SELECT * FROM s WHERE tn = 'test_show_tables';
+
+-- SHOW TBLPROPERTIES
+SHOW TBLPROPERTIES test_show_table_properties;
+WITH s AS (SHOW TBLPROPERTIES test_show_table_properties) SELECT * FROM s;
+WITH s AS (SHOW TBLPROPERTIES test_show_table_properties) SELECT * FROM s WHERE key = 'p1';
+WITH s(k, v) AS (SHOW TBLPROPERTIES test_show_table_properties) SELECT * FROM s WHERE k = 'p1';
+
+-- SHOW PARTITIONS
+SHOW PARTITIONS test_show_partitions;
+WITH s AS (SHOW PARTITIONS test_show_partitions) SELECT * FROM s;
+WITH s AS (SHOW PARTITIONS test_show_partitions) SELECT * FROM s WHERE partition = 'c=Us/d=1';
+WITH s(p) AS (SHOW PARTITIONS test_show_partitions) SELECT * FROM s WHERE p = 'c=Us/d=1';
+
+-- SHOW COLUMNS
+SHOW COLUMNS in test_show_tables;
+WITH s AS (SHOW COLUMNS in test_show_tables) SELECT * FROM s;
+WITH s AS (SHOW COLUMNS in test_show_tables) SELECT * FROM s WHERE col_name = 'a';
+WITH s(c) AS (SHOW COLUMNS in test_show_tables) SELECT * FROM s WHERE c = 'a';
+
+-- SHOW VIEWS
+SHOW VIEWS;
+WITH s AS (SHOW VIEWS) SELECT * FROM s;
+WITH s AS (SHOW VIEWS) SELECT * FROM s WHERE viewName = 'test_show_views';
+WITH s(ns, vn, t) AS (SHOW VIEWS) SELECT * FROM s WHERE vn = 'test_show_views';
+
+-- SHOW FUNCTIONS
+WITH s AS (SHOW FUNCTIONS) SELECT * FROM s LIMIT 3;
+WITH s AS (SHOW FUNCTIONS) SELECT * FROM s WHERE function LIKE 'an%';
+WITH s(f) AS (SHOW FUNCTIONS) SELECT * FROM s WHERE f LIKE 'an%';
+
+-- Clean Up
+DROP VIEW global_temp.test_global_show_views;
+DROP VIEW test_show_views;
+DROP VIEW view_2;
+DROP VIEW view_1;
+DROP TABLE test_show_partitions;
+DROP TABLE test_show_table_properties;
+DROP TABLE test_show_tables;
+USE default;
+DROP NAMESPACE query_ddl_namespace;

Review comment: Please add a newline character at the end of the file.
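The SHOW PARTITIONS tests above filter on strings such as `'c=Us/d=1'`. A hypothetical helper (not Spark's internal API) shows how such a partition-spec string is formed from partition column/value pairs:

```python
# Build a Hive-style partition-spec display string like 'c=Us/d=1'
# from ordered (column, value) pairs. Illustrative only; Spark has its
# own internal formatting for SHOW PARTITIONS output.
def partition_spec(pairs):
    return "/".join(f"{col}={val}" for col, val in pairs)
```

For example, `partition_spec([("c", "Us"), ("d", 1)])` produces the string the tests compare against.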
[GitHub] [spark] SparkQA removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835879367 **[Test build #138309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138309/testReport)** for PR 32473 at commit [`10d7a97`](https://github.com/apache/spark/commit/10d7a977391d659d2060ba596c55d0334754866c).
[GitHub] [spark] maropu commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628974305 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +443,140 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) + +// Handle the case when streamed rows has any NULL keys. +val handleStreamedAnyNull = joinType match { + case _: InnerLike => +// Skip streamed row. +s""" + |$streamedRow = null; + |continue; + """.stripMargin + case LeftOuter | RightOuter => +// Eagerly return streamed row. +s""" + |if (!$matches.isEmpty()) { + | $matches.clear(); + |} + |return false; + """.stripMargin + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} -ctx.addNewFunction("findNextInnerJoinRows", +// Handle the case when streamed keys less than buffered keys. +val handleStreamedLessThanBuffered = joinType match { + case _: InnerLike => +// Skip streamed row. +s"$streamedRow = null;" + case LeftOuter | RightOuter => +// Eagerly return with streamed row. +"return false;" + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} + +ctx.addNewFunction("findNextJoinRows", s""" - |private boolean findNextInnerJoinRows( - |scala.collection.Iterator leftIter, - |scala.collection.Iterator rightIter) { - | $leftRow = null; + |private boolean findNextJoinRows( Review comment: > In the outer case, a return value is not used? Yes. 
Otherwise it's very hard to re-use code in findNextJoinRows. I can further make more change to not return anything for findNextJoinRows in case it's an outer join. Do we want to do that? okay, the current one looks fine. Let's just wait for a @cloud-fan comment here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835985975 **[Test build #138309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138309/testReport)** for PR 32473 at commit [`10d7a97`](https://github.com/apache/spark/commit/10d7a977391d659d2060ba596c55d0334754866c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] maropu commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628972459 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +443,140 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) + +// Handle the case when streamed rows has any NULL keys. +val handleStreamedAnyNull = joinType match { + case _: InnerLike => +// Skip streamed row. +s""" + |$streamedRow = null; + |continue; + """.stripMargin + case LeftOuter | RightOuter => +// Eagerly return streamed row. +s""" + |if (!$matches.isEmpty()) { + | $matches.clear(); + |} + |return false; + """.stripMargin + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} -ctx.addNewFunction("findNextInnerJoinRows", +// Handle the case when streamed keys less than buffered keys. +val handleStreamedLessThanBuffered = joinType match { + case _: InnerLike => +// Skip streamed row. +s"$streamedRow = null;" + case LeftOuter | RightOuter => +// Eagerly return with streamed row. +"return false;" + case x => +throw new IllegalArgumentException( + s"SortMergeJoin.genScanner should not take $x as the JoinType") +} + +ctx.addNewFunction("findNextJoinRows", s""" - |private boolean findNextInnerJoinRows( - |scala.collection.Iterator leftIter, - |scala.collection.Iterator rightIter) { - | $leftRow = null; + |private boolean findNextJoinRows( Review comment: > Why we don't need to put all the rows? 
We anyway need to evaluate all the rows on buffered side for join, right? Oh, my bad. ya, you're right. I misunderstood it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628969762 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +443,140 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) + +// Handle the case when streamed rows has any NULL keys. +val handleStreamedAnyNull = joinType match { + case _: InnerLike => +// Skip streamed row. +s""" + |$streamedRow = null; + |continue; + """.stripMargin + case LeftOuter | RightOuter => +// Eagerly return streamed row. +s""" + |if (!$matches.isEmpty()) { + | $matches.clear(); + |} + |return false; Review comment: I see. Could you leave some comments about it there? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
SparkQA removed a comment on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-835906899 **[Test build #138310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138310/testReport)** for PR 32487 at commit [`2d27589`](https://github.com/apache/spark/commit/2d275891147341ef233ac2082e973a0e98660832).
[GitHub] [spark] SparkQA commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
SparkQA commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-835979912 **[Test build #138310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138310/testReport)** for PR 32487 at commit [`2d27589`](https://github.com/apache/spark/commit/2d275891147341ef233ac2082e973a0e98660832).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] maropu commented on pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on pull request #32476: URL: https://github.com/apache/spark/pull/32476#issuecomment-835977883

> @maropu - `JoinBenchmark` covers only inner sort merge join, not left/right outer join. So this PR does not affect the benchmark results as they stand. Shall we have a follow-up PR to update the join benchmark? I wanted to add more test cases to `JoinBenchmark` as well.

Ah, okay. sgtm.
[GitHub] [spark] c21 commented on pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
c21 commented on pull request #32476: URL: https://github.com/apache/spark/pull/32476#issuecomment-835976988 @maropu - `JoinBenchmark` covers only inner sort merge join, not left/right outer join. So this PR does not affect the benchmark results as they stand. Shall we have a follow-up PR to update the join benchmark? I wanted to add more test cases to `JoinBenchmark` as well.
[GitHub] [spark] wangyum commented on a change in pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
wangyum commented on a change in pull request #32473: URL: https://github.com/apache/spark/pull/32473#discussion_r628967020

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/BloomFilterBenchmark.scala

@@ -81,8 +80,57 @@ object BloomFilterBenchmark extends SqlBasedBenchmark {
     }
   }
 
+  private def writeParquetBenchmark(): Unit = {
+    withTempPath { dir =>
+      val path = dir.getCanonicalPath
+
+      runBenchmark(s"Parquet Write") {
+        val benchmark = new Benchmark(s"Write ${scaleFactor}M rows", N, output = output)
+        benchmark.addCase("Without bloom filter") { _ =>
+          df.write.mode("overwrite").parquet(path + "/withoutBF")
+        }
+        benchmark.addCase("With bloom filter") { _ =>
+          df.write.mode("overwrite")
+            .option(ParquetOutputFormat.BLOOM_FILTER_ENABLED + "#value", true)
+            .parquet(path + "/withBF")
+        }
+        benchmark.run()
+      }
+    }
+  }
+
+  private def readParquetBenchmark(): Unit = {
+    val blockSizes = Seq(512 * 1024, 1024 * 1024, 2 * 1024 * 1024, 3 * 1024 * 1024,
+      4 * 1024 * 1024, 5 * 1024 * 1024, 6 * 1024 * 1024, 7 * 1024 * 1024,
+      8 * 1024 * 1024, 9 * 1024 * 1024, 10 * 1024 * 1024)
+    for (blocksize <- blockSizes) {
+      withTempPath { dir =>
+        val path = dir.getCanonicalPath
+
+        df.write.option("parquet.block.size", blocksize).parquet(path + "/withoutBF")

Review comment: Could we use the same value for block size and compression size? Please see how we did it in [FilterPushdownBenchmark](https://github.com/apache/spark/blob/7158e7f986630d4f67fb49a206d408c5f4384991/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala#L61-L62).
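The read-side benefit this benchmark measures comes from Parquet keeping a bloom filter per row group, so a point lookup can skip any row group whose filter rules the value out; a smaller `parquet.block.size` means more, finer-grained filters. The sketch below is a minimal, hypothetical Python illustration of that skipping logic (a simple bit-array filter, not Parquet's actual split-block design; all names are invented for the example):

```python
import hashlib

class BloomFilter:
    """Tiny bit-array bloom filter; illustrative only, not Parquet's implementation."""
    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key):
        # Derive `hashes` deterministic bit positions for the key.
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for pos in self._positions(key):
            self.array |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True may be a false positive.
        return all(self.array >> pos & 1 for pos in self._positions(key))

def point_lookup(row_groups, key):
    """Return rows equal to `key` and how many row groups were actually read."""
    filters = []
    for group in row_groups:
        f = BloomFilter()
        for k in group:
            f.add(k)
        filters.append(f)
    rows, groups_read = [], 0
    for group, f in zip(row_groups, filters):
        if f.might_contain(key):      # a False here skips the group entirely
            groups_read += 1
            rows.extend(k for k in group if k == key)
    return rows, groups_read
```

With three row groups and a key present in only one of them, only that group should be scanned (modulo a tiny false-positive probability), which is the effect the read benchmark is trying to capture across different block sizes.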
[GitHub] [spark] c21 commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
c21 commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628966219

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -353,12 +353,37 @@ case class SortMergeJoinExec(
     }
   }
 
-  override def supportCodegen: Boolean = {
-    joinType.isInstanceOf[InnerLike]
+  private lazy val (streamedPlan, bufferedPlan) = joinType match {

Review comment: @maropu - yes, this is used for code-gen only. Note that here we only pattern-match inner/left outer/right outer joins, so it would throw an exception with `val` for other join types.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -418,115 +443,140 @@ case class SortMergeJoinExec(
     // Inline mutable state since not many join operations in a task
     val matches = ctx.addMutableState(clsName, "matches",
       v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true)
-    // Copy the left keys as class members so they could be used in next function call.
-    val matchedKeyVars = copyKeys(ctx, leftKeyVars)
+    // Copy the streamed keys as class members so they could be used in next function call.
+    val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
+
+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;

Review comment: Wanted to avoid `clear()` when `isEmpty()` is true: `ExternalAppendOnlyUnsafeRowArray.isEmpty()` is very cheap, but `clear()` sets multiple variables.

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -418,115 +443,140 @@ case class SortMergeJoinExec(
     // Inline mutable state since not many join operations in a task
     val matches = ctx.addMutableState(clsName, "matches",
       v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true)
-    // Copy the left keys as class members so they could be used in next function call.
-    val matchedKeyVars = copyKeys(ctx, leftKeyVars)
+    // Copy the streamed keys as class members so they could be used in next function call.
+    val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
+
+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;
+         """.stripMargin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
-    ctx.addNewFunction("findNextInnerJoinRows",
+    // Handle the case when streamed keys less than buffered keys.
+    val handleStreamedLessThanBuffered = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"$streamedRow = null;"
+      case LeftOuter | RightOuter =>
+        // Eagerly return with streamed row.
+        "return false;"
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    ctx.addNewFunction("findNextJoinRows",
       s"""
-         |private boolean findNextInnerJoinRows(
-         |scala.collection.Iterator leftIter,
-         |scala.collection.Iterator rightIter) {
-         | $leftRow = null;
+         |private boolean findNextJoinRows(

Review comment: > For example, if there are too many matched duplicate rows in the buffered side, it seems we don't need to put all the rows in matches, right?

Why don't we need to put all the rows? We need to evaluate all the rows on the buffered side for the join anyway, right?

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -418,115 +443,140 @@ case class SortMergeJoinExec(
     // Inline mutable state since not many join operations in a task
     val matches = ctx.addMutableState(clsName, "matches",
       v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true)
-    // Copy the
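The inner vs. outer asymmetry being discussed above (skip a NULL-keyed streamed row for an inner join; eagerly emit it, NULL-padded, for a left/right outer join) can be sketched in plain Python. This is a hypothetical illustration, not Spark's generated Java, and it uses a dict rather than the sorted two-pointer scan for brevity:

```python
def join_streamed_side(streamed, buffered, join_type="inner"):
    """streamed/buffered: lists of (key, value); a None key stands for SQL NULL,
    which never equals anything, including another NULL."""
    buffered_index = {}
    for key, value in buffered:
        if key is not None:                  # NULL buffered keys can never match
            buffered_index.setdefault(key, []).append(value)
    out = []
    for key, value in streamed:
        if key is None:
            if join_type == "outer":         # eagerly return the streamed row, NULL-padded
                out.append((value, None))
            continue                          # inner join: just skip the streamed row
        matches = buffered_index.get(key, [])
        for m in matches:
            out.append((value, m))
        if not matches and join_type == "outer":
            out.append((value, None))
    return out
```

The `clear()`-only-when-non-empty micro-optimization from the review does not show up here, but the control flow (skip vs. eager return on a NULL streamed key) mirrors the two branches of `handleStreamedAnyNull`.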
[GitHub] [spark] wangyum commented on pull request #29642: [SPARK-32792][SQL] Improve Parquet In filter pushdown
wangyum commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-835972741 @dongjoon-hyun This PR only improves the `In` predicate. I have added the improvement details to the PR description.
[GitHub] [spark] maropu commented on pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on pull request #32476: URL: https://github.com/apache/spark/pull/32476#issuecomment-835970308 Could you update the `JoinBenchmark` results, too?
[GitHub] [spark] maropu commented on a change in pull request #32476: [SPARK-35349][SQL] Add code-gen for left/right outer sort merge join
maropu commented on a change in pull request #32476: URL: https://github.com/apache/spark/pull/32476#discussion_r628960873

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -418,115 +443,140 @@ case class SortMergeJoinExec(
     // Inline mutable state since not many join operations in a task
     val matches = ctx.addMutableState(clsName, "matches",
       v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true)
-    // Copy the left keys as class members so they could be used in next function call.
-    val matchedKeyVars = copyKeys(ctx, leftKeyVars)
+    // Copy the streamed keys as class members so they could be used in next function call.
+    val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
+
+    // Handle the case when streamed rows has any NULL keys.
+    val handleStreamedAnyNull = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"""
+           |$streamedRow = null;
+           |continue;
+         """.stripMargin
+      case LeftOuter | RightOuter =>
+        // Eagerly return streamed row.
+        s"""
+           |if (!$matches.isEmpty()) {
+           |  $matches.clear();
+           |}
+           |return false;
+         """.stripMargin
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
-    ctx.addNewFunction("findNextInnerJoinRows",
+    // Handle the case when streamed keys less than buffered keys.
+    val handleStreamedLessThanBuffered = joinType match {
+      case _: InnerLike =>
+        // Skip streamed row.
+        s"$streamedRow = null;"
+      case LeftOuter | RightOuter =>
+        // Eagerly return with streamed row.
+        "return false;"
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.genScanner should not take $x as the JoinType")
+    }
+
+    ctx.addNewFunction("findNextJoinRows",
       s"""
-         |private boolean findNextInnerJoinRows(
-         |scala.collection.Iterator leftIter,
-         |scala.collection.Iterator rightIter) {
-         | $leftRow = null;
+         |private boolean findNextJoinRows(

Review comment: In the outer case, a return value is not used?

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala

@@ -554,67 +604,118 @@ case class SortMergeJoinExec(
   override def doProduce(ctx: CodegenContext): String = {
     // Inline mutable state since not many join operations in a task
-    val leftInput = ctx.addMutableState("scala.collection.Iterator", "leftInput",
+    val streamedInput = ctx.addMutableState("scala.collection.Iterator", "streamedInput",
       v => s"$v = inputs[0];", forceInline = true)
-    val rightInput = ctx.addMutableState("scala.collection.Iterator", "rightInput",
+    val bufferedInput = ctx.addMutableState("scala.collection.Iterator", "bufferedInput",
       v => s"$v = inputs[1];", forceInline = true)
-    val (leftRow, matches) = genScanner(ctx)
+    val (streamedRow, matches) = genScanner(ctx)
     // Create variables for row from both sides.
-    val (leftVars, leftVarDecl) = createLeftVars(ctx, leftRow)
-    val rightRow = ctx.freshName("rightRow")
-    val rightVars = createRightVar(ctx, rightRow)
+    val (streamedVars, streamedVarDecl) = createStreamedVars(ctx, streamedRow)
+    val bufferedRow = ctx.freshName("bufferedRow")
+    val bufferedVars = genBuildSideVars(ctx, bufferedRow, bufferedPlan)
     val iterator = ctx.freshName("iterator")
     val numOutput = metricTerm(ctx, "numOutputRows")
-    val (beforeLoop, condCheck) = if (condition.isDefined) {
+    val resultVars = joinType match {
+      case _: InnerLike | LeftOuter =>
+        streamedVars ++ bufferedVars
+      case RightOuter =>
+        bufferedVars ++ streamedVars
+      case x =>
+        throw new IllegalArgumentException(
+          s"SortMergeJoin.doProduce should not take $x as the JoinType")
+    }
+
+    val (beforeLoop, conditionCheck) = if (condition.isDefined) {
       // Split the code of creating variables based on whether it's used by condition or not.
       val loaded = ctx.freshName("loaded")
-      val (leftBefore, leftAfter) = splitVarsByCondition(left.output, leftVars)
-      val (rightBefore, rightAfter) = splitVarsByCondition(right.output, rightVars)
+      val (streamedBefore, streamedAfter) = splitVarsByCondition(streamedOutput, streamedVars)
+      val (bufferedBefore, bufferedAfter) = splitVarsByCondition(bufferedOutput, bufferedVars)
       // Generate code for condition
-      ctx.currentVars = leftVars ++ rightVars
+      ctx.currentVars = resultVars
       val cond = BindReferences.bindReference(condition.get, output).genCode(ctx)
       // evaluate the columns those used by condition before loop
-      val before = s"""
+      val before
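The control flow of the renamed `findNextJoinRows` discussed above (advance the buffered side while its key sorts below the streamed key; for an inner join skip an unmatched streamed row, for an outer join return it with empty `matches`) can be approximated in Python. A hypothetical sketch, not Spark's generated code; the buffered side is assumed to contain no NULL keys:

```python
def find_next_join_rows(streamed, buffered, join_type="inner"):
    """Yield (streamed_value, matches) pairs. Both inputs are lists of
    (key, value) sorted ascending by key; None stands for a NULL streamed key."""
    j = 0
    for s_key, s_value in streamed:
        matches = []
        if s_key is not None:                # NULL streamed key: never matches
            while j < len(buffered) and buffered[j][0] < s_key:
                j += 1                       # buffered key sorts below: advance buffered side
            k = j                            # collect every buffered row with an equal key
            while k < len(buffered) and buffered[k][0] == s_key:
                matches.append(buffered[k][1])
                k += 1
        if matches:
            yield s_value, matches
        elif join_type == "outer":
            yield s_value, []                # eagerly return the streamed row unmatched
```

This also shows why the review question about skipping duplicate buffered rows does not apply: every buffered row with an equal key has to land in `matches`, since each one produces a joined output row.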
[GitHub] [spark] wangyum commented on a change in pull request #29642: [SPARK-32792][SQL] Improve Parquet In filter pushdown
wangyum commented on a change in pull request #29642: URL: https://github.com/apache/spark/pull/29642#discussion_r628965380

## File path: sql/core/benchmarks/FilterPushdownBenchmark-jdk11-results.txt

@@ -2,669 +2,669 @@ Pushdown for many distinct value case
-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 Select 0 string row (value IS NULL):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                            10512         10572         58        1.5        668.4      1.0X
-Parquet Vectorized (Pushdown)                   596           621         19       26.4         37.9     17.6X
-Native ORC Vectorized                          8555          8723         97        1.8        543.9      1.2X
-Native ORC Vectorized (Pushdown)                592           609         11       26.6         37.7     17.8X
+Parquet Vectorized                             9788         10231        259        1.6        622.3      1.0X
+Parquet Vectorized (Pushdown)                   493           536         29       31.9         31.3     19.9X
+Native ORC Vectorized                          6487          6575        137        2.4        412.4      1.5X
+Native ORC Vectorized (Pushdown)                436           447         14       36.1         27.7     22.4X

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 Select 0 string row ('7864320' < value < '7864320'):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
---
-Parquet Vectorized                            10406         10461         50        1.5        661.6      1.0X
-Parquet Vectorized (Pushdown)                   619           641         22       25.4         39.4     16.8X
-Native ORC Vectorized                          8787          8834         57        1.8        558.6      1.2X
-Native ORC Vectorized (Pushdown)                592           608         11       26.6         37.6     17.6X
+Parquet Vectorized                             9861          9880         16        1.6        626.9      1.0X
+Parquet Vectorized (Pushdown)                   507           529         21       31.0         32.3     19.4X
+Native ORC Vectorized                          6871          6938         63        2.3        436.8      1.4X
+Native ORC Vectorized (Pushdown)                453           470         13       34.7         28.8     21.8X

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 Select 1 string row (value = '7864320'):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                            10632         10694         60        1.5        676.0      1.0X
-Parquet Vectorized (Pushdown)                   608           635         22       25.9         38.6     17.5X
-Native ORC Vectorized                          8790          8838         37        1.8        558.9      1.2X
-Native ORC Vectorized (Pushdown)                559           584         22       28.1         35.5     19.0X
+Parquet Vectorized                            10228         10471        167        1.5        650.3      1.0X
+Parquet Vectorized (Pushdown)                   511           519          5       30.8         32.5     20.0X
+Native ORC Vectorized                          6700          6865        119        2.3        426.0      1.5X
+Native ORC Vectorized (Pushdown)                436           454         12       36.1         27.7     23.5X

-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK
[GitHub] [spark] SparkQA commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
SparkQA commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835964738 **[Test build #138312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138312/testReport)** for PR 32473 at commit [`21cc2ac`](https://github.com/apache/spark/commit/21cc2ac907ffe9256942d818663ce225d1a1b992).
[GitHub] [spark] wangyum commented on a change in pull request #29642: [SPARK-32792][SQL] Improve Parquet In filter pushdown
wangyum commented on a change in pull request #29642: URL: https://github.com/apache/spark/pull/29642#discussion_r628964580

## File path: sql/core/benchmarks/FilterPushdownBenchmark-jdk11-results.txt

@@ -2,669 +2,669 @@ Pushdown for many distinct value case
-OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 11.0.11+9-LTS on Linux 5.4.0-1046-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
 Select 0 string row (value IS NULL):  Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
-Parquet Vectorized                            10512         10572         58        1.5        668.4      1.0X
-Parquet Vectorized (Pushdown)                   596           621         19       26.4         37.9     17.6X
-Native ORC Vectorized                          8555          8723         97        1.8        543.9      1.2X
-Native ORC Vectorized (Pushdown)                592           609         11       26.6         37.7     17.8X
+Parquet Vectorized                             9788         10231        259        1.6        622.3      1.0X
+Parquet Vectorized (Pushdown)                   493           536         29       31.9         31.3     19.9X
+Native ORC Vectorized                          6487          6575        137        2.4        412.4      1.5X
+Native ORC Vectorized (Pushdown)                436           447         14       36.1         27.7     22.4X

Review comment: No. GitHub Actions runs on different machines, and there is a performance difference between them.
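As the comment above notes, absolute timings from different CI machines are not comparable; the stable signal in these tables is the Relative column, i.e. the baseline case's best time divided by each case's best time. A quick sketch of that computation (a hypothetical helper, not part of the benchmark code):

```python
def relative_speedups(best_times_ms):
    """First entry is the baseline (1.0X); larger values mean faster than baseline."""
    baseline = best_times_ms[0]
    return [round(baseline / t, 1) for t in best_times_ms]
```

Applied to the pre-change "Select 0 string row (value IS NULL)" column above (10512, 596, 8555, 592 ms), it reproduces the table's 1.0X / 17.6X / 1.2X / 17.8X, which is why the relative figures stay meaningful even when a rerun lands on different hardware.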
[GitHub] [spark] github-actions[bot] commented on pull request #31296: [SPARK-34205][SQL][SS] Add pipe to Dataset to enable Streaming Dataset pipe
github-actions[bot] commented on pull request #31296: URL: https://github.com/apache/spark/pull/31296#issuecomment-835957576 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins removed a comment on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-835929791 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42832/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins removed a comment on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835929789 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42833/
[GitHub] [spark] AmplabJenkins commented on pull request #32473: [SPARK-35345][SQL] Add Parquet tests to BloomFilterBenchmark
AmplabJenkins commented on pull request #32473: URL: https://github.com/apache/spark/pull/32473#issuecomment-835929789 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42833/
[GitHub] [spark] AmplabJenkins commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
AmplabJenkins commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-835929791 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/42832/
[GitHub] [spark] viirya commented on a change in pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
viirya commented on a change in pull request #32487: URL: https://github.com/apache/spark/pull/32487#discussion_r628955847

## File path: dev/create-release/release-build.sh

@@ -210,6 +210,8 @@ if [[ "$1" == "package" ]]; then
   PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"`
   echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py
+  export MAVEN_OPTS="-Xmx12000m"

Review comment: okay
[GitHub] [spark] dongjoon-hyun commented on pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
dongjoon-hyun commented on pull request #32487: URL: https://github.com/apache/spark/pull/32487#issuecomment-835927141 Also, cc @srowen
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32487: [SPARK-35358][BUILD] Increase maximum Java heap used for release build to avoid OOM
dongjoon-hyun commented on a change in pull request #32487: URL: https://github.com/apache/spark/pull/32487#discussion_r628955769

## File path: dev/create-release/release-build.sh

@@ -210,6 +210,8 @@ if [[ "$1" == "package" ]]; then
   PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -e "s/-/./" -e "s/SNAPSHOT/dev0/" -e "s/preview/dev/"`
   echo "__version__='$PYSPARK_VERSION'" > python/pyspark/version.py
+  export MAVEN_OPTS="-Xmx12000m"

Review comment: Can we have this globally, outside of the `if` statement? Then it looks like we need only a one-line addition.
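The suggestion above is to hoist the export out of the `package` branch so every build stage in the script inherits the larger heap. A rough shell sketch against a simplified script skeleton (the `run_mvn` helper and branch names are invented for illustration; only the `MAVEN_OPTS="-Xmx12000m"` value comes from the diff):

```shell
# Hoisted export: every subsequent Maven invocation inherits the larger heap,
# because mvn reads MAVEN_OPTS from the environment at startup.
export MAVEN_OPTS="-Xmx12000m"

run_mvn() {
  # Stand-in for the script's real mvn calls.
  echo "mvn (MAVEN_OPTS=$MAVEN_OPTS) $*"
}

if [[ "$1" == "package" ]]; then
  run_mvn -DskipTests package
else
  run_mvn -Pdocs install
fi
```

With the export inside the `if [[ "$1" == "package" ]]` branch, other stages would fall back to Maven's default heap; hoisting it is the one-line version of the fix.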