[GitHub] [spark] imback82 commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
imback82 commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-841023379 @cloud-fan this is a PR to attempt to migrate the remaining `ALTER TABLE` commands to the new resolution framework. Previous attempt was reverted due to few concerns (https://github.com/apache/spark/pull/27327). Please let me know what you think about the new changes. Thanks in advance! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
AmplabJenkins removed a comment on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841022760 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138540/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
AmplabJenkins commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841022760 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138540/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
SparkQA removed a comment on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-840998906 **[Test build #138540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)** for PR 32544 at commit [`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
AmplabJenkins removed a comment on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-841022160 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43063/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
SparkQA commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841022352 **[Test build #138540 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)** for PR 32544 at commit [`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
AmplabJenkins commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-841022160 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43063/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
SparkQA commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-841022133 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
c21 commented on a change in pull request #32547: URL: https://github.com/apache/spark/pull/32547#discussion_r632297612 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -693,15 +701,12 @@ case class SortMergeJoinExec( |continue; | } |} - |if (!$loaded) { - | $loaded = true; - | $streamedAfter Review comment: @cloud-fan - good call for code size. Actually I just figured we don't need `streamedAfter` here for left anti, because we can skip streamed row when there's a match here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
c21 commented on a change in pull request #32547: URL: https://github.com/apache/spark/pull/32547#discussion_r632297612 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -693,15 +701,12 @@ case class SortMergeJoinExec( |continue; | } |} - |if (!$loaded) { - | $loaded = true; - | $streamedAfter Review comment: @cloud-fan - good call for code size. Actually I just figured we don't need `$loadStreamed` here for left anti, because we can skip streamed row when there's a match here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
c21 commented on a change in pull request #32544: URL: https://github.com/apache/spark/pull/32544#discussion_r632296521 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -141,7 +141,8 @@ case class BroadcastExchangeExec( longMetric("dataSize") += dataSize if (dataSize >= MAX_BROADCAST_TABLE_BYTES) { throw new SparkException( -s"Cannot broadcast the table that is larger than 8GB: ${dataSize >> 30} GB") +s"Cannot broadcast the table that is larger than" + + s" ${MAX_BROADCAST_TABLE_BYTES >> 30}GB: ${dataSize >> 30} GB") Review comment: sorry why `8GB` is not accurate? `BroadcastExchangeExec.MAX_BROADCAST_TABLE_BYTES` seems to be set statically as `8L << 30`, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
cloud-fan commented on a change in pull request #32547: URL: https://github.com/apache/spark/pull/32547#discussion_r632296238 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -693,15 +701,12 @@ case class SortMergeJoinExec( |continue; | } |} - |if (!$loaded) { - | $loaded = true; - | $streamedAfter Review comment: For left anti, `streamedAfter` will appear twice in the code. How large is it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-841017158 **[Test build #138551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138551/testReport)** for PR 32204 at commit [`8b48e6f`](https://github.com/apache/spark/commit/8b48e6fee602ce1df5345c677b81bf731b8a05ee). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
SparkQA commented on pull request #32411: URL: https://github.com/apache/spark/pull/32411#issuecomment-841017050 **[Test build #138550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138550/testReport)** for PR 32411 at commit [`67d5a78`](https://github.com/apache/spark/commit/67d5a78b0c8b51484ce607896797d0d6129b7ec6). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
SparkQA commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-841016958 **[Test build #138549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138549/testReport)** for PR 32542 at commit [`9e058b1`](https://github.com/apache/spark/commit/9e058b10a3e61cfe702cab8a4b15726660e3da78). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
SparkQA commented on pull request #32547: URL: https://github.com/apache/spark/pull/32547#issuecomment-841016888 **[Test build #138547 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138547/testReport)** for PR 32547 at commit [`53364c9`](https://github.com/apache/spark/commit/53364c9076a922df8f5ff26e7a62e0bc871401e9). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-841016640 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138529/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page
SparkQA commented on pull request #32546: URL: https://github.com/apache/spark/pull/32546#issuecomment-841016889 **[Test build #138548 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138548/testReport)** for PR 32546 at commit [`0ce01d0`](https://github.com/apache/spark/commit/0ce01d037a01e45b4b02ba3dbd68dfea940ecaee). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it
SparkQA commented on pull request #32548: URL: https://github.com/apache/spark/pull/32548#issuecomment-841016885 **[Test build #138546 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138546/testReport)** for PR 32548 at commit [`5a718fc`](https://github.com/apache/spark/commit/5a718fc08d828421b62ead9cfbe1b099ca3c76fd). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-841016640 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138529/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it
c21 commented on pull request #32548: URL: https://github.com/apache/spark/pull/32548#issuecomment-841016613 cc @cloud-fan, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 opened a new pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it
c21 opened a new pull request #32548: URL: https://github.com/apache/spark/pull/32548 ### What changes were proposed in this pull request? This is a followup from discussion in https://github.com/apache/spark/pull/32495#discussion_r632283178 . The hardcoded function name `findNextJoinRows` is not a real problem now as we always do code generation for SMJ's children separately. But this change is to make it future proof in case this assumption changed in the future. ### Why are the changes needed? Fix the potential reliability issue. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing unit tests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable
AmplabJenkins removed a comment on pull request #32539: URL: https://github.com/apache/spark/pull/32539#issuecomment-841016254 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138536/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable
AmplabJenkins commented on pull request #32539: URL: https://github.com/apache/spark/pull/32539#issuecomment-841016254 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138536/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
AmplabJenkins removed a comment on pull request #32315: URL: https://github.com/apache/spark/pull/32315#issuecomment-841015747 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138537/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
AmplabJenkins removed a comment on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-841015745 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43065/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
AmplabJenkins removed a comment on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841015746 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43061/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
AmplabJenkins removed a comment on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-841015748 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138530/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
AmplabJenkins removed a comment on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-841015743 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43062/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
AmplabJenkins commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841015746 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43061/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
AmplabJenkins commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-841015748 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138530/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
AmplabJenkins commented on pull request #32315: URL: https://github.com/apache/spark/pull/32315#issuecomment-841015747 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138537/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
AmplabJenkins commented on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-841015745 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43065/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA removed a comment on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840916300 **[Test build #138529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138529/testReport)** for PR 32204 at commit [`2b6b066`](https://github.com/apache/spark/commit/2b6b066de16a3820cb89de21f31ffbe1b08e66e8). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
AmplabJenkins commented on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-841015743 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43062/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-841015636 **[Test build #138529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138529/testReport)** for PR 32204 at commit [`2b6b066`](https://github.com/apache/spark/commit/2b6b066de16a3820cb89de21f31ffbe1b08e66e8). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AgeExample(birthday: Expression, child: Expression) extends RuntimeReplaceable ` * `class SessionExtensionsWithLoader extends SparkSessionExtensionsProvider ` * `class SessionExtensionsWithoutLoader extends SparkSessionExtensionsProvider ` * `case class AvroWrite(` * `case class KafkaWrite(` * `case class TryEval(child: Expression) extends UnaryExpression with NullIntolerant ` * `case class TryAdd(left: Expression, right: Expression, child: Expression)` * `case class TryDivide(left: Expression, right: Expression, child: Expression)` * `new RuntimeException(s\"Failed to convert value $value (class of $cls) \" +` * `trait FileWrite extends Write ` * `case class CSVWrite(` * `case class JsonWrite(` * `case class OrcWrite(` * `case class ParquetWrite(` * `case class TextWrite(` * `trait ShuffledJoin extends JoinCodegenSupport ` * `class ConsoleWrite(schema: StructType, options: CaseInsensitiveStringMap)` * `class ForeachWrite[T](` * `class MemoryWrite(sink: MemorySink, schema: StructType, needTruncate: Boolean) extends Write ` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable
SparkQA removed a comment on pull request #32539: URL: https://github.com/apache/spark/pull/32539#issuecomment-840964805 **[Test build #138536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138536/testReport)** for PR 32539 at commit [`37a168a`](https://github.com/apache/spark/commit/37a168a32d1cd5194820d7b6e5ad2c03fb9427ec). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable
SparkQA commented on pull request #32539: URL: https://github.com/apache/spark/pull/32539#issuecomment-841015356 **[Test build #138536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138536/testReport)** for PR 32539 at commit [`37a168a`](https://github.com/apache/spark/commit/37a168a32d1cd5194820d7b6e5ad2c03fb9427ec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
cloud-fan commented on pull request #32315: URL: https://github.com/apache/spark/pull/32315#issuecomment-841014965 looks good, can you update the PR description? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
cloud-fan commented on a change in pull request #32315: URL: https://github.com/apache/spark/pull/32315#discussion_r632292445 ## File path: core/src/test/scala/org/apache/spark/SparkFunSuite.scala ## @@ -19,11 +19,13 @@ package org.apache.spark // scalastyle:off import java.io.File +import java.nio.file.Path import java.util.{Locale, TimeZone} import org.apache.log4j.spi.LoggingEvent Review comment: seems the import order is wrong. This should appear after `scala.annotation.tailrec` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
SparkQA commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-841014677 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
SparkQA commented on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-841014310 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
SparkQA commented on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-841013832 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43062/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r632289902 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ## @@ -96,4 +99,20 @@ object DataWritingCommand { sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY), metrics.values.toSeq) } + + def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: Configuration) { +if (saveMode != SaveMode.Overwrite && + !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) { + val filePath = new org.apache.hadoop.fs.Path(tablePath) + val fs = filePath.getFileSystem(hadoopConf) + if (fs.exists(filePath) && +fs.getFileStatus(filePath).isDirectory && +fs.listStatus(filePath).length != 0) { +throw new AnalysisException( + s"CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory " + +s"${tablePath} . To disable the behavior, " + +s"set 'spark.sql.legacy.allowNonEmptyLocationInCTAS to true.") Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r632289297 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ## @@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } } + test("SPARK-28551: CTAS Hive Table should be with non-existent or empty location") { +def executeCTASWithNonEmptyLocation(tempLocation: String) { + sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION '$tempLocation/ctas1'") + sql("INSERT INTO TABLE ctas1 SELECT 'A' ") + sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile LOCATION + |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, value""".stripMargin) +} + +Seq("false", "true").foreach { convertCTASFlag => + Seq("false", "true").foreach { allowNonEmptyDirFlag => +withSQLConf( + SQLConf.CONVERT_CTAS.key -> convertCTASFlag, + SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS.key -> allowNonEmptyDirFlag) { + withTempDir { dir => +val tempLocation = dir.toURI.toString +withTable("ctas1", "ctas_with_existing_location") { + if (!spark.conf.get(SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS)) { Review comment: Done, changed type of val `allowNonEmptyDirFlag` to Boolean -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r632288726 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ## @@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } } + test("SPARK-28551: CTAS Hive Table should be with non-existent or empty location") { +def executeCTASWithNonEmptyLocation(tempLocation: String) { + sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION '$tempLocation/ctas1'") + sql("INSERT INTO TABLE ctas1 SELECT 'A' ") + sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile LOCATION + |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, value""".stripMargin) +} + +Seq("false", "true").foreach { convertCTASFlag => Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r632288384 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ## @@ -96,4 +99,20 @@ object DataWritingCommand { sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY), metrics.values.toSeq) } + + def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: Configuration) { +if (saveMode != SaveMode.Overwrite && + !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) { + val filePath = new org.apache.hadoop.fs.Path(tablePath) + val fs = filePath.getFileSystem(hadoopConf) + if (fs.exists(filePath) && +fs.getFileStatus(filePath).isDirectory && Review comment: Done ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ## @@ -96,4 +99,20 @@ object DataWritingCommand { sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY), metrics.values.toSeq) } + + def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: Configuration) { +if (saveMode != SaveMode.Overwrite && + !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) { Review comment: Done ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ## @@ -96,4 +99,20 @@ object DataWritingCommand { sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY), metrics.values.toSeq) } + + def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: Configuration) { +if (saveMode != SaveMode.Overwrite && + !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) { + val filePath = new org.apache.hadoop.fs.Path(tablePath) + val fs = filePath.getFileSystem(hadoopConf) + if (fs.exists(filePath) && +fs.getFileStatus(filePath).isDirectory && +fs.listStatus(filePath).length != 0) { +throw new AnalysisException( + s"CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory " + +s"${tablePath} . To disable the behavior, " + Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411: URL: https://github.com/apache/spark/pull/32411#discussion_r632288269 ## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ## @@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi } } + test("SPARK-28551: CTAS Hive Table should be with non-existent or empty location") { +def executeCTASWithNonEmptyLocation(tempLocation: String) { + sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION '$tempLocation/ctas1'") + sql("INSERT INTO TABLE ctas1 SELECT 'A' ") + sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile LOCATION + |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, value""".stripMargin) +} + +Seq("false", "true").foreach { convertCTASFlag => + Seq("false", "true").foreach { allowNonEmptyDirFlag => +withSQLConf( + SQLConf.CONVERT_CTAS.key -> convertCTASFlag, + SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS.key -> allowNonEmptyDirFlag) { + withTempDir { dir => +val tempLocation = dir.toURI.toString +withTable("ctas1", "ctas_with_existing_location") { + if (!spark.conf.get(SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS)) { +intercept[AnalysisException] { Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
SparkQA commented on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-841011518 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43062/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
SparkQA removed a comment on pull request #32315: URL: https://github.com/apache/spark/pull/32315#issuecomment-840964887 **[Test build #138537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138537/testReport)** for PR 32315 at commit [`6e70fdd`](https://github.com/apache/spark/commit/6e70fddc6928d6aa60ef6a66eccc7f95884ce752). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite
SparkQA commented on pull request #32315: URL: https://github.com/apache/spark/pull/32315#issuecomment-841008847 **[Test build #138537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138537/testReport)** for PR 32315 at commit [`6e70fdd`](https://github.com/apache/spark/commit/6e70fddc6928d6aa60ef6a66eccc7f95884ce752). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on a change in pull request #32495: [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type
c21 commented on a change in pull request #32495: URL: https://github.com/apache/spark/pull/32495#discussion_r632283178 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala ## @@ -418,115 +428,106 @@ case class SortMergeJoinExec( // Inline mutable state since not many join operations in a task val matches = ctx.addMutableState(clsName, "matches", v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", forceInline = true) -// Copy the left keys as class members so they could be used in next function call. -val matchedKeyVars = copyKeys(ctx, leftKeyVars) +// Copy the streamed keys as class members so they could be used in next function call. +val matchedKeyVars = copyKeys(ctx, streamedKeyVars) -ctx.addNewFunction("findNextInnerJoinRows", +ctx.addNewFunction("findNextJoinRows", s""" - |private boolean findNextInnerJoinRows( - |scala.collection.Iterator leftIter, - |scala.collection.Iterator rightIter) { - | $leftRow = null; + |private boolean findNextJoinRows( + |scala.collection.Iterator streamedIter, + |scala.collection.Iterator bufferedIter) { + | $streamedRow = null; | int comp = 0; - | while ($leftRow == null) { - |if (!leftIter.hasNext()) return false; - |$leftRow = (InternalRow) leftIter.next(); - |${leftKeyVars.map(_.code).mkString("\n")} - |if ($leftAnyNull) { - | $leftRow = null; + | while ($streamedRow == null) { + |if (!streamedIter.hasNext()) return false; + |$streamedRow = (InternalRow) streamedIter.next(); + |${streamedKeyVars.map(_.code).mkString("\n")} + |if ($streamedAnyNull) { + | $streamedRow = null; | continue; |} |if (!$matches.isEmpty()) { - | ${genComparison(ctx, leftKeyVars, matchedKeyVars)} + | ${genComparison(ctx, streamedKeyVars, matchedKeyVars)} | if (comp == 0) { |return true; | } | $matches.clear(); |} | |do { - | if ($rightRow == null) { - |if (!rightIter.hasNext()) { + | if ($bufferedRow == null) { + |if (!bufferedIter.hasNext()) { | ${matchedKeyVars.map(_.code).mkString("\n")} | return !$matches.isEmpty(); |} - |$rightRow = (InternalRow) rightIter.next(); - |${rightKeyTmpVars.map(_.code).mkString("\n")} - |if ($rightAnyNull) { - | $rightRow = null; + |$bufferedRow = (InternalRow) bufferedIter.next(); + |${bufferedKeyTmpVars.map(_.code).mkString("\n")} + |if ($bufferedAnyNull) { + | $bufferedRow = null; | continue; |} - |${rightKeyVars.map(_.code).mkString("\n")} + |${bufferedKeyVars.map(_.code).mkString("\n")} | } - | ${genComparison(ctx, leftKeyVars, rightKeyVars)} + | ${genComparison(ctx, streamedKeyVars, bufferedKeyVars)} | if (comp > 0) { - |$rightRow = null; + |$bufferedRow = null; | } else if (comp < 0) { |if (!$matches.isEmpty()) { | ${matchedKeyVars.map(_.code).mkString("\n")} | return true; |} - |$leftRow = null; + |$streamedRow = null; | } else { - |$matches.add((UnsafeRow) $rightRow); - |$rightRow = null; + |$matches.add((UnsafeRow) $bufferedRow); + |$bufferedRow = null; | } - |} while ($leftRow != null); + |} while ($streamedRow != null); | } | return false; // unreachable |} """.stripMargin, inlineToOuterClass = true) Review comment: @cloud-fan - thanks for calling it out. I would like to make it future proof by giving it a fresh name. Let me do it in a followup PR. Actually I was wondering the same question as you when implementing and spent some time to figuring it out. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 commented on pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
c21 commented on pull request #32547: URL: https://github.com/apache/spark/pull/32547#issuecomment-841007190 cc @cloud-fan and @maropu to take a look when you have time, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] c21 opened a new pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join
c21 opened a new pull request #32547: URL: https://github.com/apache/spark/pull/32547 ### What changes were proposed in this pull request? As title. This PR is to add code-gen support for LEFT ANTI sort merge join. The main change is to extract `loadStreamed` in `SortMergeJoinExec.doProduce()`. That is to set all columns values for streamed row, when the streamed row has no output row. Example query: ``` val df1 = spark.range(10).select($"id".as("k1")) val df2 = spark.range(4).select($"id".as("k2")) df1.join(df2.hint("SHUFFLE_MERGE"), $"k1" === $"k2", "left_anti") ``` Example generated code: ``` == Subtree 5 / 5 (maxMethodCodeSize:296; maxConstantPoolSize:156(0.24% used); numInnerClasses:0) == *(5) Project [id#0L AS k1#2L] +- *(5) SortMergeJoin [id#0L], [k2#6L], LeftAnti :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0 : +- Exchange hashpartitioning(id#0L, 5), ENSURE_REQUIREMENTS, [id=#27] : +- *(1) Range (0, 10, step=1, splits=2) +- *(4) Sort [k2#6L ASC NULLS FIRST], false, 0 +- Exchange hashpartitioning(k2#6L, 5), ENSURE_REQUIREMENTS, [id=#33] +- *(3) Project [id#4L AS k2#6L] +- *(3) Range (0, 4, step=1, splits=2) Generated code: /* 001 */ public Object generate(Object[] references) { /* 002 */ return new GeneratedIteratorForCodegenStage5(references); /* 003 */ } /* 004 */ /* 005 */ // codegenStageId=5 /* 006 */ final class GeneratedIteratorForCodegenStage5 extends org.apache.spark.sql.execution.BufferedRowIterator { /* 007 */ private Object[] references; /* 008 */ private scala.collection.Iterator[] inputs; /* 009 */ private scala.collection.Iterator smj_streamedInput_0; /* 010 */ private scala.collection.Iterator smj_bufferedInput_0; /* 011 */ private InternalRow smj_streamedRow_0; /* 012 */ private InternalRow smj_bufferedRow_0; /* 013 */ private long smj_value_2; /* 014 */ private org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray smj_matches_0; /* 015 */ private long smj_value_3; /* 016 */ private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] smj_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[2]; /* 017 */ /* 018 */ public GeneratedIteratorForCodegenStage5(Object[] references) { /* 019 */ this.references = references; /* 020 */ } /* 021 */ /* 022 */ public void init(int index, scala.collection.Iterator[] inputs) { /* 023 */ partitionIndex = index; /* 024 */ this.inputs = inputs; /* 025 */ smj_streamedInput_0 = inputs[0]; /* 026 */ smj_bufferedInput_0 = inputs[1]; /* 027 */ /* 028 */ smj_matches_0 = new org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray(1, 2147483647); /* 029 */ smj_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0); /* 030 */ smj_mutableStateArray_0[1] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0); /* 031 */ /* 032 */ } /* 033 */ /* 034 */ private boolean findNextJoinRows( /* 035 */ scala.collection.Iterator streamedIter, /* 036 */ scala.collection.Iterator bufferedIter) { /* 037 */ smj_streamedRow_0 = null; /* 038 */ int comp = 0; /* 039 */ while (smj_streamedRow_0 == null) { /* 040 */ if (!streamedIter.hasNext()) return false; /* 041 */ smj_streamedRow_0 = (InternalRow) streamedIter.next(); /* 042 */ long smj_value_0 = smj_streamedRow_0.getLong(0); /* 043 */ if (false) { /* 044 */ if (!smj_matches_0.isEmpty()) { /* 045 */ smj_matches_0.clear(); /* 046 */ } /* 047 */ return false; /* 048 */ /* 049 */ } /* 050 */ if (!smj_matches_0.isEmpty()) { /* 051 */ comp = 0; /* 052 */ if (comp == 0) { /* 053 */ comp = (smj_value_0 > smj_value_3 ? 1 : smj_value_0 < smj_value_3 ? -1 : 0); /* 054 */ } /* 055 */ /* 056 */ if (comp == 0) { /* 057 */ return true; /* 058 */ } /* 059 */ smj_matches_0.clear(); /* 060 */ } /* 061 */ /* 062 */ do { /* 063 */ if (smj_bufferedRow_0 == null) { /* 064 */ if (!bufferedIter.hasNext()) { /* 065 */ smj_value_3 = smj_value_0; /* 066 */ return !smj_matches_0.isEmpty(); /* 067 */ } /* 068 */ smj_bufferedRow_0 = (InternalRow) bufferedIter.next(); /* 069 */ long smj_value_1 = smj_bufferedRow_0.getLong(0); /* 070 */ if (false) { /* 071 */ smj_bufferedRow_0 = null; /* 072 */ continue; /* 073 */ } /* 074 */ smj_value_2 = smj_value_1; /* 075 */ } /* 076 */ /*
[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
beliefer commented on a change in pull request #32513: URL: https://github.com/apache/spark/pull/32513#discussion_r632281901 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -71,7 +72,12 @@ class QueryExecution( lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) { // We can't clone `logical` here, which will reset the `_analyzed` flag. -sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) +sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) transform { + // SPARK-35378: Eagerly execute LeafRunnableCommand so that query command with CTE + case r: LeafRunnableCommand => Review comment: OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic opened a new pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page
itholic opened a new pull request #32546: URL: https://github.com/apache/spark/pull/32546 ### What changes were proposed in this pull request? This PR proposes move ORC data source options from Python, Scala and Java into a single page. ### Why are the changes needed? So far, the documentation for ORC data source options is separated into different pages for each language API documents. However, this makes managing many options inconvenient, so it is efficient to manage all options in a single page and provide a link to that page in the API of each language. ### Does this PR introduce _any_ user-facing change? The "Data Source Option" page is added to the "ORC Files" page, and linked to the each API docs. https://user-images.githubusercontent.com/44108233/118223516-6ec22280-b4bc-11eb-8281-847546452863.png;> ### How was this patch tested? Manually build docs and confirm the page. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #32501: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation
cloud-fan commented on a change in pull request #32501: URL: https://github.com/apache/spark/pull/32501#discussion_r632281269 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/util/PartitioningUtils.scala ## @@ -72,12 +73,20 @@ private[sql] object PartitioningUtils { private def charTypeWriteSideCheck(inputStr: String, limit: Int): String = { val toUtf8 = UTF8String.fromString(inputStr) -CharVarcharCodegenUtils.charTypeWriteSideCheck(toUtf8, limit).toString +if (SQLConf.get.charVarcharAsString) { Review comment: can we move the check to `normalizePartitionSpec`? ``` val normalizedVal = if (SQLConf.get.charVarcharAsString) value else normalizedFiled.dataType match { ... } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
cloud-fan commented on a change in pull request #32513: URL: https://github.com/apache/spark/pull/32513#discussion_r632278890 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -71,7 +72,12 @@ class QueryExecution( lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) { // We can't clone `logical` here, which will reset the `_analyzed` flag. -sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) +sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) transform { + // SPARK-35378: Eagerly execute LeafRunnableCommand so that query command with CTE + case r: LeafRunnableCommand => Review comment: we should run all `Command`s, not just `LeafRunnableCommand` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
SparkQA removed a comment on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-840916339 **[Test build #138530 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138530/testReport)** for PR 32161 at commit [`c50df9a`](https://github.com/apache/spark/commit/c50df9afe801a1346d3d581cdcb8d35bdf5c643c). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
SparkQA commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-841003436 **[Test build #138530 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138530/testReport)** for PR 32161 at commit [`c50df9a`](https://github.com/apache/spark/commit/c50df9afe801a1346d3d581cdcb8d35bdf5c643c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class AgeExample(birthday: Expression, child: Expression) extends RuntimeReplaceable ` * `class SessionExtensionsWithLoader extends SparkSessionExtensionsProvider ` * `class SessionExtensionsWithoutLoader extends SparkSessionExtensionsProvider ` * `case class AvroWrite(` * `case class KafkaWrite(` * `case class TryEval(child: Expression) extends UnaryExpression with NullIntolerant ` * `case class TryAdd(left: Expression, right: Expression, child: Expression)` * `case class TryDivide(left: Expression, right: Expression, child: Expression)` * `new RuntimeException(s\"Failed to convert value $value (class of $cls) \" +` * `trait FileWrite extends Write ` * `case class CSVWrite(` * `case class JsonWrite(` * `case class OrcWrite(` * `case class ParquetWrite(` * `case class TextWrite(` * `trait ShuffledJoin extends JoinCodegenSupport ` * `class ConsoleWrite(schema: StructType, options: CaseInsensitiveStringMap)` * `class ForeachWrite[T](` * `class MemoryWrite(sink: MemorySink, schema: StructType, needTruncate: Boolean) extends Write ` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
SparkQA commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-841002776 **[Test build #138545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138545/testReport)** for PR 32161 at commit [`883aacf`](https://github.com/apache/spark/commit/883aacf94044c1b32648d6dab0ed6fc10048b580). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #29642: [SPARK-32792][SQL] Improve Parquet In filter pushdown
cloud-fan commented on pull request #29642: URL: https://github.com/apache/spark/pull/29642#issuecomment-841002294 Yea these benchmark results are not updated in time. Let's post the benchmark result before and after this PR in the PR description. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan edited a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)
cloud-fan edited a comment on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
AmplabJenkins removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841001172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138543/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
SparkQA removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-840998962 **[Test build #138543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)** for PR 32513 at commit [`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan closed pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)
cloud-fan closed pull request #32496: URL: https://github.com/apache/spark/pull/32496 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan edited a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)
cloud-fan edited a comment on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194 thanks, merging to master/3.1! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)
cloud-fan commented on pull request #32496: URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
AmplabJenkins commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841001172 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138543/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841001158 **[Test build #138543 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)** for PR 32513 at commit [`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
AmplabJenkins removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841000797 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43064/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841000788 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43064/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
AmplabJenkins commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-841000797 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43064/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst
SparkQA commented on pull request #32478: URL: https://github.com/apache/spark/pull/32478#issuecomment-840999011 **[Test build #138544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138544/testReport)** for PR 32478 at commit [`2214608`](https://github.com/apache/spark/commit/221460890e99371fc61443b0fdc7bcd108166c5a). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-840998962 **[Test build #138543 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)** for PR 32513 at commit [`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
AmplabJenkins removed a comment on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-840964019 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
SparkQA commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-840998927 **[Test build #138542 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138542/testReport)** for PR 32542 at commit [`0da0219`](https://github.com/apache/spark/commit/0da02192657490c2fd06e489731e02a58562ec08). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
SparkQA commented on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-840998950 **[Test build #138541 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138541/testReport)** for PR 32543 at commit [`b5bf370`](https://github.com/apache/spark/commit/b5bf37016d169675048091e4794de70c3f82a15a). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
SparkQA commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-840998906 **[Test build #138540 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)** for PR 32544 at commit [`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
AmplabJenkins removed a comment on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-840998180 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138533/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter
AmplabJenkins removed a comment on pull request #32541: URL: https://github.com/apache/spark/pull/32541#issuecomment-840998179 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138534/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32545: [MINOR][DOC] ADD toc for monitoring page
AmplabJenkins removed a comment on pull request #32545: URL: https://github.com/apache/spark/pull/32545#issuecomment-840998177 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43059/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter
AmplabJenkins commented on pull request #32541: URL: https://github.com/apache/spark/pull/32541#issuecomment-840998179 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138534/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32545: [MINOR][DOC] ADD toc for monitoring page
AmplabJenkins commented on pull request #32545: URL: https://github.com/apache/spark/pull/32545#issuecomment-840998177 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43059/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier
AmplabJenkins commented on pull request #32542: URL: https://github.com/apache/spark/pull/32542#issuecomment-840998180 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138533/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter
SparkQA removed a comment on pull request #32541: URL: https://github.com/apache/spark/pull/32541#issuecomment-840934589 **[Test build #138534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138534/testReport)** for PR 32541 at commit [`69a08ac`](https://github.com/apache/spark/commit/69a08ac03e68ed26b19faa4ba2d34fb3d8a486d9). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter
SparkQA commented on pull request #32541: URL: https://github.com/apache/spark/pull/32541#issuecomment-840996613 **[Test build #138534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138534/testReport)** for PR 32541 at commit [`69a08ac`](https://github.com/apache/spark/commit/69a08ac03e68ed26b19faa4ba2d34fb3d8a486d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32545: [MINOR][DOC] ADD toc for monitoring page
SparkQA commented on pull request #32545: URL: https://github.com/apache/spark/pull/32545#issuecomment-840995308 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yaooqinn commented on a change in pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable
yaooqinn commented on a change in pull request #32539: URL: https://github.com/apache/spark/pull/32539#discussion_r632269142 ## File path: core/src/main/scala/org/apache/spark/internal/config/UI.scala ## @@ -218,4 +218,13 @@ private[spark] object UI { .stringConf .transform(_.toUpperCase(Locale.ROOT)) .createWithDefault("LOCAL") + + val UI_THREADS = ConfigBuilder("spark.ui.threads") Review comment: On the other hand, the default 200 might be easy to hit ![image](https://user-images.githubusercontent.com/8326978/118219985-294a2900-b4ad-11eb-876b-0928baa8ea58.png) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Convert LeafRunnableCommand to LocalRelation when query with CTE
beliefer commented on a change in pull request #32513: URL: https://github.com/apache/spark/pull/32513#discussion_r632268442 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -71,7 +72,15 @@ class QueryExecution( lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) { // We can't clone `logical` here, which will reset the `_analyzed` flag. -sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) +sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) match { + case c: Command => c Review comment: OK. Let's unify the behavior eagerly execute the commands. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter
HyukjinKwon commented on pull request #32541: URL: https://github.com/apache/spark/pull/32541#issuecomment-840990663 Thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Convert LeafRunnableCommand to LocalRelation when query with CTE
beliefer commented on a change in pull request #32513: URL: https://github.com/apache/spark/pull/32513#discussion_r632267184 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ## @@ -71,7 +72,15 @@ class QueryExecution( lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) { // We can't clone `logical` here, which will reset the `_analyzed` flag. -sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) +sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) match { + case c: Command => c Review comment: I want to maintain the Command here and ensure its behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LittleCuteBug commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
LittleCuteBug commented on a change in pull request #32544: URL: https://github.com/apache/spark/pull/32544#discussion_r632266784 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -138,11 +138,12 @@ case class BroadcastExchangeExec( s"type: ${relation.getClass.getName}") } -longMetric("dataSize") += dataSize -if (dataSize >= MAX_BROADCAST_TABLE_BYTES) { - throw new SparkException( -s"Cannot broadcast the table that is larger than 8GB: ${dataSize >> 30} GB") -} +longMetric("dataSize") += dataSize Review comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me
LuciferYang commented on a change in pull request #32536: URL: https://github.com/apache/spark/pull/32536#discussion_r632263291 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging { */ private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): ByteCodeStats = { // First retrieve the generated classes. -val classes = { - val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc") - scField.setAccessible(true) - val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler] - val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader] - val classesField = loader.getClass.getDeclaredField("classes") - classesField.setAccessible(true) - classesField.get(loader).asInstanceOf[JavaMap[String, Array[Byte]]].asScala -} +val classes = evaluator.getBytecodes.asScala Review comment: @maropu If we want to do some code checking, maybe we can enhanced the case `metrics are recorded on compile` in `CodeGenerationSuite` as follows: ``` test("metrics are recorded on compile") { ... val metricGeneratedClassByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues val metricGeneratedMethodByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil) ... // Make sure the stat content doesn't change before/after SPARK-35398 assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedClassByteCodeSizeSnapshot) .sameElements(Array(740, 1293))) assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedMethodByteCodeSizeSnapshot) .sameElements(Array(5, 5, 6, 7, 10, 15, 46))) } ``` The new assertion can be passed before and after this pr, however, if we update the version of janino or change the codegen of `Add`, we may need to update the content of the assertion because the size of the generated code may change. For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with janino 3.1.4 are `Array(740, 1293)`, but with janino 3.0.16 are `Array(687, 1036)`, so I'm not sure if we need to add these assertions in this pr. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me
LuciferYang commented on a change in pull request #32536: URL: https://github.com/apache/spark/pull/32536#discussion_r632263291 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging { */ private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): ByteCodeStats = { // First retrieve the generated classes. -val classes = { - val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc") - scField.setAccessible(true) - val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler] - val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader] - val classesField = loader.getClass.getDeclaredField("classes") - classesField.setAccessible(true) - classesField.get(loader).asInstanceOf[JavaMap[String, Array[Byte]]].asScala -} +val classes = evaluator.getBytecodes.asScala Review comment: @maropu If we want to do some code checking, maybe we can enhanced the case `metrics are recorded on compile` in `CodeGenerationSuite` as follows: ``` test("metrics are recorded on compile") { ... val metricGeneratedClassByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues val metricGeneratedMethodByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil) ... // Make sure the stat content doesn't change before/after SPARK-35398 assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedClassByteCodeSizeSnapshot) .sameElements(Array(740, 1293))) assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedMethodByteCodeSizeSnapshot) .sameElements(Array(5, 5, 6, 7, 10, 15, 46))) } ``` The new assertion can be passed before and after this pr, however, if we update the version of janino or change the codegen of `Add`, we may need to update the content of the assertion because the size of the generated code may change. For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with janino 3.1.4 are `Array(740, 1293)`, but withe janino 3.0.16 are `Array(687, 1036)`, so I'm not sure if we need to add these assertions in this pr. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me
LuciferYang commented on a change in pull request #32536: URL: https://github.com/apache/spark/pull/32536#discussion_r632263291 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ## @@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging { */ private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): ByteCodeStats = { // First retrieve the generated classes. -val classes = { - val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc") - scField.setAccessible(true) - val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler] - val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader] - val classesField = loader.getClass.getDeclaredField("classes") - classesField.setAccessible(true) - classesField.get(loader).asInstanceOf[JavaMap[String, Array[Byte]]].asScala -} +val classes = evaluator.getBytecodes.asScala Review comment: @maropu If we want to do some code checking, maybe we can can be enhanced the case `metrics are recorded on compile` in `CodeGenerationSuite` as follows: ``` test("metrics are recorded on compile") { ... val metricGeneratedClassByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues val metricGeneratedMethodByteCodeSizeSnapshot = CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil) ... // Make sure the stat content doesn't change before/after SPARK-35398 assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedClassByteCodeSizeSnapshot) .sameElements(Array(740, 1293))) assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues .diff(metricGeneratedMethodByteCodeSizeSnapshot) .sameElements(Array(5, 5, 6, 7, 10, 15, 46))) } ``` The new assertion can be passed before and after this pr, however, if we update the version of janino or change the codegen of `Add`, we may need to update the content of the assertion because the size of the generated code may change. For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with janino 3.1.4 are `Array(740, 1293)`, but withe janino 3.0.16 are `Array(687, 1036)`, so I'm not sure if we need to add these assertions in this pr. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment
srowen commented on pull request #32543: URL: https://github.com/apache/spark/pull/32543#issuecomment-840986837 Jenkins retest this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
srowen commented on a change in pull request #32544: URL: https://github.com/apache/spark/pull/32544#discussion_r632261618 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala ## @@ -138,11 +138,12 @@ case class BroadcastExchangeExec( s"type: ${relation.getClass.getName}") } -longMetric("dataSize") += dataSize -if (dataSize >= MAX_BROADCAST_TABLE_BYTES) { - throw new SparkException( -s"Cannot broadcast the table that is larger than 8GB: ${dataSize >> 30} GB") -} +longMetric("dataSize") += dataSize Review comment: Fix the indentation though -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] sumeetgajjar commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight
sumeetgajjar commented on pull request #32114: URL: https://github.com/apache/spark/pull/32114#issuecomment-840986610 Hi @mridulm @Ngone51 @attilapiros , I have updated the PR with the solution that Attila proposed. Could you please review it once you are free? Apologies for the delay, I was fighting fires of my own. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala
srowen commented on pull request #32544: URL: https://github.com/apache/spark/pull/32544#issuecomment-840986580 Jenkins test this please -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org