[GitHub] [spark] imback82 commented on pull request #32542: [SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


imback82 commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-841023379


   @cloud-fan this is a PR to attempt to migrate the remaining `ALTER TABLE` 
commands to the new resolution framework. Previous attempt was reverted due to 
few concerns (https://github.com/apache/spark/pull/27327). Please let me know 
what you think about the new changes. Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841022760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138540/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841022760


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138540/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-840998906


   **[Test build #138540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)**
 for PR 32544 at commit 
[`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-841022160


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43063/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


SparkQA commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841022352


   **[Test build #138540 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)**
 for PR 32544 at commit 
[`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-841022160


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43063/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


SparkQA commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-841022133






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32547:
URL: https://github.com/apache/spark/pull/32547#discussion_r632297612



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -693,15 +701,12 @@ case class SortMergeJoinExec(
|continue;
|  }
|}
-   |if (!$loaded) {
-   |  $loaded = true;
-   |  $streamedAfter

Review comment:
   @cloud-fan - good call for code size. Actually I just figured we don't 
need `streamedAfter` here for left anti, because we can skip streamed row when 
there's a match here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32547:
URL: https://github.com/apache/spark/pull/32547#discussion_r632297612



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -693,15 +701,12 @@ case class SortMergeJoinExec(
|continue;
|  }
|}
-   |if (!$loaded) {
-   |  $loaded = true;
-   |  $streamedAfter

Review comment:
   @cloud-fan - good call for code size. Actually I just figured we don't 
need `$loadStreamed` here for left anti, because we can skip streamed row when 
there's a match here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32544:
URL: https://github.com/apache/spark/pull/32544#discussion_r632296521



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
##
@@ -141,7 +141,8 @@ case class BroadcastExchangeExec(
 longMetric("dataSize") += dataSize
 if (dataSize >= MAX_BROADCAST_TABLE_BYTES) {
   throw new SparkException(
-s"Cannot broadcast the table that is larger than 8GB: 
${dataSize >> 30} GB")
+s"Cannot broadcast the table that is larger than" +
+  s" ${MAX_BROADCAST_TABLE_BYTES >> 30}GB: ${dataSize >> 30} 
GB")

Review comment:
   sorry why `8GB` is not accurate? 
`BroadcastExchangeExec.MAX_BROADCAST_TABLE_BYTES` seems to be set statically as 
`8L << 30`, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32547:
URL: https://github.com/apache/spark/pull/32547#discussion_r632296238



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -693,15 +701,12 @@ case class SortMergeJoinExec(
|continue;
|  }
|}
-   |if (!$loaded) {
-   |  $loaded = true;
-   |  $streamedAfter

Review comment:
   For left anti, `streamedAfter` will appear twice in the code. How large 
is it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-841017158


   **[Test build #138551 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138551/testReport)**
 for PR 32204 at commit 
[`8b48e6f`](https://github.com/apache/spark/commit/8b48e6fee602ce1df5345c677b81bf731b8a05ee).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


SparkQA commented on pull request #32411:
URL: https://github.com/apache/spark/pull/32411#issuecomment-841017050


   **[Test build #138550 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138550/testReport)**
 for PR 32411 at commit 
[`67d5a78`](https://github.com/apache/spark/commit/67d5a78b0c8b51484ce607896797d0d6129b7ec6).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


SparkQA commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-841016958


   **[Test build #138549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138549/testReport)**
 for PR 32542 at commit 
[`9e058b1`](https://github.com/apache/spark/commit/9e058b10a3e61cfe702cab8a4b15726660e3da78).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


SparkQA commented on pull request #32547:
URL: https://github.com/apache/spark/pull/32547#issuecomment-841016888


   **[Test build #138547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138547/testReport)**
 for PR 32547 at commit 
[`53364c9`](https://github.com/apache/spark/commit/53364c9076a922df8f5ff26e7a62e0bc871401e9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-841016640


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138529/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA commented on pull request #32546:
URL: https://github.com/apache/spark/pull/32546#issuecomment-841016889


   **[Test build #138548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138548/testReport)**
 for PR 32546 at commit 
[`0ce01d0`](https://github.com/apache/spark/commit/0ce01d037a01e45b4b02ba3dbd68dfea940ecaee).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it

2021-05-13 Thread GitBox


SparkQA commented on pull request #32548:
URL: https://github.com/apache/spark/pull/32548#issuecomment-841016885


   **[Test build #138546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138546/testReport)**
 for PR 32548 at commit 
[`5a718fc`](https://github.com/apache/spark/commit/5a718fc08d828421b62ead9cfbe1b099ca3c76fd).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-841016640


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138529/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it

2021-05-13 Thread GitBox


c21 commented on pull request #32548:
URL: https://github.com/apache/spark/pull/32548#issuecomment-841016613


   cc @cloud-fan, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 opened a new pull request #32548: [SPARK-35363][SQL][FOLLOWUP] Use fresh name for findNextJoinRows instead of hardcoding it

2021-05-13 Thread GitBox


c21 opened a new pull request #32548:
URL: https://github.com/apache/spark/pull/32548


   
   
   ### What changes were proposed in this pull request?
   
   This is a followup from discussion in 
https://github.com/apache/spark/pull/32495#discussion_r632283178 . The 
hardcoded function name `findNextJoinRows` is not a real problem now as we 
always do code generation for SMJ's children separately. But this change is to 
make it future proof in case this assumption changed in the future.
   
   ### Why are the changes needed?
   
   Fix the potential reliability issue.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing unit tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32539:
URL: https://github.com/apache/spark/pull/32539#issuecomment-841016254


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138536/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32539:
URL: https://github.com/apache/spark/pull/32539#issuecomment-841016254


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138536/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32315:
URL: https://github.com/apache/spark/pull/32315#issuecomment-841015747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138537/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-841015745


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43065/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841015746


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43061/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-841015748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138530/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-841015743


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43062/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841015746


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43061/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-841015748


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138530/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32315:
URL: https://github.com/apache/spark/pull/32315#issuecomment-841015747


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138537/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-841015745


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43065/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-840916300


   **[Test build #138529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138529/testReport)**
 for PR 32204 at commit 
[`2b6b066`](https://github.com/apache/spark/commit/2b6b066de16a3820cb89de21f31ffbe1b08e66e8).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-841015743


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43062/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


SparkQA commented on pull request #32204:
URL: https://github.com/apache/spark/pull/32204#issuecomment-841015636


   **[Test build #138529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138529/testReport)**
 for PR 32204 at commit 
[`2b6b066`](https://github.com/apache/spark/commit/2b6b066de16a3820cb89de21f31ffbe1b08e66e8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AgeExample(birthday: Expression, child: Expression) extends 
RuntimeReplaceable `
 * `class SessionExtensionsWithLoader extends 
SparkSessionExtensionsProvider `
 * `class SessionExtensionsWithoutLoader extends 
SparkSessionExtensionsProvider `
 * `case class AvroWrite(`
 * `case class KafkaWrite(`
 * `case class TryEval(child: Expression) extends UnaryExpression with 
NullIntolerant `
 * `case class TryAdd(left: Expression, right: Expression, child: 
Expression)`
 * `case class TryDivide(left: Expression, right: Expression, child: 
Expression)`
 * `new RuntimeException(s\"Failed to convert value $value (class of 
$cls) \" +`
 * `trait FileWrite extends Write `
 * `case class CSVWrite(`
 * `case class JsonWrite(`
 * `case class OrcWrite(`
 * `case class ParquetWrite(`
 * `case class TextWrite(`
 * `trait ShuffledJoin extends JoinCodegenSupport `
 * `class ConsoleWrite(schema: StructType, options: 
CaseInsensitiveStringMap)`
 * `class ForeachWrite[T](`
 * `class MemoryWrite(sink: MemorySink, schema: StructType, needTruncate: 
Boolean) extends Write `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32539:
URL: https://github.com/apache/spark/pull/32539#issuecomment-840964805


   **[Test build #138536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138536/testReport)**
 for PR 32539 at commit 
[`37a168a`](https://github.com/apache/spark/commit/37a168a32d1cd5194820d7b6e5ad2c03fb9427ec).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable

2021-05-13 Thread GitBox


SparkQA commented on pull request #32539:
URL: https://github.com/apache/spark/pull/32539#issuecomment-841015356


   **[Test build #138536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138536/testReport)**
 for PR 32539 at commit 
[`37a168a`](https://github.com/apache/spark/commit/37a168a32d1cd5194820d7b6e5ad2c03fb9427ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


cloud-fan commented on pull request #32315:
URL: https://github.com/apache/spark/pull/32315#issuecomment-841014965


   looks good, can you update the PR description?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32315:
URL: https://github.com/apache/spark/pull/32315#discussion_r632292445



##
File path: core/src/test/scala/org/apache/spark/SparkFunSuite.scala
##
@@ -19,11 +19,13 @@ package org.apache.spark
 
 // scalastyle:off
 import java.io.File
+import java.nio.file.Path
 import java.util.{Locale, TimeZone}
 
 import org.apache.log4j.spi.LoggingEvent

Review comment:
   seems the import order is wrong. This should appear after 
`scala.annotation.tailrec`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


SparkQA commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-841014677






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-841014310






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


SparkQA commented on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-841013832


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43062/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r632289902



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##
@@ -96,4 +99,20 @@ object DataWritingCommand {
   sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
   metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: 
Configuration) {
+if (saveMode != SaveMode.Overwrite &&
+  !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) {
+  val filePath = new org.apache.hadoop.fs.Path(tablePath)
+  val fs = filePath.getFileSystem(hadoopConf)
+  if (fs.exists(filePath) &&
+fs.getFileStatus(filePath).isDirectory &&
+fs.listStatus(filePath).length != 0) {
+throw new AnalysisException(
+  s"CREATE-TABLE-AS-SELECT cannot create table with location to a 
non-empty directory " +
+s"${tablePath} . To disable the behavior, " +
+s"set 'spark.sql.legacy.allowNonEmptyLocationInCTAS to true.")

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r632289297



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##
@@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with 
SQLTestUtils with TestHi
 }
   }
 
+  test("SPARK-28551: CTAS Hive Table should be with non-existent or empty 
location") {
+def executeCTASWithNonEmptyLocation(tempLocation: String) {
+  sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION 
'$tempLocation/ctas1'")
+  sql("INSERT INTO TABLE ctas1 SELECT 'A' ")
+  sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile 
LOCATION
+   |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, 
value""".stripMargin)
+}
+
+Seq("false", "true").foreach { convertCTASFlag =>
+  Seq("false", "true").foreach { allowNonEmptyDirFlag =>
+withSQLConf(
+  SQLConf.CONVERT_CTAS.key -> convertCTASFlag,
+  SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS.key -> 
allowNonEmptyDirFlag) {
+  withTempDir { dir =>
+val tempLocation = dir.toURI.toString
+withTable("ctas1", "ctas_with_existing_location") {
+  if (!spark.conf.get(SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS)) {

Review comment:
   Done, changed type of val `allowNonEmptyDirFlag` to Boolean




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r632288726



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##
@@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with 
SQLTestUtils with TestHi
 }
   }
 
+  test("SPARK-28551: CTAS Hive Table should be with non-existent or empty 
location") {
+def executeCTASWithNonEmptyLocation(tempLocation: String) {
+  sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION 
'$tempLocation/ctas1'")
+  sql("INSERT INTO TABLE ctas1 SELECT 'A' ")
+  sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile 
LOCATION
+   |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, 
value""".stripMargin)
+}
+
+Seq("false", "true").foreach { convertCTASFlag =>

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r632288384



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##
@@ -96,4 +99,20 @@ object DataWritingCommand {
   sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
   metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: 
Configuration) {
+if (saveMode != SaveMode.Overwrite &&
+  !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) {
+  val filePath = new org.apache.hadoop.fs.Path(tablePath)
+  val fs = filePath.getFileSystem(hadoopConf)
+  if (fs.exists(filePath) &&
+fs.getFileStatus(filePath).isDirectory &&

Review comment:
   Done

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##
@@ -96,4 +99,20 @@ object DataWritingCommand {
   sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
   metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: 
Configuration) {
+if (saveMode != SaveMode.Overwrite &&
+  !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) {

Review comment:
   Done

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala
##
@@ -96,4 +99,20 @@ object DataWritingCommand {
   sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY),
   metrics.values.toSeq)
   }
+
+  def assertEmptyRootPath(tablePath: URI, saveMode: SaveMode, hadoopConf: 
Configuration) {
+if (saveMode != SaveMode.Overwrite &&
+  !SQLConf.get.allowCreateTableAsSelectNonEmptyLocation) {
+  val filePath = new org.apache.hadoop.fs.Path(tablePath)
+  val fs = filePath.getFileSystem(hadoopConf)
+  if (fs.exists(filePath) &&
+fs.getFileStatus(filePath).isDirectory &&
+fs.listStatus(filePath).length != 0) {
+throw new AnalysisException(
+  s"CREATE-TABLE-AS-SELECT cannot create table with location to a 
non-empty directory " +
+s"${tablePath} . To disable the behavior, " +

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.

2021-05-13 Thread GitBox


vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r632288269



##
File path: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala
##
@@ -598,6 +598,37 @@ abstract class SQLQuerySuiteBase extends QueryTest with 
SQLTestUtils with TestHi
 }
   }
 
+  test("SPARK-28551: CTAS Hive Table should be with non-existent or empty 
location") {
+def executeCTASWithNonEmptyLocation(tempLocation: String) {
+  sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION 
'$tempLocation/ctas1'")
+  sql("INSERT INTO TABLE ctas1 SELECT 'A' ")
+  sql(s"""CREATE TABLE ctas_with_existing_location stored as rcfile 
LOCATION
+   |'$tempLocation' AS SELECT key k, value FROM src ORDER BY k, 
value""".stripMargin)
+}
+
+Seq("false", "true").foreach { convertCTASFlag =>
+  Seq("false", "true").foreach { allowNonEmptyDirFlag =>
+withSQLConf(
+  SQLConf.CONVERT_CTAS.key -> convertCTASFlag,
+  SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS.key -> 
allowNonEmptyDirFlag) {
+  withTempDir { dir =>
+val tempLocation = dir.toURI.toString
+withTable("ctas1", "ctas_with_existing_location") {
+  if (!spark.conf.get(SQLConf.ALLOW_NON_EMPTY_LOCATION_IN_CTAS)) {
+intercept[AnalysisException] {

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


SparkQA commented on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-841011518


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43062/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32315:
URL: https://github.com/apache/spark/pull/32315#issuecomment-840964887


   **[Test build #138537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138537/testReport)**
 for PR 32315 at commit 
[`6e70fdd`](https://github.com/apache/spark/commit/6e70fddc6928d6aa60ef6a66eccc7f95884ce752).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32315: [SPARK-35206][TESTS][SQL] Extract common used get project path into a function in SparkFunctionSuite

2021-05-13 Thread GitBox


SparkQA commented on pull request #32315:
URL: https://github.com/apache/spark/pull/32315#issuecomment-841008847


   **[Test build #138537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138537/testReport)**
 for PR 32315 at commit 
[`6e70fdd`](https://github.com/apache/spark/commit/6e70fddc6928d6aa60ef6a66eccc7f95884ce752).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on a change in pull request #32495: [SPARK-35363][SQL] Refactor sort merge join code-gen be agnostic to join type

2021-05-13 Thread GitBox


c21 commented on a change in pull request #32495:
URL: https://github.com/apache/spark/pull/32495#discussion_r632283178



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
##
@@ -418,115 +428,106 @@ case class SortMergeJoinExec(
 // Inline mutable state since not many join operations in a task
 val matches = ctx.addMutableState(clsName, "matches",
   v => s"$v = new $clsName($inMemoryThreshold, $spillThreshold);", 
forceInline = true)
-// Copy the left keys as class members so they could be used in next 
function call.
-val matchedKeyVars = copyKeys(ctx, leftKeyVars)
+// Copy the streamed keys as class members so they could be used in next 
function call.
+val matchedKeyVars = copyKeys(ctx, streamedKeyVars)
 
-ctx.addNewFunction("findNextInnerJoinRows",
+ctx.addNewFunction("findNextJoinRows",
   s"""
- |private boolean findNextInnerJoinRows(
- |scala.collection.Iterator leftIter,
- |scala.collection.Iterator rightIter) {
- |  $leftRow = null;
+ |private boolean findNextJoinRows(
+ |scala.collection.Iterator streamedIter,
+ |scala.collection.Iterator bufferedIter) {
+ |  $streamedRow = null;
  |  int comp = 0;
- |  while ($leftRow == null) {
- |if (!leftIter.hasNext()) return false;
- |$leftRow = (InternalRow) leftIter.next();
- |${leftKeyVars.map(_.code).mkString("\n")}
- |if ($leftAnyNull) {
- |  $leftRow = null;
+ |  while ($streamedRow == null) {
+ |if (!streamedIter.hasNext()) return false;
+ |$streamedRow = (InternalRow) streamedIter.next();
+ |${streamedKeyVars.map(_.code).mkString("\n")}
+ |if ($streamedAnyNull) {
+ |  $streamedRow = null;
  |  continue;
  |}
  |if (!$matches.isEmpty()) {
- |  ${genComparison(ctx, leftKeyVars, matchedKeyVars)}
+ |  ${genComparison(ctx, streamedKeyVars, matchedKeyVars)}
  |  if (comp == 0) {
  |return true;
  |  }
  |  $matches.clear();
  |}
  |
  |do {
- |  if ($rightRow == null) {
- |if (!rightIter.hasNext()) {
+ |  if ($bufferedRow == null) {
+ |if (!bufferedIter.hasNext()) {
  |  ${matchedKeyVars.map(_.code).mkString("\n")}
  |  return !$matches.isEmpty();
  |}
- |$rightRow = (InternalRow) rightIter.next();
- |${rightKeyTmpVars.map(_.code).mkString("\n")}
- |if ($rightAnyNull) {
- |  $rightRow = null;
+ |$bufferedRow = (InternalRow) bufferedIter.next();
+ |${bufferedKeyTmpVars.map(_.code).mkString("\n")}
+ |if ($bufferedAnyNull) {
+ |  $bufferedRow = null;
  |  continue;
  |}
- |${rightKeyVars.map(_.code).mkString("\n")}
+ |${bufferedKeyVars.map(_.code).mkString("\n")}
  |  }
- |  ${genComparison(ctx, leftKeyVars, rightKeyVars)}
+ |  ${genComparison(ctx, streamedKeyVars, bufferedKeyVars)}
  |  if (comp > 0) {
- |$rightRow = null;
+ |$bufferedRow = null;
  |  } else if (comp < 0) {
  |if (!$matches.isEmpty()) {
  |  ${matchedKeyVars.map(_.code).mkString("\n")}
  |  return true;
  |}
- |$leftRow = null;
+ |$streamedRow = null;
  |  } else {
- |$matches.add((UnsafeRow) $rightRow);
- |$rightRow = null;
+ |$matches.add((UnsafeRow) $bufferedRow);
+ |$bufferedRow = null;
  |  }
- |} while ($leftRow != null);
+ |} while ($streamedRow != null);
  |  }
  |  return false; // unreachable
  |}
""".stripMargin, inlineToOuterClass = true)

Review comment:
   @cloud-fan - thanks for calling it out. I would like to make it future 
proof by giving it a fresh name. Let me do it in a followup PR. Actually I was 
wondering the same question as you when implementing and spent some time to 
figuring it out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 commented on pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


c21 commented on pull request #32547:
URL: https://github.com/apache/spark/pull/32547#issuecomment-841007190


   cc @cloud-fan and @maropu to take a look when you have time, thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] c21 opened a new pull request #32547: [SPARK-35351][SQL] Add code-gen for left anti sort merge join

2021-05-13 Thread GitBox


c21 opened a new pull request #32547:
URL: https://github.com/apache/spark/pull/32547


   
   
   ### What changes were proposed in this pull request?
   
   As title. This PR is to add code-gen support for LEFT ANTI sort merge join. 
The main change is to extract `loadStreamed` in 
`SortMergeJoinExec.doProduce()`. That is to set all columns values for streamed 
row, when the streamed row has no output row.
   
   Example query:
   
   ```
   val df1 = spark.range(10).select($"id".as("k1"))
   val df2 = spark.range(4).select($"id".as("k2"))
   df1.join(df2.hint("SHUFFLE_MERGE"), $"k1" === $"k2", "left_anti")
   ```
   
   Example generated code:
   
   ```
   == Subtree 5 / 5 (maxMethodCodeSize:296; maxConstantPoolSize:156(0.24% 
used); numInnerClasses:0) ==
   *(5) Project [id#0L AS k1#2L]
   +- *(5) SortMergeJoin [id#0L], [k2#6L], LeftAnti
  :- *(2) Sort [id#0L ASC NULLS FIRST], false, 0
  :  +- Exchange hashpartitioning(id#0L, 5), ENSURE_REQUIREMENTS, [id=#27]
  : +- *(1) Range (0, 10, step=1, splits=2)
  +- *(4) Sort [k2#6L ASC NULLS FIRST], false, 0
 +- Exchange hashpartitioning(k2#6L, 5), ENSURE_REQUIREMENTS, [id=#33]
+- *(3) Project [id#4L AS k2#6L]
   +- *(3) Range (0, 4, step=1, splits=2)
   
   Generated code:
   /* 001 */ public Object generate(Object[] references) {
   /* 002 */   return new GeneratedIteratorForCodegenStage5(references);
   /* 003 */ }
   /* 004 */
   /* 005 */ // codegenStageId=5
   /* 006 */ final class GeneratedIteratorForCodegenStage5 extends 
org.apache.spark.sql.execution.BufferedRowIterator {
   /* 007 */   private Object[] references;
   /* 008 */   private scala.collection.Iterator[] inputs;
   /* 009 */   private scala.collection.Iterator smj_streamedInput_0;
   /* 010 */   private scala.collection.Iterator smj_bufferedInput_0;
   /* 011 */   private InternalRow smj_streamedRow_0;
   /* 012 */   private InternalRow smj_bufferedRow_0;
   /* 013 */   private long smj_value_2;
   /* 014 */   private 
org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray smj_matches_0;
   /* 015 */   private long smj_value_3;
   /* 016 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] 
smj_mutableStateArray_0 = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[2];
   /* 017 */
   /* 018 */   public GeneratedIteratorForCodegenStage5(Object[] references) {
   /* 019 */ this.references = references;
   /* 020 */   }
   /* 021 */
   /* 022 */   public void init(int index, scala.collection.Iterator[] inputs) {
   /* 023 */ partitionIndex = index;
   /* 024 */ this.inputs = inputs;
   /* 025 */ smj_streamedInput_0 = inputs[0];
   /* 026 */ smj_bufferedInput_0 = inputs[1];
   /* 027 */
   /* 028 */ smj_matches_0 = new 
org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArray(1, 2147483647);
   /* 029 */ smj_mutableStateArray_0[0] = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
   /* 030 */ smj_mutableStateArray_0[1] = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
   /* 031 */
   /* 032 */   }
   /* 033 */
   /* 034 */   private boolean findNextJoinRows(
   /* 035 */ scala.collection.Iterator streamedIter,
   /* 036 */ scala.collection.Iterator bufferedIter) {
   /* 037 */ smj_streamedRow_0 = null;
   /* 038 */ int comp = 0;
   /* 039 */ while (smj_streamedRow_0 == null) {
   /* 040 */   if (!streamedIter.hasNext()) return false;
   /* 041 */   smj_streamedRow_0 = (InternalRow) streamedIter.next();
   /* 042 */   long smj_value_0 = smj_streamedRow_0.getLong(0);
   /* 043 */   if (false) {
   /* 044 */ if (!smj_matches_0.isEmpty()) {
   /* 045 */   smj_matches_0.clear();
   /* 046 */ }
   /* 047 */ return false;
   /* 048 */
   /* 049 */   }
   /* 050 */   if (!smj_matches_0.isEmpty()) {
   /* 051 */ comp = 0;
   /* 052 */ if (comp == 0) {
   /* 053 */   comp = (smj_value_0 > smj_value_3 ? 1 : smj_value_0 < 
smj_value_3 ? -1 : 0);
   /* 054 */ }
   /* 055 */
   /* 056 */ if (comp == 0) {
   /* 057 */   return true;
   /* 058 */ }
   /* 059 */ smj_matches_0.clear();
   /* 060 */   }
   /* 061 */
   /* 062 */   do {
   /* 063 */ if (smj_bufferedRow_0 == null) {
   /* 064 */   if (!bufferedIter.hasNext()) {
   /* 065 */ smj_value_3 = smj_value_0;
   /* 066 */ return !smj_matches_0.isEmpty();
   /* 067 */   }
   /* 068 */   smj_bufferedRow_0 = (InternalRow) bufferedIter.next();
   /* 069 */   long smj_value_1 = smj_bufferedRow_0.getLong(0);
   /* 070 */   if (false) {
   /* 071 */ smj_bufferedRow_0 = null;
   /* 072 */ continue;
   /* 073 */   }
   /* 074 */   smj_value_2 = smj_value_1;
   /* 075 */ }
   /* 076 */
   /* 

[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


beliefer commented on a change in pull request #32513:
URL: https://github.com/apache/spark/pull/32513#discussion_r632281901



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##
@@ -71,7 +72,12 @@ class QueryExecution(
 
   lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) 
{
 // We can't clone `logical` here, which will reset the `_analyzed` flag.
-sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
+sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) 
transform {
+  // SPARK-35378: Eagerly execute LeafRunnableCommand so that query 
command with CTE
+  case r: LeafRunnableCommand =>

Review comment:
   OK




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] itholic opened a new pull request #32546: [SPARK-35395][DOCS] Move ORC data source options from Python and Scala into a single page

2021-05-13 Thread GitBox


itholic opened a new pull request #32546:
URL: https://github.com/apache/spark/pull/32546


   ### What changes were proposed in this pull request?
   
   This PR proposes move ORC data source options from Python, Scala and Java 
into a single page.
   
   ### Why are the changes needed?
   
   So far, the documentation for ORC data source options is separated into 
different pages for each language API documents. However, this makes managing 
many options inconvenient, so it is efficient to manage all options in a single 
page and provide a link to that page in the API of each language.
   
   ### Does this PR introduce _any_ user-facing change?
   
   The "Data Source Option" page is added to the "ORC Files" page, and linked 
to the each API docs.
   
   https://user-images.githubusercontent.com/44108233/118223516-6ec22280-b4bc-11eb-8281-847546452863.png;>
   
   
   ### How was this patch tested?
   
   Manually build docs and confirm the page.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32501: [SPARK-35359][SQL] Insert data with char/varchar datatype will fail when data length exceed length limitation

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32501:
URL: https://github.com/apache/spark/pull/32501#discussion_r632281269



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/util/PartitioningUtils.scala
##
@@ -72,12 +73,20 @@ private[sql] object PartitioningUtils {
 
   private def charTypeWriteSideCheck(inputStr: String, limit: Int): String = {
 val toUtf8 = UTF8String.fromString(inputStr)
-CharVarcharCodegenUtils.charTypeWriteSideCheck(toUtf8, limit).toString
+if (SQLConf.get.charVarcharAsString) {

Review comment:
   can we move the check to `normalizePartitionSpec`?
   ```
   val normalizedVal = if (SQLConf.get.charVarcharAsString) value else 
normalizedFiled.dataType match {
 ...
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


cloud-fan commented on a change in pull request #32513:
URL: https://github.com/apache/spark/pull/32513#discussion_r632278890



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##
@@ -71,7 +72,12 @@ class QueryExecution(
 
   lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) 
{
 // We can't clone `logical` here, which will reset the `_analyzed` flag.
-sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
+sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) 
transform {
+  // SPARK-35378: Eagerly execute LeafRunnableCommand so that query 
command with CTE
+  case r: LeafRunnableCommand =>

Review comment:
   we should run all `Command`s, not just `LeafRunnableCommand`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-840916339


   **[Test build #138530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138530/testReport)**
 for PR 32161 at commit 
[`c50df9a`](https://github.com/apache/spark/commit/c50df9afe801a1346d3d581cdcb8d35bdf5c643c).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-841003436


   **[Test build #138530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138530/testReport)**
 for PR 32161 at commit 
[`c50df9a`](https://github.com/apache/spark/commit/c50df9afe801a1346d3d581cdcb8d35bdf5c643c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class AgeExample(birthday: Expression, child: Expression) extends 
RuntimeReplaceable `
 * `class SessionExtensionsWithLoader extends 
SparkSessionExtensionsProvider `
 * `class SessionExtensionsWithoutLoader extends 
SparkSessionExtensionsProvider `
 * `case class AvroWrite(`
 * `case class KafkaWrite(`
 * `case class TryEval(child: Expression) extends UnaryExpression with 
NullIntolerant `
 * `case class TryAdd(left: Expression, right: Expression, child: 
Expression)`
 * `case class TryDivide(left: Expression, right: Expression, child: 
Expression)`
 * `new RuntimeException(s\"Failed to convert value $value (class of 
$cls) \" +`
 * `trait FileWrite extends Write `
 * `case class CSVWrite(`
 * `case class JsonWrite(`
 * `case class OrcWrite(`
 * `case class ParquetWrite(`
 * `case class TextWrite(`
 * `trait ShuffledJoin extends JoinCodegenSupport `
 * `class ConsoleWrite(schema: StructType, options: 
CaseInsensitiveStringMap)`
 * `class ForeachWrite[T](`
 * `class MemoryWrite(sink: MemorySink, schema: StructType, needTruncate: 
Boolean) extends Write `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.

2021-05-13 Thread GitBox


SparkQA commented on pull request #32161:
URL: https://github.com/apache/spark/pull/32161#issuecomment-841002776


   **[Test build #138545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138545/testReport)**
 for PR 32161 at commit 
[`883aacf`](https://github.com/apache/spark/commit/883aacf94044c1b32648d6dab0ed6fc10048b580).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #29642: [SPARK-32792][SQL] Improve Parquet In filter pushdown

2021-05-13 Thread GitBox


cloud-fan commented on pull request #29642:
URL: https://github.com/apache/spark/pull/29642#issuecomment-841002294


   Yea these benchmark results are not updated in time. Let's post the 
benchmark result before and after this PR in the PR description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan edited a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)

2021-05-13 Thread GitBox


cloud-fan edited a comment on pull request #32496:
URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841001172


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138543/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-840998962


   **[Test build #138543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)**
 for PR 32513 at commit 
[`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)

2021-05-13 Thread GitBox


cloud-fan closed pull request #32496:
URL: https://github.com/apache/spark/pull/32496


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan edited a comment on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)

2021-05-13 Thread GitBox


cloud-fan edited a comment on pull request #32496:
URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194


   thanks, merging to master/3.1!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #32496: [SPARK-35207][SQL] Normalize hash function behavior with negative zero (floating point types)

2021-05-13 Thread GitBox


cloud-fan commented on pull request #32496:
URL: https://github.com/apache/spark/pull/32496#issuecomment-841001194


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841001172


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138543/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841001158


   **[Test build #138543 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)**
 for PR 32513 at commit 
[`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841000797


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43064/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841000788


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43064/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-841000797


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43064/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32478: [SPARK-35063][SQL] Group exception messages in sql/catalyst

2021-05-13 Thread GitBox


SparkQA commented on pull request #32478:
URL: https://github.com/apache/spark/pull/32478#issuecomment-840999011


   **[Test build #138544 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138544/testReport)**
 for PR 32478 at commit 
[`2214608`](https://github.com/apache/spark/commit/221460890e99371fc61443b0fdc7bcd108166c5a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute LeafRunnableCommand so that query command with CTE

2021-05-13 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-840998962


   **[Test build #138543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138543/testReport)**
 for PR 32513 at commit 
[`a82ed76`](https://github.com/apache/spark/commit/a82ed7677eee8f1a5a2be3dae454e9a7fc7f2c56).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-840964019


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


SparkQA commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-840998927


   **[Test build #138542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138542/testReport)**
 for PR 32542 at commit 
[`0da0219`](https://github.com/apache/spark/commit/0da02192657490c2fd06e489731e02a58562ec08).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


SparkQA commented on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-840998950


   **[Test build #138541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138541/testReport)**
 for PR 32543 at commit 
[`b5bf370`](https://github.com/apache/spark/commit/b5bf37016d169675048091e4794de70c3f82a15a).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


SparkQA commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-840998906


   **[Test build #138540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138540/testReport)**
 for PR 32544 at commit 
[`a1ee451`](https://github.com/apache/spark/commit/a1ee451155074e2c3c0e534d908eaa24aa38093d).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-840998180


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138533/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32541:
URL: https://github.com/apache/spark/pull/32541#issuecomment-840998179


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138534/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on pull request #32545: [MINOR][DOC] ADD toc for monitoring page

2021-05-13 Thread GitBox


AmplabJenkins removed a comment on pull request #32545:
URL: https://github.com/apache/spark/pull/32545#issuecomment-840998177


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43059/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32541:
URL: https://github.com/apache/spark/pull/32541#issuecomment-840998179


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138534/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32545: [MINOR][DOC] ADD toc for monitoring page

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32545:
URL: https://github.com/apache/spark/pull/32545#issuecomment-840998177


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43059/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on pull request #32542: [WIP][SPARK-35403][SQL] Migrate ALTER TABLE commands that alter columns to use UnresolvedTable to resolve the identifier

2021-05-13 Thread GitBox


AmplabJenkins commented on pull request #32542:
URL: https://github.com/apache/spark/pull/32542#issuecomment-840998180


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138533/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter

2021-05-13 Thread GitBox


SparkQA removed a comment on pull request #32541:
URL: https://github.com/apache/spark/pull/32541#issuecomment-840934589


   **[Test build #138534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138534/testReport)**
 for PR 32541 at commit 
[`69a08ac`](https://github.com/apache/spark/commit/69a08ac03e68ed26b19faa4ba2d34fb3d8a486d9).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter

2021-05-13 Thread GitBox


SparkQA commented on pull request #32541:
URL: https://github.com/apache/spark/pull/32541#issuecomment-840996613


   **[Test build #138534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138534/testReport)**
 for PR 32541 at commit 
[`69a08ac`](https://github.com/apache/spark/commit/69a08ac03e68ed26b19faa4ba2d34fb3d8a486d9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32545: [MINOR][DOC] ADD toc for monitoring page

2021-05-13 Thread GitBox


SparkQA commented on pull request #32545:
URL: https://github.com/apache/spark/pull/32545#issuecomment-840995308






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yaooqinn commented on a change in pull request #32539: [SPARK-35402][WebUI] Make thread pool of jetty server in Web UIs configurable

2021-05-13 Thread GitBox


yaooqinn commented on a change in pull request #32539:
URL: https://github.com/apache/spark/pull/32539#discussion_r632269142



##
File path: core/src/main/scala/org/apache/spark/internal/config/UI.scala
##
@@ -218,4 +218,13 @@ private[spark] object UI {
 .stringConf
 .transform(_.toUpperCase(Locale.ROOT))
 .createWithDefault("LOCAL")
+
+  val UI_THREADS = ConfigBuilder("spark.ui.threads")

Review comment:
   On the other hand, the default 200 might be easy to hit 
   
![image](https://user-images.githubusercontent.com/8326978/118219985-294a2900-b4ad-11eb-876b-0928baa8ea58.png)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Convert LeafRunnableCommand to LocalRelation when query with CTE

2021-05-13 Thread GitBox


beliefer commented on a change in pull request #32513:
URL: https://github.com/apache/spark/pull/32513#discussion_r632268442



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##
@@ -71,7 +72,15 @@ class QueryExecution(
 
   lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) 
{
 // We can't clone `logical` here, which will reset the `_analyzed` flag.
-sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
+sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) match 
{
+  case c: Command => c

Review comment:
   OK. Let's unify the behavior eagerly execute the commands.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #32541: [SPARK-34764][UI][FOLLOW-UP] Fix indentation and missing arguments for JavaScript linter

2021-05-13 Thread GitBox


HyukjinKwon commented on pull request #32541:
URL: https://github.com/apache/spark/pull/32541#issuecomment-840990663


   Thanks all!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a change in pull request #32513: [SPARK-35378][SQL] Convert LeafRunnableCommand to LocalRelation when query with CTE

2021-05-13 Thread GitBox


beliefer commented on a change in pull request #32513:
URL: https://github.com/apache/spark/pull/32513#discussion_r632267184



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
##
@@ -71,7 +72,15 @@ class QueryExecution(
 
   lazy val analyzed: LogicalPlan = executePhase(QueryPlanningTracker.ANALYSIS) 
{
 // We can't clone `logical` here, which will reset the `_analyzed` flag.
-sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker)
+sparkSession.sessionState.analyzer.executeAndCheck(logical, tracker) match 
{
+  case c: Command => c

Review comment:
   I want to maintain the Command here and ensure its behavior.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LittleCuteBug commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


LittleCuteBug commented on a change in pull request #32544:
URL: https://github.com/apache/spark/pull/32544#discussion_r632266784



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
##
@@ -138,11 +138,12 @@ case class BroadcastExchangeExec(
   s"type: ${relation.getClass.getName}")
 }
 
-longMetric("dataSize") += dataSize
-if (dataSize >= MAX_BROADCAST_TABLE_BYTES) {
-  throw new SparkException(
-s"Cannot broadcast the table that is larger than 8GB: 
${dataSize >> 30} GB")
-}
+longMetric("dataSize") += dataSize

Review comment:
   Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me

2021-05-13 Thread GitBox


LuciferYang commented on a change in pull request #32536:
URL: https://github.com/apache/spark/pull/32536#discussion_r632263291



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging {
*/
   private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): 
ByteCodeStats = {
 // First retrieve the generated classes.
-val classes = {
-  val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc")
-  scField.setAccessible(true)
-  val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler]
-  val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader]
-  val classesField = loader.getClass.getDeclaredField("classes")
-  classesField.setAccessible(true)
-  classesField.get(loader).asInstanceOf[JavaMap[String, 
Array[Byte]]].asScala
-}
+val classes = evaluator.getBytecodes.asScala

Review comment:
   @maropu If we want to do some code checking, maybe we can enhanced the 
case `metrics are recorded on compile` in `CodeGenerationSuite` as follows:
   ```
   test("metrics are recorded on compile") {
   ...
   val metricGeneratedClassByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
   val metricGeneratedMethodByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
   GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil)
   ...
   // Make sure the stat content doesn't change before/after SPARK-35398
   
assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedClassByteCodeSizeSnapshot)
 .sameElements(Array(740, 1293)))
   
assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedMethodByteCodeSizeSnapshot)
 .sameElements(Array(5, 5, 6, 7, 10, 15, 46)))
 }
   ```
   
   The new assertion can be passed before and after this pr, however, if we 
update the version of janino or change the codegen of `Add`, we may need to 
update the content of the assertion because the size of the generated code may 
change.
   For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with 
janino 3.1.4 are `Array(740, 1293)`, but with  janino 3.0.16 are `Array(687, 
1036)`, so I'm not sure if we need to add these assertions in this pr.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me

2021-05-13 Thread GitBox


LuciferYang commented on a change in pull request #32536:
URL: https://github.com/apache/spark/pull/32536#discussion_r632263291



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging {
*/
   private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): 
ByteCodeStats = {
 // First retrieve the generated classes.
-val classes = {
-  val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc")
-  scField.setAccessible(true)
-  val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler]
-  val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader]
-  val classesField = loader.getClass.getDeclaredField("classes")
-  classesField.setAccessible(true)
-  classesField.get(loader).asInstanceOf[JavaMap[String, 
Array[Byte]]].asScala
-}
+val classes = evaluator.getBytecodes.asScala

Review comment:
   @maropu If we want to do some code checking, maybe we can enhanced the 
case `metrics are recorded on compile` in `CodeGenerationSuite` as follows:
   ```
   test("metrics are recorded on compile") {
   ...
   val metricGeneratedClassByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
   val metricGeneratedMethodByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
   GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil)
   ...
   // Make sure the stat content doesn't change before/after SPARK-35398
   
assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedClassByteCodeSizeSnapshot)
 .sameElements(Array(740, 1293)))
   
assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedMethodByteCodeSizeSnapshot)
 .sameElements(Array(5, 5, 6, 7, 10, 15, 46)))
 }
   ```
   
   The new assertion can be passed before and after this pr, however, if we 
update the version of janino or change the codegen of `Add`, we may need to 
update the content of the assertion because the size of the generated code may 
change.
   For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with 
janino 3.1.4 are `Array(740, 1293)`, but withe  janino 3.0.16 are `Array(687, 
1036)`, so I'm not sure if we need to add these assertions in this pr.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] LuciferYang commented on a change in pull request #32536: [SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in `CodeGenerator.updateAndGetCompilationStats` me

2021-05-13 Thread GitBox


LuciferYang commented on a change in pull request #32536:
URL: https://github.com/apache/spark/pull/32536#discussion_r632263291



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
##
@@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging {
*/
   private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): 
ByteCodeStats = {
 // First retrieve the generated classes.
-val classes = {
-  val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc")
-  scField.setAccessible(true)
-  val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler]
-  val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader]
-  val classesField = loader.getClass.getDeclaredField("classes")
-  classesField.setAccessible(true)
-  classesField.get(loader).asInstanceOf[JavaMap[String, 
Array[Byte]]].asScala
-}
+val classes = evaluator.getBytecodes.asScala

Review comment:
   @maropu If we want to do some code checking, maybe we can can be 
enhanced the case `metrics are recorded on compile` in `CodeGenerationSuite` as 
follows:
   ```
   test("metrics are recorded on compile") {
   ...
   val metricGeneratedClassByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
   val metricGeneratedMethodByteCodeSizeSnapshot =
 
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
   GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil)
   ...
   // Make sure the stat content doesn't change before/after SPARK-35398
   
assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedClassByteCodeSizeSnapshot)
 .sameElements(Array(740, 1293)))
   
assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
 .diff(metricGeneratedMethodByteCodeSizeSnapshot)
 .sameElements(Array(5, 5, 6, 7, 10, 15, 46)))
 }
   ```
   
   The new assertion can be passed before and after this pr, however, if we 
update the version of janino or change the codegen of `Add`, we may need to 
update the content of the assertion because the size of the generated code may 
change.
   For example `CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE` with 
janino 3.1.4 are `Array(740, 1293)`, but withe  janino 3.0.16 are `Array(687, 
1036)`, so I'm not sure if we need to add these assertions in this pr.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #32543: [SPARK-35332][SQL][FOLLOWUP] Refine wrong comment

2021-05-13 Thread GitBox


srowen commented on pull request #32543:
URL: https://github.com/apache/spark/pull/32543#issuecomment-840986837


   Jenkins retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on a change in pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


srowen commented on a change in pull request #32544:
URL: https://github.com/apache/spark/pull/32544#discussion_r632261618



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala
##
@@ -138,11 +138,12 @@ case class BroadcastExchangeExec(
   s"type: ${relation.getClass.getName}")
 }
 
-longMetric("dataSize") += dataSize
-if (dataSize >= MAX_BROADCAST_TABLE_BYTES) {
-  throw new SparkException(
-s"Cannot broadcast the table that is larger than 8GB: 
${dataSize >> 30} GB")
-}
+longMetric("dataSize") += dataSize

Review comment:
   Fix the indentation though




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sumeetgajjar commented on pull request #32114: [SPARK-35011][CORE] Avoid Block Manager registrations when StopExecutor msg is in-flight

2021-05-13 Thread GitBox


sumeetgajjar commented on pull request #32114:
URL: https://github.com/apache/spark/pull/32114#issuecomment-840986610


   Hi @mridulm @Ngone51 @attilapiros ,
   I have updated the PR with the solution that Attila proposed. Could you 
please review it once you are free?
   Apologies for the delay, I was fighting fires of my own. 
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] srowen commented on pull request #32544: [SPARK-32484][SQL] Fix log info BroadcastExchangeExec.scala

2021-05-13 Thread GitBox


srowen commented on pull request #32544:
URL: https://github.com/apache/spark/pull/32544#issuecomment-840986580


   Jenkins test this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   8   >