[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-22 Thread GitBox


SparkQA commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-804640082


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40964/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


MaxGekk commented on a change in pull request #31938:
URL: https://github.com/apache/spark/pull/31938#discussion_r599288224



##
File path: docs/cloud-integration.md
##
@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.
 Here is the documentation on the standard connectors both from Apache and the 
cloud providers.
 
 * [OpenStack 
Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment:
   I would prefer the third option: both links listed separately.







[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


SparkQA commented on pull request #31933:
URL: https://github.com/apache/spark/pull/31933#issuecomment-804637310


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40963/
   





[GitHub] [spark] SparkQA removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general metho

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804549509


   **[Test build #136378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)**
 for PR 29754 at commit 
[`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804636621


   **[Test build #136378 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)**
 for PR 29754 at commit 
[`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] lenadroid commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


lenadroid commented on a change in pull request #31938:
URL: https://github.com/apache/spark/pull/31938#discussion_r599280673



##
File path: docs/cloud-integration.md
##
@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.
 Here is the documentation on the standard connectors both from Apache and the 
cloud providers.
 
 * [OpenStack 
Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment:
   Thanks for the feedback! Yes, the link I provided is for ABFS & Data 
Lake Gen 2 specifically. Which option would you prefer:
   
   1. Change the link text to say "Azure Blob Storage" and point to 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html 
   2. Change the link text to say "Azure Blob Filesystem and Azure Data Lake 
Gen 2" and point to 
https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html
   3. Have both links mentioned above available.
   
   Let me know which one you prefer and I'll make a change.







[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804625250


   **[Test build #136383 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136383/testReport)**
 for PR 31102 at commit 
[`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).





[GitHub] [spark] MaxGekk commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


MaxGekk commented on a change in pull request #31938:
URL: https://github.com/apache/spark/pull/31938#discussion_r599277581



##
File path: docs/cloud-integration.md
##
@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.
 Here is the documentation on the standard connectors both from Apache and the 
cloud providers.
 
 * [OpenStack 
Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 
2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment:
   The link is specifically for ABFS actually. Should we provide the link 
for `Azure Blob Storage`: 
https://hadoop.apache.org/docs/current/hadoop-azure/index.html ?







[GitHub] [spark] Ngone51 commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


Ngone51 commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804624719


   retest this please





[GitHub] [spark] HyukjinKwon commented on pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


HyukjinKwon commented on pull request #31938:
URL: https://github.com/apache/spark/pull/31938#issuecomment-804624501


   @steveloughran fyi





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804621698


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136371/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31938: Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31938:
URL: https://github.com/apache/spark/pull/31938#issuecomment-804622828


   Can one of the admins verify this patch?





[GitHub] [spark] lenadroid opened a new pull request #31938: Updating the link for Azure Data Lake Gen 2 in docs

2021-03-22 Thread GitBox


lenadroid opened a new pull request #31938:
URL: https://github.com/apache/spark/pull/31938


   The current link for `Azure Blob Storage and Azure Datalake Gen 2` leads to AWS 
documentation. This PR replaces it with a link to the correct page.
   





[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804621698


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136371/
   





[GitHub] [spark] dongjoon-hyun commented on pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side

2021-03-22 Thread GitBox


dongjoon-hyun commented on pull request #31936:
URL: https://github.com/apache/spark/pull/31936#issuecomment-804621313


   Thank you for pinging me, @xkrogen.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804620821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136375/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823









[GitHub] [spark] SparkQA commented on pull request #31919: [SPARK-34087][FOLLOW-UP][SQL] Manage ExecutionListenerBus register inside itself

2021-03-22 Thread GitBox


SparkQA commented on pull request #31919:
URL: https://github.com/apache/spark/pull/31919#issuecomment-804621221


   **[Test build #136382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136382/testReport)**
 for PR 31919 at commit 
[`ae5d5d6`](https://github.com/apache/spark/commit/ae5d5d669d290de486a4ba473505a753263fb993).





[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804621166


   **[Test build #136381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136381/testReport)**
 for PR 31937 at commit 
[`70bf13e`](https://github.com/apache/spark/commit/70bf13e89c0bcdcede7f6004d34062800480ea9f).





[GitHub] [spark] SparkQA removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804508390


   **[Test build #136371 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136371/testReport)**
 for PR 31937 at commit 
[`724557a`](https://github.com/apache/spark/commit/724557a43e40cf4e0c1c4456a79164a9ac24b6eb).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804620822









[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804621041


   **[Test build #136371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136371/testReport)**
 for PR 31937 at commit 
[`724557a`](https://github.com/apache/spark/commit/724557a43e40cf4e0c1c4456a79164a9ac24b6eb).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823









[GitHub] [spark] AmplabJenkins commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804620826









[GitHub] [spark] AmplabJenkins commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804620821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136375/
   





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on

2021-03-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r599273586



##
File path: docs/running-on-yarn.md
##
@@ -761,8 +761,27 @@ The following extra configuration options are available 
when the shuffle service
 NodeManagers where the Spark Shuffle Service is not running.
   
 
+
+  spark.yarn.shuffle.service.metrics.namespace
+  sparkShuffleService
+  
+The namespace to use when emitting shuffle service metrics into Hadoop 
metrics2 system of the
+NodeManager.

Review comment:
   Could you add a description of the limitation with old Hadoop 
versions (like 2.7.x), either here or in the section `Running multiple versions of the 
Spark Shuffle Service`?







[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-22 Thread GitBox


SparkQA commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-804620342


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40964/
   





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on

2021-03-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r599272016



##
File path: 
resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleIntegrationSuite.scala
##
@@ -109,6 +110,59 @@ class YarnShuffleAuthSuite extends 
YarnShuffleIntegrationSuite {
 
 }
 
+/**
+ * SPARK-34828: Integration test for the external shuffle service with an 
alternate name and
+ * configs (by using a configuration overlay)
+ */
+@ExtendedYarnTest
+class YarnShuffleAlternateNameConfigSuite extends YarnShuffleIntegrationSuite {

Review comment:
   Please move this new test suite into a separate file.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on

2021-03-22 Thread GitBox


dongjoon-hyun commented on a change in pull request #31936:
URL: https://github.com/apache/spark/pull/31936#discussion_r599271678



##
File path: docs/running-on-yarn.md
##
@@ -811,3 +830,52 @@ do the following:
   to the list of filters in the spark.ui.filters configuration.
 
 Be aware that the history server information may not be up-to-date with the 
application's state.
+
+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark 
Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when 
running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a 
given version of
+the shuffle service is not always compatible with other versions of Spark. 
YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader
+(see [YARN-4577](https://issues.apache.org/jira/browse/YARN-4577)), meaning 
multiple Spark versions
+can coexist within a single NodeManager. The
+`yarn.nodemanager.aux-services.<service-name>.classpath` and, starting from 
YARN 2.10.2/3.1.1/3.2.0,
+`yarn.nodemanager.aux-services.<service-name>.remote-classpath` options can be 
used to configure
+this. In addition to setting up separate classpaths, it's necessary to ensure 
the two versions
+advertise to different ports. This can be achieved using the 
`spark-shuffle-site.xml` file described
+above. For example, you may have configuration like:
+
+```properties
+  yarn.nodemanager.aux-services = spark_shuffle_x,spark_shuffle_y
+  yarn.nodemanager.aux-services.spark_shuffle_x.classpath = 
/path/to/spark-x-yarn-shuffle.jar,/path/to/spark-x-config
+  yarn.nodemanager.aux-services.spark_shuffle_y.classpath = 
/path/to/spark-y-yarn-shuffle.jar,/path/to/spark-y-config
+```
+
+The two `spark-*-config` directories each contain one file, 
`spark-shuffle-site.xml`. These are XML
+files in the [Hadoop Configuration 
format](https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/conf/Configuration.html)

Review comment:
   Shall we reference the Apache Hadoop 3.2.2 doc instead of 3.2.0, because we 
are using Apache Hadoop 3.2.2?
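
   For context, here is a minimal sketch of what one of the `spark-shuffle-site.xml` overlay files described in the diff above might contain. The port `7001` and the namespace value `sparkShuffleServiceX` are illustrative assumptions, not values taken from the PR. `spark.shuffle.service.port` is the existing Spark setting for the shuffle service port, and `spark.yarn.shuffle.service.metrics.namespace` is the option documented in this PR.

   ```xml
   <!-- Hypothetical /path/to/spark-x-config/spark-shuffle-site.xml -->
   <configuration>
     <property>
       <!-- Port for this shuffle service instance; must differ per instance (value assumed) -->
       <name>spark.shuffle.service.port</name>
       <value>7001</value>
     </property>
     <property>
       <!-- Metrics namespace so the two instances report separately (value assumed) -->
       <name>spark.yarn.shuffle.service.metrics.namespace</name>
       <value>sparkShuffleServiceX</value>
     </property>
   </configuration>
   ```

   The second instance would presumably use a sibling file under `spark-y-config` with a different port and namespace, so the two services do not collide on one NodeManager.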







[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


SparkQA commented on pull request #31933:
URL: https://github.com/apache/spark/pull/31933#issuecomment-804617321


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40963/
   





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804616758


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40962/
   





[GitHub] [spark] SparkQA removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general metho

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804528702


   **[Test build #136373 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136373/testReport)**
 for PR 29754 at commit 
[`15c63c3`](https://github.com/apache/spark/commit/15c63c3b7964982a468b50fad5768bf0ea612fe2).





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804616427


   **[Test build #136373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136373/testReport)**
 for PR 29754 at commit 
[`15c63c3`](https://github.com/apache/spark/commit/15c63c3b7964982a468b50fad5768bf0ea612fe2).
* This patch **fails SparkR unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.





[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804488767


   **[Test build #136370 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)**
 for PR 31476 at commit 
[`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).





[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804616183


   **[Test build #136370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)**
 for PR 31476 at commit 
[`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


cloud-fan commented on a change in pull request #31933:
URL: https://github.com/apache/spark/pull/31933#discussion_r599269087



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
##
@@ -62,15 +62,17 @@ case class CreateViewCommand(
 comment: Option[String],
 properties: Map[String, String],
 originalText: Option[String],
-child: LogicalPlan,
+analyzedPlan: LogicalPlan,

Review comment:
   How do we analyze it, given that it's not a child?







[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


cloud-fan commented on a change in pull request #31933:
URL: https://github.com/apache/spark/pull/31933#discussion_r599268594



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala
##
@@ -94,14 +94,19 @@ case class CacheTableAsSelectExec(
   override lazy val relationName: String = tempViewName
 
   override lazy val planToCache: LogicalPlan = {
+// If the plan cannot be analyzed, throw an exception and don't proceed.
+val qe = sparkSession.sessionState.executePlan(query)
+qe.assertAnalyzed()
+val analyzedPlan = qe.analyzed

Review comment:
   The current code looks fine. I think `CacheTableAsSelectExec` is the 
only exception: it has a `query` that is not a simple table relation, but 
we still want to skip optimizing it. Let's document this clearly.







[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804485460


   **[Test build #136369 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)**
 for PR 31476 at commit 
[`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).





[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804614005


   **[Test build #136369 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)**
 for PR 31476 at commit 
[`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


cloud-fan commented on a change in pull request #31933:
URL: https://github.com/apache/spark/pull/31933#discussion_r599266970



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala
##
@@ -28,6 +28,7 @@ trait Command extends LogicalPlan {
   override def output: Seq[Attribute] = Seq.empty
   override def producedAttributes: AttributeSet = outputSet
   override def children: Seq[LogicalPlan] = Seq.empty
+  def plansToCheckAnalysis: Seq[LogicalPlan] = Seq.empty

Review comment:
   Can we reuse `innerChildren` instead of adding this?







[GitHub] [spark] SparkQA removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804529725


   **[Test build #136375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)**
 for PR 31102 at commit 
[`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).





[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804611618


   **[Test build #136375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)**
 for PR 31102 at commit 
[`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804606349


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40962/
   





[GitHub] [spark] HyukjinKwon commented on pull request #31900: [MINOR][DOCS] Update sql-ref-syntax-dml-insert-into.md

2021-03-22 Thread GitBox


HyukjinKwon commented on pull request #31900:
URL: https://github.com/apache/spark/pull/31900#issuecomment-804605634


   It would be great if we could fix the PR description, though.





[GitHub] [spark] imback82 commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


imback82 commented on a change in pull request #31933:
URL: https://github.com/apache/spark/pull/31933#discussion_r599257472



##
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
##
@@ -167,6 +167,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
   case _: ShowTableExtended =>
 throw new AnalysisException("SHOW TABLE EXTENDED is not supported for v2 tables.")
 
+  case c: Command =>
+c.plansToCheckAnalysis.foreach(checkAnalysis)

Review comment:
   This seems hacky? But we cannot expose the analyzed plan as `children` of 
`CreateViewCommand`. The reason is that the `View` will be optimized away (in 
the optimizer), and the verification that checks whether a permanent view 
references temp views will fail.
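
For illustration only, the idea in this comment can be sketched with a toy plan model. This is not Spark's code: `Plan`, `Relation`, `Project`, `CreateView`, and `CheckSketch` are hypothetical names, and only `plansToCheckAnalysis`, `children`, and `analyzedPlan` echo identifiers from the diff. The sketch assumes a command that hides its query from `children` (so tree rewrites skip it) while still asking the checker to verify it.

```scala
// Toy model, not Spark's real classes: every type below is hypothetical.
sealed trait Plan {
  def children: Seq[Plan]
  // Extra plans a node wants verified without exposing them as children.
  def plansToCheckAnalysis: Seq[Plan] = Seq.empty
}

final case class Relation(name: String, resolved: Boolean) extends Plan {
  val children: Seq[Plan] = Seq.empty
}

final case class Project(child: Plan) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// The command keeps its query out of `children`, so rewrites never touch it.
final case class CreateView(viewName: String, analyzedPlan: Plan) extends Plan {
  val children: Seq[Plan] = Seq.empty
  override def plansToCheckAnalysis: Seq[Plan] = Seq(analyzedPlan)
}

object CheckSketch {
  // Checks the node itself, then its children, then any extra plans it lists.
  def checkAnalysis(plan: Plan): Unit = {
    plan match {
      case Relation(name, false) =>
        throw new IllegalStateException(s"Unresolved relation: $name")
      case _ => ()
    }
    plan.children.foreach(checkAnalysis)
    plan.plansToCheckAnalysis.foreach(checkAnalysis)
  }

  def main(args: Array[String]): Unit = {
    val bad = CreateView("v", Project(Relation("missing_table", resolved = false)))
    // The unresolved relation is found even though it is not a child.
    println(scala.util.Try(checkAnalysis(bad)).isFailure)
  }
}
```

Without the last line of `checkAnalysis`, the unresolved relation inside `CreateView` would go unnoticed, which is the gap the extra hook is meant to close.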







[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804602571


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40960/
   





[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-22 Thread GitBox


SparkQA commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-804602562


   **[Test build #136380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136380/testReport)**
 for PR 31355 at commit 
[`7f6e82d`](https://github.com/apache/spark/commit/7f6e82de5a63750c4c5f210f4000ae1423007d3e).





[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804602571


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40960/
   





[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804602548


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40960/
   





[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


SparkQA commented on pull request #31933:
URL: https://github.com/apache/spark/pull/31933#issuecomment-804602370


   **[Test build #136379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136379/testReport)**
 for PR 31933 at commit 
[`6fdd9e0`](https://github.com/apache/spark/commit/6fdd9e0edc924679420d3c64472b72e83bb6006f).





[GitHub] [spark] imback82 commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand

2021-03-22 Thread GitBox


imback82 commented on a change in pull request #31933:
URL: https://github.com/apache/spark/pull/31933#discussion_r599256804



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
##
@@ -62,15 +62,17 @@ case class CreateViewCommand(
 comment: Option[String],
 properties: Map[String, String],
 originalText: Option[String],
-child: LogicalPlan,
+analyzedPlan: LogicalPlan,
 allowExisting: Boolean,
 replace: Boolean,
 viewType: ViewType)
   extends RunnableCommand {
 
   import ViewHelper._
 
-  override def innerChildren: Seq[QueryPlan[_]] = Seq(child)
+  override def plansToCheckAnalysis: Seq[LogicalPlan] = Seq(analyzedPlan)

Review comment:
   We need to run `checkAnalysis` on the analyzed plan; otherwise, for the 
following:
   ```
   sql("CREATE TABLE view_base_table (key int, data varchar(20)) USING PARQUET")
   sql("CREATE VIEW key_dependent_view AS SELECT * FROM view_base_table GROUP BY key")
   ```
   view creation succeeds, whereas it should have failed with:
   ```
   org.apache.spark.sql.AnalysisException
   expression 'spark_catalog.default.view_base_table.data' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.
   ```
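
As a rough illustration of the rule behind that error (a toy check, not Spark's implementation): every selected column must be either a grouping key or wrapped in an aggregate. `GroupByCheckSketch` and `invalidColumns` are hypothetical names; the column names come from the `view_base_table` example above.

```scala
object GroupByCheckSketch {
  // Returns the selected columns that are neither grouping keys nor aggregated.
  def invalidColumns(
      selected: Seq[String],
      groupBy: Set[String],
      aggregated: Set[String]): Seq[String] =
    selected.filterNot(c => groupBy.contains(c) || aggregated.contains(c))

  def main(args: Array[String]): Unit = {
    // `SELECT *` over view_base_table expands to both of its columns.
    val selected = Seq("key", "data")
    val offending = invalidColumns(selected, groupBy = Set("key"), aggregated = Set.empty)
    println(offending.mkString(", ")) // prints "data"
  }
}
```

Here `data` is the offending column, matching the expression named in the expected `AnalysisException`.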







[GitHub] [spark] AmplabJenkins removed a comment on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804602129


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40961/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804602129


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40961/
   





[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


SparkQA commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804602114


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40961/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804601486


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40958/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804601485


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136367/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804601486


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40958/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804601485


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136367/
   





[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


SparkQA commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804600544


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40961/
   





[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804600499


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40958/
   





[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804599872


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40960/
   





[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write

2021-03-22 Thread GitBox


HeartSaVioR commented on pull request #31355:
URL: https://github.com/apache/spark/pull/31355#issuecomment-804598821


   I just removed the handling of the non-specific distribution case. Please 
take another look. Thanks!





[GitHub] [spark] SparkQA removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


SparkQA removed a comment on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804454409


   **[Test build #136367 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136367/testReport)**
 for PR 31931 at commit 
[`08e47d2`](https://github.com/apache/spark/commit/08e47d2fc538838892bfabad3a1a93d85ec5228b).





[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


SparkQA commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804594750


   **[Test build #136367 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136367/testReport)**
 for PR 31931 at commit 
[`08e47d2`](https://github.com/apache/spark/commit/08e47d2fc538838892bfabad3a1a93d85ec5228b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] zhangptang commented on pull request #31925: Branch 3.1

2021-03-22 Thread GitBox


zhangptang commented on pull request #31925:
URL: https://github.com/apache/spark/pull/31925#issuecomment-804569362


   OK, I have created a JIRA ticket; here is the link: 
https://issues.apache.org/jira/browse/SPARK-34831
   
   Please resolve it quickly, thanks.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804560380


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40959/
   





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804560299


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40959/
   





[GitHub] [spark] AmplabJenkins commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804560380


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40959/
   





[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804556988


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40958/
   





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804549509


   **[Test build #136378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)**
 for PR 29754 at commit 
[`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).





[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804549076


   **[Test build #136376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136376/testReport)**
 for PR 31937 at commit 
[`4689597`](https://github.com/apache/spark/commit/468959747e2718f15a14a3741d7671dadece429d).





[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


SparkQA commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804549102


   **[Test build #136377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136377/testReport)**
 for PR 31920 at commit 
[`326d4dd`](https://github.com/apache/spark/commit/326d4dd6c221b5b893118ae232ad98e4f62b8081).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804548660


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40955/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804548658


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40957/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804548660


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40955/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804548658


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40957/
   





[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


SparkQA commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804548623


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40959/
   





[GitHub] [spark] sadhen commented on a change in pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


sadhen commented on a change in pull request #31735:
URL: https://github.com/apache/spark/pull/31735#discussion_r599236008



##
File path: python/pyspark/sql/tests/test_pandas_udf_scalar.py
##
@@ -1109,6 +1109,102 @@ def f3i(it):
 
         self.assertEqual(expected, df1.collect())
 
+    # SPARK-34600
+    def test_user_defined_types_with_udf(self):
+        """PandasUDF returns single UDT out.
+        """
+
+        # ExamplePointUDT uses ArrayType to present its sqlType.
+        @pandas_udf(ExamplePointUDT())
+        def create_vector(series: pd.Series) -> pd.Series:
+            vectors = []
+            for _, item in series.items():
+                vectors.append(ExamplePoint(item, item + 1))
+            return pd.Series(vectors)
+
+        # ExampleBoxUDT uses StructType to present its sqlType.
+        @pandas_udf(ExampleBoxUDT())
+        def create_boxes(series: pd.Series) -> pd.Series:
+            boxes = []
+            for _, item in series.items():
+                boxes.append(ExampleBox(item, item + 1, item + 2, item + 3))
+            return pd.Series(boxes)
+
+        df = self.spark.range(2)
+        df = (
+            df
+            .withColumn("vector", create_vector(col("id")))
+            .withColumn("box", create_boxes(col("id")))
+        )
+        df.show()
+        self.assertEqual([
+            Row(id=0, vector=ExamplePoint(0, 1), box=ExampleBox(0, 1, 2, 3)),
+            Row(id=1, vector=ExamplePoint(1, 2), box=ExampleBox(1, 2, 3, 4))
+        ], df.collect())
+
+    # SPARK-34600
+    def test_user_defined_types_in_struct(self):
+        @pandas_udf(StructType([
+            StructField("vec", ArrayType(ExamplePointUDT())),
+            StructField("box", ArrayType(ExampleBoxUDT()))
+        ]))
+        def array_of_udt_structs(series: pd.Series) -> pd.DataFrame:
+            vectors = []
+            for _, i in series.items():
+                vectors.append({
+                    "vec": [ExamplePoint(i, i), ExamplePoint(i + 1, i + 1)],
+                    "box": [ExampleBox(*([i] * 4)), ExampleBox(*([i+1] * 4))],
+                })
+            return pd.DataFrame(vectors)
+
+        df = self.spark.range(1, 3)
+        df = df.withColumn("nested", array_of_udt_structs(df.id))
+        df.show()
+        self.assertEqual([
+            Row(id=1, nested=Row(
+                vec=[ExamplePoint(1, 1), ExamplePoint(2, 2)],
+                box=[ExampleBox(1, 1, 1, 1), ExampleBox(2, 2, 2, 2)])),
+            Row(id=2, nested=Row(
+                vec=[ExamplePoint(2, 2), ExamplePoint(3, 3)],
+                box=[ExampleBox(2, 2, 2, 2), ExampleBox(3, 3, 3, 3)]))
+        ], df.collect())
+
+    # SPARK-34600
+    def test_user_defined_types_in_array(self):

Review comment:
   1. Some unsupported types are explicitly asserted in `to_arrow_type`. For these unsupported types, just add a Python UDT and catch the assertion in the test.
   2. Other unsupported types are more complicated, as I mentioned in https://github.com/apache/spark/pull/31735#issuecomment-804539589
   
   It is feasible to add tests for the first kind. For the latter kind, maybe we should reject them earlier (e.g. by adding more explicit assertions in `to_arrow_type`).
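
The "reject it earlier" idea can be sketched in plain Python. This is only a toy under stated assumptions: `to_arrow_type_sketch`, `SUPPORTED_LEAF_TYPES`, `rejects`, and the tuple-based type descriptions are hypothetical stand-ins, not PySpark's real `to_arrow_type` or its types, and the choice of `TypeError` is an assumption as well.

```python
# Hypothetical stand-in for an Arrow type conversion; NOT PySpark's real `to_arrow_type`.
SUPPORTED_LEAF_TYPES = {"double", "long", "string", "timestamp"}


def to_arrow_type_sketch(data_type):
    """Convert a toy type description, rejecting unsupported types up front.

    A type is either a leaf name (str), ("array", element_type), or
    ("udt", underlying_sql_type).
    """
    if isinstance(data_type, str):
        if data_type not in SUPPORTED_LEAF_TYPES:
            raise TypeError("Unsupported type in conversion to Arrow: " + data_type)
        return "arrow<" + data_type + ">"
    kind, inner = data_type
    if kind == "udt":
        # A UDT is converted through its underlying SQL type.
        return to_arrow_type_sketch(inner)
    if kind == "array":
        return "list<" + to_arrow_type_sketch(inner) + ">"
    # Anything else is rejected explicitly instead of failing later.
    raise TypeError("Unsupported type in conversion to Arrow: " + repr(data_type))


def rejects(data_type):
    """Return True if the conversion rejects `data_type` explicitly."""
    try:
        to_arrow_type_sketch(data_type)
    except TypeError:
        return True
    return False
```

A test for the first kind of unsupported type would then wrap an unsupported underlying type in a UDT and check that `rejects(...)` is true, which is the shape of test suggested above.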







[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


SparkQA commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804547784


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40957/
   





[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


SparkQA commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804546177


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40957/
   





[GitHub] [spark] sadhen edited a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


sadhen edited a comment on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589


   @eddyxu I wrote a UDT with a Timestamp field, but failed to make it work. See the demo PR: https://github.com/eddyxu/spark/pull/4
   
   For ExampleBox, serializing to a list works fine. But for ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need to serialize it to a dict. For the following snippets, the Python part works fine, but I failed to deserialize the ExamplePointWithTime properly in the Scala part.
   
   Do we need to make a UDT with Timestamp work in this PR? How about postponing it to another JIRA ticket?
   
   @maropu What's your opinion? I do not want to make this PR too complicated and hard to review.
   





[GitHub] [spark] sadhen edited a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


sadhen edited a comment on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589


   @eddyxu I wrote a UDT with a Timestamp field, but failed to make it work. See the demo PR: https://github.com/eddyxu/spark/pull/4
   
   For ExampleBox, serializing to a list works fine. But for ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need to serialize it to a dict. For the following snippets, the Python part works fine, but I failed to deserialize the ExamplePointWithTime properly in the Scala part.
   
   ``` python
   class ExamplePointWithTimeUDT(UserDefinedType):
       """
       User-defined type (UDT) for ExamplePointWithTime.
       """
   
       @classmethod
       def sqlType(self):
           return StructType([
               StructField("x", DoubleType(), False),
               StructField("y", DoubleType(), True),
               StructField("ts", TimestampType(), False),
           ])
   
       @classmethod
       def module(cls):
           return 'pyspark.sql.tests'
   
       @classmethod
       def scalaUDT(cls):
           return 'org.apache.spark.sql.test.ExamplePointWithTimeUDT'
   
       def serialize(self, obj):
           return {'x': obj.x, 'y': obj.y, 'ts': obj.ts}
   
       def deserialize(self, datum):
           return ExamplePointWithTime(datum['x'], datum['y'], datum['ts'])
   
   
   class ExamplePointWithTime:
       """
       An example class to demonstrate UDT in Scala, Java, and Python.
       """
   
       __UDT__ = ExamplePointWithTimeUDT()
   
       def __init__(self, x, y, ts):
           self.x = x
           self.y = y
           self.ts = ts
   
       def __repr__(self):
           return "ExamplePointWithTime(%s,%s,%s)" % (self.x, self.y, self.ts)
   
       def __str__(self):
           return "(%s,%s,%s)" % (self.x, self.y, self.ts)
   
       def __eq__(self, other):
           return isinstance(other, self.__class__) \
               and other.x == self.x and other.y == self.y \
               and other.ts == self.ts
   ```
   
   ``` scala
   package org.apache.spark.sql.test
   
   import java.sql.Timestamp
   
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
   import org.apache.spark.sql.types.{DataType, DoubleType, SQLUserDefinedType, StructField, StructType, TimestampType, UserDefinedType}
   
   
   /**
    * An example class to demonstrate UDT in Scala, Java, and Python.
    * @param x x coordinate
    * @param y y coordinate
    * @param ts timestamp
    */
   @SQLUserDefinedType(udt = classOf[ExamplePointUDT])
   private[sql] class ExamplePointWithTime(val x: Double, val y: Double, val ts: Timestamp)
     extends Serializable {
   
     override def hashCode(): Int = {
       var hash = 13
       hash = hash * 31 + x.hashCode()
       hash = hash * 31 + y.hashCode()
       hash = hash * 31 + ts.hashCode()
       hash
     }
   
     override def equals(other: Any): Boolean = other match {
       case that: ExamplePointWithTime =>
         this.x == that.x && this.y == that.y && this.ts == that.ts
       case _ => false
     }
   
     override def toString(): String = s"($x, $y, ${ts.toString})"
   }
   
   /**
    * User-defined type for [[ExamplePoint]].
    */
   private[sql] class ExamplePointWithTimeUDT extends UserDefinedType[ExamplePointWithTime] {
   
     override def sqlType: DataType = StructType(Array(
       StructField("x", DoubleType, nullable = false),
       StructField("y", DoubleType, nullable = true),
       StructField("ts", TimestampType, nullable = false)
     ))
   
     override def pyUDT: String = "pyspark.testing.sqlutils.ExamplePointWithTimeUDT"
   
     override def serialize(p: ExamplePointWithTime): ArrayBasedMapData = {
       ArrayBasedMapData(
         Array("x", "y", "ts"),
         Array(p.x, p.y, p.ts)
       )
     }
   
     override def deserialize(datum: Any): ExamplePointWithTime = {
       datum match {
         case row: InternalRow =>
           new ExamplePointWithTime(
             row.getDouble(0),
             row.getDouble(1),
             row.get(2, TimestampType)  // .asInstanceOf[Timestamp]   it is Long, cannot be casted to Timestamp
           )
       }
     }
   
     override def userClass: Class[ExamplePointWithTime] = classOf[ExamplePointWithTime]
   
     private[spark] override def asNullable: ExamplePointWithTimeUDT = this
   }
   ```
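
On the "it is Long" remark in the Scala snippet: as far as I understand, Spark's internal rows hold a `TimestampType` value as a long counting microseconds since the Unix epoch, so the value has to be converted rather than cast. A minimal sketch of that round trip, in plain Python with no Spark involved; `to_micros` and `from_micros` are hypothetical helper names and the inputs are assumed to be timezone-aware UTC datetimes:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_micros(micros: int) -> datetime:
    """Convert microseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


ts = datetime(2021, 3, 22, 12, 0, 0, 123456, tzinfo=timezone.utc)
micros = to_micros(ts)          # the long that would sit in the row
restored = from_micros(micros)  # explicit conversion instead of a cast
```

The Scala `deserialize` would need an equivalent explicit conversion from the long to a `java.sql.Timestamp`; which helper to use for that is left open here.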





[GitHub] [spark] sunchao commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2021-03-22 Thread GitBox


sunchao commented on a change in pull request #24559:
URL: https://github.com/apache/spark/pull/24559#discussion_r596497034



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/AggregateFunction.java
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog.functions;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * Interface for a function that produces a result value by aggregating over multiple input rows.
+ * 
+ * The JVM type of result values produced by this function must be the type used by Spark's
+ * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * 
+ * Most implementations should also implement {@link PartialAggregateFunction} so that Spark can
+ * partially aggregate and shuffle intermediate results, instead of shuffling all rows for an
+ * aggregate. This reduces the impact of data skew and the amount of data shuffled to produce the
+ * result.
+ *
+ * @param <S> the JVM type for the aggregation's intermediate state
+ * @param <R> the JVM type of result values
+ */
+public interface AggregateFunction<S, R> extends BoundFunction {
+
+  /**
+   * Initialize state for an aggregation.
+   * 
+   * This method is called one or more times for every group of values to initialize intermediate
+   * aggregation state. More than one intermediate aggregation state variable may be used when the
+   * aggregation is run in parallel tasks.
+   * 
+   * The object returned may passed to {@link #update(Object, InternalRow)},
+   * and {@link #produceResult(Object)}. Implementations that return null must support null state
+   * passed into all other methods.
+   *
+   * @return a state instance or null
+   */
+  S newAggregationState();
+
+  /**
+   * Update the aggregation state with a new row.
+   * 
+   * This is called for each row in a group to update an intermediate aggregation state.
+   *
+   * @param state intermediate aggregation state
+   * @param input an input row
+   * @return updated aggregation state
+   */
+  S update(S state, InternalRow input);
+
+  /**
+   * Produce the aggregation result based on intermediate state.
+   *
+   * @param state intermediate aggregation state
+   * @return a result value
+   */
+  R produceResult(S state);
+

Review comment:
   One issue I found with the `Serializable` approach is that currently in Spark both the `SerializerInstance` and the `ExpressionEncoder` require a `ClassTag`, which is not available from Java. This makes it hard to reuse the existing machinery in Spark for the serialization/deserialization work. Another issue, which is reflected by the CI failure, is that simple classes such as:
   ```scala
   class IntAverage extends AggregateFunction[(Int, Int), Int]
   ```
   ~~will not work out-of-box, as `(Int, Int)` doesn't implement 
`Serializable`~~. 
   
   Edit: sorry, `TupleN` does implement `Serializable` in Scala, and the issue is (it seems) that we can't get an `AggregateFunction` from a `BoundFunction` with the `Serializable` constraint.
   
   The `ClassTag` constraint for `SerializerInstance` was added in #700 to support Scala Pickling as one of the serializer implementations, but it seems that PR never ended up in Spark, so I'm not quite sure if it is still needed today, although removing it would require changing a public developer API. Thanks @viirya for having an offline discussion with me on this.
   
   Because of this, I'm wondering if it makes sense to replace the `Serializable` with something else, such as another method:
   ```java
   Encoder encoder();
   ```
   This can be implemented pretty easily by Spark users with [`Encoders`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala). The approach is similar to the `udaf` API today. For Scala users, we can optionally provide another version of `AggregateFunction` in Scala with an implicit encoder, so users don't need to do this.
   
   Would like to hear your opinion on this @rdblue @cloud-fan 
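
To make the serialization question concrete, here is a rough plain-Python sketch of the lifecycle described in the Javadoc above: state is created per task, updated per row, shipped between tasks, and turned into a result at the end. Everything in it is an assumption made for illustration: the snake_case method names mirror the Java ones, `merge` is a hypothetical stand-in for whatever `PartialAggregateFunction` defines, and `pickle` merely plays the role that `Serializable` or an encoder would play in Spark.

```python
import pickle
from typing import Generic, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")  # intermediate state type
R = TypeVar("R")  # result type


class AggregateFunction(Generic[S, R]):
    """Toy mirror of the interface under review (names adapted to Python)."""

    def new_aggregation_state(self) -> S:
        raise NotImplementedError

    def update(self, state: S, row: Sequence) -> S:
        raise NotImplementedError

    def produce_result(self, state: S) -> R:
        raise NotImplementedError


class IntAverage(AggregateFunction[Tuple[int, int], Optional[int]]):
    def new_aggregation_state(self):
        return (0, 0)  # (running sum, row count)

    def update(self, state, row):
        total, count = state
        return (total + row[0], count + 1)

    def merge(self, left, right):
        # Hypothetical: combines two partial states after a shuffle.
        return (left[0] + right[0], left[1] + right[1])

    def produce_result(self, state):
        total, count = state
        return total // count if count else None


def run_partition(fn, rows) -> bytes:
    """Aggregate one task's rows and serialize the state for shipping."""
    state = fn.new_aggregation_state()
    for row in rows:
        state = fn.update(state, row)
    return pickle.dumps(state)


def run_final(fn, shipped):
    """Deserialize the partial states, merge them, and produce the result."""
    state = fn.new_aggregation_state()
    for blob in shipped:
        state = fn.merge(state, pickle.loads(blob))
    return fn.produce_result(state)


fn = IntAverage()
shipped = [run_partition(fn, [(1,), (2,)]), run_partition(fn, [(6,)])]
result = run_final(fn, shipped)
```

The point of the sketch is only that the engine, not the function author, has to know how to turn `S` into bytes and back, which is why the choice between a `Serializable` bound and an `encoder()` method matters.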
   





[GitHub] [spark] HeartSaVioR edited a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


HeartSaVioR edited a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654


   > Is the second one the approach we took in #31570?
   
   Yes, the code is not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases. (Examples: aggregation with one distinct, pandas aggregation)





[GitHub] [spark] HeartSaVioR edited a comment on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


HeartSaVioR edited a comment on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654


   > Is the second one the approach we took in #31570?
   
   It's not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases. (Examples: aggregation with one distinct, pandas aggregation)





[GitHub] [spark] HeartSaVioR commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


HeartSaVioR commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654


   > Is the second one the approach we took in #31570?
   
   It's not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases.





[GitHub] [spark] sunchao commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2021-03-22 Thread GitBox


sunchao commented on a change in pull request #24559:
URL: https://github.com/apache/spark/pull/24559#discussion_r596497034



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/AggregateFunction.java
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog.functions;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * Interface for a function that produces a result value by aggregating over multiple input rows.
+ * 
+ * The JVM type of result values produced by this function must be the type used by Spark's
+ * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ * 
+ * Most implementations should also implement {@link PartialAggregateFunction} so that Spark can
+ * partially aggregate and shuffle intermediate results, instead of shuffling all rows for an
+ * aggregate. This reduces the impact of data skew and the amount of data shuffled to produce the
+ * result.
+ *
+ * @param <S> the JVM type for the aggregation's intermediate state
+ * @param <R> the JVM type of result values
+ */
+public interface AggregateFunction<S, R> extends BoundFunction {
+
+  /**
+   * Initialize state for an aggregation.
+   * 
+   * This method is called one or more times for every group of values to initialize intermediate
+   * aggregation state. More than one intermediate aggregation state variable may be used when the
+   * aggregation is run in parallel tasks.
+   * 
+   * The object returned may passed to {@link #update(Object, InternalRow)},
+   * and {@link #produceResult(Object)}. Implementations that return null must support null state
+   * passed into all other methods.
+   *
+   * @return a state instance or null
+   */
+  S newAggregationState();
+
+  /**
+   * Update the aggregation state with a new row.
+   * 
+   * This is called for each row in a group to update an intermediate aggregation state.
+   *
+   * @param state intermediate aggregation state
+   * @param input an input row
+   * @return updated aggregation state
+   */
+  S update(S state, InternalRow input);
+
+  /**
+   * Produce the aggregation result based on intermediate state.
+   *
+   * @param state intermediate aggregation state
+   * @return a result value
+   */
+  R produceResult(S state);
+

Review comment:
   One issue I found with the `Serializable` approach is that currently in Spark both the `SerializerInstance` and the `ExpressionEncoder` require a `ClassTag`, which is not available from Java. This makes it hard to reuse the existing machinery in Spark for the serialization/deserialization work. Another issue, which is reflected by the CI failure, is that simple classes such as:
   ```scala
   class IntAverage extends AggregateFunction[(Int, Int), Int]
   ```
   ~~will not work out of the box, as `(Int, Int)` doesn't implement
   `Serializable`~~.
   
   Edit: sorry, `TupleN` does implement `Serializable` in Scala; the issue
   (it seems) is that we can't get an `AggregateFunction` from a `BoundFunction`
   with the `Serializable` constraint.
   
   The `ClassTag` constraint for `SerializerInstance` was added in #700 to
   support Scala Pickling as one of the serializer implementations, but it seems
   that work never landed in Spark, so I'm not sure whether the constraint is
   still needed today. Thanks @viirya for the offline discussion with me on this.
   
   Because of this, I'm wondering if it makes sense to replace the
   `Serializable` constraint with something else, such as another method:
   ```java
   Encoder encoder();
   ```
   This can be implemented pretty easily by Spark users with
   [`Encoders`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala).
   The approach is similar to the `udaf` API today. For Scala users, we can
   optionally provide another version of `AggregateFunction` in Scala that
   takes the encoder implicitly, so users don't need to do this.
   
   Would like to hear your opinion on this @rdblue @cloud-fan 
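
To make the contract in the quoted Javadoc concrete, here is a minimal Python sketch of an `IntAverage`-style aggregate with a `(sum, count)` state. It only mirrors the three method names from the diff; the class, the `run_group` driver, and the use of plain integers in place of `InternalRow` are assumptions made for illustration, not Spark's API.

```python
from typing import Iterable, Tuple

State = Tuple[int, int]  # (running sum, row count)


class IntAverage:
    """Hypothetical stand-in for AggregateFunction[(Int, Int), Int]."""

    def new_aggregation_state(self) -> State:
        # Called at least once per group. This sketch never returns None,
        # so the other methods do not have to handle a null state.
        return (0, 0)

    def update(self, state: State, value: int) -> State:
        # Fold one input row (a bare int in this sketch) into the state.
        total, count = state
        return (total + value, count + 1)

    def produce_result(self, state: State) -> int:
        total, count = state
        # Integer average; an empty group yields 0 in this sketch.
        return total // count if count else 0


def run_group(func: IntAverage, rows: Iterable[int]) -> int:
    """Hypothetical driver: one state per group, updated row by row."""
    state = func.new_aggregation_state()
    for row in rows:
        state = func.update(state, row)
    return func.produce_result(state)
```

The state is an immutable tuple, so `update` returns a new state instead of mutating its argument, which matches the `S update(S state, InternalRow input)` signature returning the updated state.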
   





[GitHub] [spark] viirya commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


viirya commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804541032


   > This PR leverages two different approaches on merging session windows:
   >1. merging session windows with Spark's aggregation logic (a variant of 
sort aggregation)
   >2. updating session window for all rows bound to the same session, and 
applying aggregation logic afterwards
   
   Is the second one the approach we took in #31570?
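
For readers unfamiliar with the term: a session window groups events whose gaps are shorter than a timeout, so every event starts as its own window `[ts, ts + gap)` and overlapping windows collapse into one. A minimal Python sketch of that merge for a single key, assuming integer timestamps and a fixed gap (the function name and the tuple representation are illustrative, not the PR's implementation):

```python
from typing import List, Tuple


def merge_sessions(timestamps: List[int], gap: int) -> List[Tuple[int, int, int]]:
    """Return one (start, end, row_count) tuple per merged session."""
    sessions: List[Tuple[int, int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts < sessions[-1][1]:
            # The event falls inside the current session: extend its end.
            start, end, count = sessions[-1]
            sessions[-1] = (start, max(end, ts + gap), count + 1)
        else:
            # The event starts a new session window [ts, ts + gap).
            sessions.append((ts, ts + gap, 1))
    return sessions
```

The row count here plays the role of the aggregation applied once rows are bound to their final session.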

   
   





[GitHub] [spark] sadhen commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


sadhen commented on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589


   @eddyxu I wrote a UDT with a Timestamp field, but failed to make it work.
   
   For ExampleBox, serializing to a list works fine. But for
   ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need
   to serialize it to a dict. In the following snippets, the Python part works
   fine, but I failed to deserialize ExamplePointWithTime properly in the Scala
   part.
   
   ``` python
   class ExamplePointWithTimeUDT(UserDefinedType):
       """
       User-defined type (UDT) for ExamplePointWithTime.
       """
   
       @classmethod
       def sqlType(self):
           return StructType([
               StructField("x", DoubleType(), False),
               StructField("y", DoubleType(), True),
               StructField("ts", TimestampType(), False),
           ])
   
       @classmethod
       def module(cls):
           return 'pyspark.sql.tests'
   
       @classmethod
       def scalaUDT(cls):
           return 'org.apache.spark.sql.test.ExamplePointWithTimeUDT'
   
       def serialize(self, obj):
           return {'x': obj.x, 'y': obj.y, 'ts': obj.ts}
   
       def deserialize(self, datum):
           return ExamplePointWithTime(datum['x'], datum['y'], datum['ts'])
   
   
   class ExamplePointWithTime:
       """
       An example class to demonstrate UDT in Scala, Java, and Python.
       """
   
       __UDT__ = ExamplePointWithTimeUDT()
   
       def __init__(self, x, y, ts):
           self.x = x
           self.y = y
           self.ts = ts
   
       def __repr__(self):
           return "ExamplePointWithTime(%s,%s,%s)" % (self.x, self.y, self.ts)
   
       def __str__(self):
           return "(%s,%s,%s)" % (self.x, self.y, self.ts)
   
       def __eq__(self, other):
           return isinstance(other, self.__class__) \
               and other.x == self.x and other.y == self.y \
               and other.ts == self.ts
   ```
   
   ``` scala
   package org.apache.spark.sql.test
   
   import java.sql.Timestamp
   
   import org.apache.spark.sql.catalyst.InternalRow
   import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
   import org.apache.spark.sql.types.{DataType, DoubleType, SQLUserDefinedType,
     StructField, StructType, TimestampType, UserDefinedType}
   
   
   /**
    * An example class to demonstrate UDT in Scala, Java, and Python.
    * @param x x coordinate
    * @param y y coordinate
    * @param ts timestamp
    */
   @SQLUserDefinedType(udt = classOf[ExamplePointUDT])
   private[sql] class ExamplePointWithTime(val x: Double, val y: Double, val ts: Timestamp)
     extends Serializable {
   
     override def hashCode(): Int = {
       var hash = 13
       hash = hash * 31 + x.hashCode()
       hash = hash * 31 + y.hashCode()
       hash = hash * 31 + ts.hashCode()
       hash
     }
   
     override def equals(other: Any): Boolean = other match {
       case that: ExamplePointWithTime =>
         this.x == that.x && this.y == that.y && this.ts == that.ts
       case _ => false
     }
   
     override def toString(): String = s"($x, $y, ${ts.toString})"
   }
   
   /**
    * User-defined type for [[ExamplePoint]].
    */
   private[sql] class ExamplePointWithTimeUDT extends UserDefinedType[ExamplePointWithTime] {
   
     override def sqlType: DataType = StructType(Array(
       StructField("x", DoubleType, nullable = false),
       StructField("y", DoubleType, nullable = true),
       StructField("ts", TimestampType, nullable = false)
     ))
   
     override def pyUDT: String = "pyspark.testing.sqlutils.ExamplePointWithTimeUDT"
   
     override def serialize(p: ExamplePointWithTime): ArrayBasedMapData = {
       ArrayBasedMapData(
         Array("x", "y", "ts"),
         Array(p.x, p.y, p.ts)
       )
     }
   
     override def deserialize(datum: Any): ExamplePointWithTime = {
       datum match {
         case row: InternalRow =>
           new ExamplePointWithTime(
             row.getDouble(0),
             row.getDouble(1),
             row.get(2, TimestampType)  // .asInstanceOf[Timestamp]   it is Long, cannot be cast to Timestamp
           )
       }
     }
   
     override def userClass: Class[ExamplePointWithTime] = classOf[ExamplePointWithTime]
   
     private[spark] override def asNullable: ExamplePointWithTimeUDT = this
   }
   ```
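
A note on the failed cast: Catalyst's internal rows represent `TimestampType` values as a `Long` count of microseconds since the Unix epoch rather than as `java.sql.Timestamp`, which is consistent with the comment in `deserialize` above. On the Scala side, `DateTimeUtils.toJavaTimestamp` is the usual helper for converting that `Long` back. The Python sketch below only illustrates the arithmetic of the representation; the helper names are made up for this example and it assumes timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone

# Reference point for the microsecond count (assumed UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Encode a timezone-aware datetime as microseconds since the epoch."""
    delta = ts - EPOCH
    # Use integer arithmetic to avoid float rounding in total_seconds().
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_micros(micros: int) -> datetime:
    """Decode microseconds since the epoch back into a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)
```

A round trip such as `from_micros(to_micros(ts))` returns the original value for any UTC datetime, which is the property a UDT's `serialize`/`deserialize` pair needs for the `ts` field.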





[GitHub] [spark] sunchao commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API

2021-03-22 Thread GitBox


sunchao commented on a change in pull request #24559:
URL: https://github.com/apache/spark/pull/24559#discussion_r596497034



##
File path: 
sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/AggregateFunction.java
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog.functions;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * Interface for a function that produces a result value by aggregating over 
multiple input rows.
+ * 
+ * The JVM type of result values produced by this function must be the type 
used by Spark's
+ * InternalRow API for the {@link DataType SQL data type} returned by {@link 
#resultType()}.
+ * 
+ * Most implementations should also implement {@link PartialAggregateFunction} 
so that Spark can
+ * partially aggregate and shuffle intermediate results, instead of shuffling 
all rows for an
+ * aggregate. This reduces the impact of data skew and the amount of data 
shuffled to produce the
+ * result.
+ *
+ * @param <S> the JVM type for the aggregation's intermediate state
+ * @param <R> the JVM type of result values
+ */
+public interface AggregateFunction<S, R> extends BoundFunction {
+
+  /**
+   * Initialize state for an aggregation.
+   * 
+   * This method is called one or more times for every group of values to 
initialize intermediate
+   * aggregation state. More than one intermediate aggregation state variable 
may be used when the
+   * aggregation is run in parallel tasks.
+   * 
+   * The object returned may be passed to {@link #update(Object, InternalRow)},
+   * and {@link #produceResult(Object)}. Implementations that return null must 
support null state
+   * passed into all other methods.
+   *
+   * @return a state instance or null
+   */
+  S newAggregationState();
+
+  /**
+   * Update the aggregation state with a new row.
+   * 
+   * This is called for each row in a group to update an intermediate 
aggregation state.
+   *
+   * @param state intermediate aggregation state
+   * @param input an input row
+   * @return updated aggregation state
+   */
+  S update(S state, InternalRow input);
+
+  /**
+   * Produce the aggregation result based on intermediate state.
+   *
+   * @param state intermediate aggregation state
+   * @return a result value
+   */
+  R produceResult(S state);
+

Review comment:
   One issue I found with the `Serializable` approach is that, currently in
   Spark, both `SerializerInstance` and `ExpressionEncoder` require a
   `ClassTag`, which is not available from Java. This makes it hard to reuse
   Spark's existing machinery for serialization and deserialization. Another
   issue, which is reflected by the CI failure, is that simple classes such as:
   ```scala
   class IntAverage extends AggregateFunction[(Int, Int), Int]
   ```
   ~~will not work out of the box, as `(Int, Int)` doesn't implement
   `Serializable`~~. Edit: sorry, never mind on this one. `TupleN` does
   implement `Serializable`, and the test failure is due to something else.
   
   The `ClassTag` constraint for `SerializerInstance` was added in #700 to
   support Scala Pickling as one of the serializer implementations, but it seems
   that work never landed in Spark, so I'm not sure whether the constraint is
   still needed today. Thanks @viirya for the offline discussion with me on this.
   
   Because of this, I'm wondering if it makes sense to replace the
   `Serializable` constraint with something else, such as another method:
   ```java
   Encoder encoder();
   ```
   This can be implemented pretty easily by Spark users with
   [`Encoders`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala).
   The approach is similar to the `udaf` API today. For Scala users, we can
   optionally provide another version of `AggregateFunction` in Scala that
   takes the encoder implicitly, so users don't need to do this.
   
   Would like to hear your opinion on this @rdblue @cloud-fan 
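
The reason the state type has to be serializable (or encodable) at all is partial aggregation: each task builds its own intermediate state, and those states are shipped across the shuffle and combined. The sketch below imitates that flow in Python, with `pickle` standing in for Java serialization or an `Encoder`. The `merge` step and all function names are assumptions for illustration; they do not appear in the interface quoted above.

```python
import pickle
from typing import List, Tuple

State = Tuple[int, int]  # (running sum, row count)


def partial_aggregate(rows: List[int]) -> bytes:
    """Aggregate one task's rows, then serialize the state for the shuffle."""
    total, count = 0, 0
    for value in rows:
        total, count = total + value, count + 1
    return pickle.dumps((total, count))


def merge(left: State, right: State) -> State:
    """Combine two intermediate states (hypothetical merge step)."""
    return (left[0] + right[0], left[1] + right[1])


def final_aggregate(blobs: List[bytes]) -> int:
    """Deserialize every partial state, merge them, and produce the result."""
    state: State = (0, 0)
    for blob in blobs:
        state = merge(state, pickle.loads(blob))
    total, count = state
    # Integer average; no input rows yields 0 in this sketch.
    return total // count if count else 0
```

Only the small `(sum, count)` pairs cross the task boundary here, not the rows themselves, which is the shuffle saving the Javadoc describes.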
   





[GitHub] [spark] beliefer commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution

2021-03-22 Thread GitBox


beliefer commented on pull request #31920:
URL: https://github.com/apache/spark/pull/31920#issuecomment-804537772


   retest this please





[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively

2021-03-22 Thread GitBox


SparkQA commented on pull request #31937:
URL: https://github.com/apache/spark/pull/31937#issuecomment-804535263


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40955/
   





[GitHub] [spark] beliefer commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.

2021-03-22 Thread GitBox


beliefer commented on pull request #29754:
URL: https://github.com/apache/spark/pull/29754#issuecomment-804533704


   > @beliefer , could you resolve the conflicts?
   @dongido001 Thanks!





[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics

2021-03-22 Thread GitBox


SparkQA commented on pull request #31476:
URL: https://github.com/apache/spark/pull/31476#issuecomment-804533230


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40954/
   





[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31611: [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specifi

2021-03-22 Thread GitBox


AngersZhuuuu commented on a change in pull request #31611:
URL: https://github.com/apache/spark/pull/31611#discussion_r599197632



##
File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala
##
@@ -113,10 +113,15 @@ private[spark] class AppStatusStore(
 }
   }
 
-  def stageData(stageId: Int, details: Boolean = false): Seq[v1.StageData] = {
+  def stageData(
+stageId: Int,
+details: Boolean = false,
+withSummaries: Boolean = false,

Review comment:
   > OK I see it now, withSummaries causes more info to be returned
   
   Yes, it returns more summary metrics as distributions, which is very useful.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


AmplabJenkins removed a comment on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804529977


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40956/
   





[GitHub] [spark] AmplabJenkins commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


AmplabJenkins commented on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804529977


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40956/
   





[GitHub] [spark] SparkQA commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF

2021-03-22 Thread GitBox


SparkQA commented on pull request #31735:
URL: https://github.com/apache/spark/pull/31735#issuecomment-804529959


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40956/
   





[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup

2021-03-22 Thread GitBox


SparkQA commented on pull request #31102:
URL: https://github.com/apache/spark/pull/31102#issuecomment-804529725


   **[Test build #136375 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)**
 for PR 31102 at commit 
[`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).





[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)

2021-03-22 Thread GitBox


SparkQA commented on pull request #31931:
URL: https://github.com/apache/spark/pull/31931#issuecomment-804529467


   **[Test build #136374 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136374/testReport)**
 for PR 31931 at commit 
[`8ee7536`](https://github.com/apache/spark/commit/8ee75369cb66abf85ecf6f7bde98cbdd3f1287b9).




