[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write
SparkQA commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-804640082 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40964/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] MaxGekk commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
MaxGekk commented on a change in pull request #31938: URL: https://github.com/apache/spark/pull/31938#discussion_r599288224

## File path: docs/cloud-integration.md

@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.

 Here is the documentation on the standard connectors both from Apache and the cloud providers.

 * [OpenStack Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment: I would prefer the third one - both links mentioned separately.
[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
SparkQA commented on pull request #31933: URL: https://github.com/apache/spark/pull/31933#issuecomment-804637310 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40963/
[GitHub] [spark] SparkQA removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA removed a comment on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804549509 **[Test build #136378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)** for PR 29754 at commit [`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804636621 **[Test build #136378 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)** for PR 29754 at commit [`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] lenadroid commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
lenadroid commented on a change in pull request #31938: URL: https://github.com/apache/spark/pull/31938#discussion_r599280673

## File path: docs/cloud-integration.md

@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.

 Here is the documentation on the standard connectors both from Apache and the cloud providers.

 * [OpenStack Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment: Thanks for the feedback! Yes, the link I provided is for ABFS & Data Lake Gen 2 specifically. Which option would you prefer:

 1. Change the link text to say "Azure Blob Storage" and point to https://hadoop.apache.org/docs/current/hadoop-azure/index.html
 2. Change the link text to say "Azure Blob Filesystem and Azure Data Lake Gen 2" and point to https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html
 3. Have both links mentioned above available.

Let me know which one you prefer and I'll make a change.
[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804625250 **[Test build #136383 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136383/testReport)** for PR 31102 at commit [`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).
[GitHub] [spark] MaxGekk commented on a change in pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
MaxGekk commented on a change in pull request #31938: URL: https://github.com/apache/spark/pull/31938#discussion_r599277581

## File path: docs/cloud-integration.md

@@ -276,7 +276,7 @@ under-reported with Hadoop versions before 3.3.1.

 Here is the documentation on the standard connectors both from Apache and the cloud providers.

 * [OpenStack Swift](https://hadoop.apache.org/docs/current/hadoop-openstack/index.html).
-* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html).
+* [Azure Blob Storage and Azure Datalake Gen 2](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html).

Review comment: The link is specifically for ABFS actually. Should we provide the link for `Azure Blob Storage`: https://hadoop.apache.org/docs/current/hadoop-azure/index.html ?
[GitHub] [spark] Ngone51 commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
Ngone51 commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804624719 retest this please
[GitHub] [spark] HyukjinKwon commented on pull request #31938: [MINOR][DOCS] Updating the link for Azure Data Lake Gen 2 in docs
HyukjinKwon commented on pull request #31938: URL: https://github.com/apache/spark/pull/31938#issuecomment-804624501 @steveloughran fyi
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins removed a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804621698 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136371/
[GitHub] [spark] AmplabJenkins commented on pull request #31938: Updating the link for Azure Data Lake Gen 2 in docs
AmplabJenkins commented on pull request #31938: URL: https://github.com/apache/spark/pull/31938#issuecomment-804622828 Can one of the admins verify this patch?
[GitHub] [spark] lenadroid opened a new pull request #31938: Updating the link for Azure Data Lake Gen 2 in docs
lenadroid opened a new pull request #31938: URL: https://github.com/apache/spark/pull/31938 Current link for `Azure Blob Storage and Azure Datalake Gen 2` leads to AWS information. Replacing the link to point to the right page.
[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804621698 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136371/
[GitHub] [spark] dongjoon-hyun commented on pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side
dongjoon-hyun commented on pull request #31936: URL: https://github.com/apache/spark/pull/31936#issuecomment-804621313 Thank you for pinging me, @xkrogen.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
AmplabJenkins removed a comment on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804620821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136375/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
AmplabJenkins removed a comment on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823
[GitHub] [spark] SparkQA commented on pull request #31919: [SPARK-34087][FOLLOW-UP][SQL] Manage ExecutionListenerBus register inside itself
SparkQA commented on pull request #31919: URL: https://github.com/apache/spark/pull/31919#issuecomment-804621221 **[Test build #136382 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136382/testReport)** for PR 31919 at commit [`ae5d5d6`](https://github.com/apache/spark/commit/ae5d5d669d290de486a4ba473505a753263fb993).
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804621166 **[Test build #136381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136381/testReport)** for PR 31937 at commit [`70bf13e`](https://github.com/apache/spark/commit/70bf13e89c0bcdcede7f6004d34062800480ea9f).
[GitHub] [spark] SparkQA removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA removed a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804508390 **[Test build #136371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136371/testReport)** for PR 31937 at commit [`724557a`](https://github.com/apache/spark/commit/724557a43e40cf4e0c1c4456a79164a9ac24b6eb).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
AmplabJenkins removed a comment on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804620822
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804621041 **[Test build #136371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136371/testReport)** for PR 31937 at commit [`724557a`](https://github.com/apache/spark/commit/724557a43e40cf4e0c1c4456a79164a9ac24b6eb).

 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
AmplabJenkins commented on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804620823
[GitHub] [spark] AmplabJenkins commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
AmplabJenkins commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804620826
[GitHub] [spark] AmplabJenkins commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
AmplabJenkins commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804620821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136375/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side
dongjoon-hyun commented on a change in pull request #31936: URL: https://github.com/apache/spark/pull/31936#discussion_r599273586

## File path: docs/running-on-yarn.md

@@ -761,8 +761,27 @@

 NodeManagers where the Spark Shuffle Service is not running.

+  spark.yarn.shuffle.service.metrics.namespace
+  sparkShuffleService
+
+  The namespace to use when emitting shuffle service metrics into Hadoop metrics2 system of the
+  NodeManager.

Review comment: Could you add some description about the limitation with old Hadoop versions (like 2.7.x)? Here or at Section `Running multiple versions of the Spark Shuffle Service`?
[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write
SparkQA commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-804620342 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40964/
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side
dongjoon-hyun commented on a change in pull request #31936: URL: https://github.com/apache/spark/pull/31936#discussion_r599272016

## File path: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnShuffleIntegrationSuite.scala

@@ -109,6 +110,59 @@

 class YarnShuffleAuthSuite extends YarnShuffleIntegrationSuite {
 }

+/**
+ * SPARK-34828: Integration test for the external shuffle service with an alternate name and
+ * configs (by using a configuration overlay)
+ */
+@ExtendedYarnTest
+class YarnShuffleAlternateNameConfigSuite extends YarnShuffleIntegrationSuite {

Review comment: Please make this new test suite as a separate file.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31936: [SPARK-34828][YARN] Make shuffle service name configurable on client side and allow for classpath-based config override on server side
dongjoon-hyun commented on a change in pull request #31936: URL: https://github.com/apache/spark/pull/31936#discussion_r599271678

## File path: docs/running-on-yarn.md

@@ -811,3 +830,52 @@ do the following:

 to the list of filters in the spark.ui.filters configuration. Be aware that the history server information may
 not be up-to-date with the application's state.

+# Running multiple versions of the Spark Shuffle Service
+
+In some cases it may be desirable to run multiple instances of the Spark Shuffle Service which are
+using different versions of Spark. This can be helpful, for example, when running a YARN cluster
+with a mixed workload of applications running multiple Spark versions, since a given version of
+the shuffle service is not always compatible with other versions of Spark. YARN versions since 2.9.0
+support the ability to run shuffle services within an isolated classloader
+(see [YARN-4577](https://issues.apache.org/jira/browse/YARN-4577)), meaning multiple Spark versions
+can coexist within a single NodeManager. The
+`yarn.nodemanager.aux-services.<service-name>.classpath` and, starting from YARN 2.10.2/3.1.1/3.2.0,
+`yarn.nodemanager.aux-services.<service-name>.remote-classpath` options can be used to configure
+this. In addition to setting up separate classpaths, it's necessary to ensure the two versions
+advertise to different ports. This can be achieved using the `spark-shuffle-site.xml` file described
+above. For example, you may have configuration like:
+
+```properties
+  yarn.nodemanager.aux-services = spark_shuffle_x,spark_shuffle_y
+  yarn.nodemanager.aux-services.spark_shuffle_x.classpath = /path/to/spark-x-yarn-shuffle.jar,/path/to/spark-x-config
+  yarn.nodemanager.aux-services.spark_shuffle_y.classpath = /path/to/spark-y-yarn-shuffle.jar,/path/to/spark-y-config
+```
+
+The two `spark-*-config` directories each contain one file, `spark-shuffle-site.xml`. These are XML
+files in the [Hadoop Configuration format](https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/conf/Configuration.html)

Review comment: Shall we reference the Apache Hadoop 3.2.2 doc instead of 3.2.0, because we are using Apache Hadoop 3.2.2?
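The multi-version setup discussed above relies on each shuffle service instance advertising a distinct port through its `spark-shuffle-site.xml` overlay. As a minimal sketch (the port value is illustrative and not taken from the PR; `spark.shuffle.service.port` is the standard Spark property for this), one instance's overlay might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical spark-shuffle-site.xml for the spark_shuffle_x instance.
     7337 is chosen only for illustration; each instance needs a distinct port. -->
<configuration>
  <property>
    <name>spark.shuffle.service.port</name>
    <value>7337</value>
  </property>
</configuration>
```

The second instance (`spark_shuffle_y`) would ship an analogous file in its own config directory with a different port value.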
[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
SparkQA commented on pull request #31933: URL: https://github.com/apache/spark/pull/31933#issuecomment-804617321 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40963/
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804616758 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40962/
[GitHub] [spark] SparkQA removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA removed a comment on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804528702 **[Test build #136373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136373/testReport)** for PR 29754 at commit [`15c63c3`](https://github.com/apache/spark/commit/15c63c3b7964982a468b50fad5768bf0ea612fe2).
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804616427 **[Test build #136373 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136373/testReport)** for PR 29754 at commit [`15c63c3`](https://github.com/apache/spark/commit/15c63c3b7964982a468b50fad5768bf0ea612fe2).

 * This patch **fails SparkR unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
SparkQA removed a comment on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804488767 **[Test build #136370 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)** for PR 31476 at commit [`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).
[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
SparkQA commented on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804616183 **[Test build #136370 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136370/testReport)** for PR 31476 at commit [`f46b733`](https://github.com/apache/spark/commit/f46b733c2ec276dad31aa7732ff2349fd4363e52).

 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
cloud-fan commented on a change in pull request #31933: URL: https://github.com/apache/spark/pull/31933#discussion_r599269087

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala

@@ -62,15 +62,17 @@ case class CreateViewCommand(
    comment: Option[String],
    properties: Map[String, String],
    originalText: Option[String],
-   child: LogicalPlan,
+   analyzedPlan: LogicalPlan,

Review comment: how do we analyze it since it's not a child?
[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
cloud-fan commented on a change in pull request #31933: URL: https://github.com/apache/spark/pull/31933#discussion_r599268594

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CacheTableExec.scala

@@ -94,14 +94,19 @@ case class CacheTableAsSelectExec(
   override lazy val relationName: String = tempViewName
   override lazy val planToCache: LogicalPlan = {
+    // If the plan cannot be analyzed, throw an exception and don't proceed.
+    val qe = sparkSession.sessionState.executePlan(query)
+    qe.assertAnalyzed()
+    val analyzedPlan = qe.analyzed

Review comment: The current code looks fine. I think `CacheTableAsSelectExec` is the only exception in that it has a `query` which is not a simple table relation but we want to skip optimizing it. Let's document this clearly.
[GitHub] [spark] SparkQA removed a comment on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
SparkQA removed a comment on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804485460 **[Test build #136369 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)** for PR 31476 at commit [`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16).
[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
SparkQA commented on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804614005 **[Test build #136369 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136369/testReport)** for PR 31476 at commit [`a35c056`](https://github.com/apache/spark/commit/a35c05684f6c1d27257dc0c9e8b2a6d24666eb16). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
cloud-fan commented on a change in pull request #31933: URL: https://github.com/apache/spark/pull/31933#discussion_r599266970

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala

@@ -28,6 +28,7 @@ trait Command extends LogicalPlan {
   override def output: Seq[Attribute] = Seq.empty
   override def producedAttributes: AttributeSet = outputSet
   override def children: Seq[LogicalPlan] = Seq.empty
+  def plansToCheckAnalysis: Seq[LogicalPlan] = Seq.empty

Review comment: can we reuse `innerChildren`?

Review comment: can we reuse `innerChildren` instead of adding this?
[GitHub] [spark] SparkQA removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA removed a comment on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804529725 **[Test build #136375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)** for PR 31102 at commit [`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).
[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804611618 **[Test build #136375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)** for PR 31102 at commit [`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804606349 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40962/
[GitHub] [spark] HyukjinKwon commented on pull request #31900: [MINOR][DOCS] Update sql-ref-syntax-dml-insert-into.md
HyukjinKwon commented on pull request #31900: URL: https://github.com/apache/spark/pull/31900#issuecomment-804605634 It would be great if we could fix the PR description, though.
[GitHub] [spark] imback82 commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
imback82 commented on a change in pull request #31933: URL: https://github.com/apache/spark/pull/31933#discussion_r599257472

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

@@ -167,6 +167,9 @@ trait CheckAnalysis extends PredicateHelper with LookupCatalog {
       case _: ShowTableExtended =>
         throw new AnalysisException("SHOW TABLE EXTENDED is not supported for v2 tables.")
+      case c: Command =>
+        c.plansToCheckAnalysis.foreach(checkAnalysis)

Review comment: This seems hacky? But we cannot make the analyzed plan a child of `CreateViewCommand`. The reason is that the `View` will be optimized away (in the optimizer), and the verification that checks whether a permanent view references temp views will fail.
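The dispatch pattern under discussion can be illustrated with a minimal, self-contained sketch. Note that `Plan`, `Relation`, and `CreateViewLike` below are hypothetical stand-in types, not Spark's real classes; only the `plansToCheckAnalysis` hook mirrors the diff above.

```scala
// Sketch of the pattern: a Command exposes extra plans that the analysis
// checker must visit even though they are not part of its `children`.
sealed trait Plan
final case class Relation(name: String) extends Plan

trait Command extends Plan {
  // Plans the checker should visit in addition to (empty) children.
  def plansToCheckAnalysis: Seq[Plan] = Seq.empty
}

// Stand-in for CreateViewCommand: holds an already-analyzed plan
// without making it a child, so the optimizer never touches it.
final case class CreateViewLike(analyzedPlan: Plan) extends Command {
  override def plansToCheckAnalysis: Seq[Plan] = Seq(analyzedPlan)
}

// Collect the relations the checker would reach; mirrors
// `c.plansToCheckAnalysis.foreach(checkAnalysis)` in the diff above.
def reachedRelations(plan: Plan): Seq[String] = plan match {
  case c: Command  => c.plansToCheckAnalysis.flatMap(reachedRelations)
  case Relation(n) => Seq(n)
}
```

With this shape, `reachedRelations(CreateViewLike(Relation("view_base_table")))` reaches the stored plan even though `children` is empty, which is the point of the new hook.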
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins removed a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804602571 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40960/
[GitHub] [spark] SparkQA commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write
SparkQA commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-804602562 **[Test build #136380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136380/testReport)** for PR 31355 at commit [`7f6e82d`](https://github.com/apache/spark/commit/7f6e82de5a63750c4c5f210f4000ae1423007d3e).
[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804602571 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40960/
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804602548 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40960/
[GitHub] [spark] SparkQA commented on pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
SparkQA commented on pull request #31933: URL: https://github.com/apache/spark/pull/31933#issuecomment-804602370 **[Test build #136379 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136379/testReport)** for PR 31933 at commit [`6fdd9e0`](https://github.com/apache/spark/commit/6fdd9e0edc924679420d3c64472b72e83bb6006f).
[GitHub] [spark] imback82 commented on a change in pull request #31933: [SPARK-34701][SQL] Remove analyzing temp view again in CreateViewCommand
imback82 commented on a change in pull request #31933: URL: https://github.com/apache/spark/pull/31933#discussion_r599256804

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala

@@ -62,15 +62,17 @@ case class CreateViewCommand(
    comment: Option[String],
    properties: Map[String, String],
    originalText: Option[String],
-   child: LogicalPlan,
+   analyzedPlan: LogicalPlan,
    allowExisting: Boolean,
    replace: Boolean,
    viewType: ViewType) extends RunnableCommand {

  import ViewHelper._

- override def innerChildren: Seq[QueryPlan[_]] = Seq(child)
+ override def plansToCheckAnalysis: Seq[LogicalPlan] = Seq(analyzedPlan)

Review comment: We need to run checkAnalysis on the analyzed plan; otherwise, for the following:
```
sql("CREATE TABLE view_base_table (key int, data varchar(20)) USING PARQUET")
sql("CREATE VIEW key_dependent_view AS SELECT * FROM view_base_table GROUP BY key")
```
the view creation works fine, whereas it should have failed with:
```
org.apache.spark.sql.AnalysisException
expression 'spark_catalog.default.view_base_table.data' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.
```
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
AmplabJenkins removed a comment on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804602129 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40961/
[GitHub] [spark] AmplabJenkins commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
AmplabJenkins commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804602129 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40961/
[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
SparkQA commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804602114 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40961/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
AmplabJenkins removed a comment on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804601486 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40958/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
AmplabJenkins removed a comment on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804601485 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136367/
[GitHub] [spark] AmplabJenkins commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
AmplabJenkins commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804601486 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40958/
[GitHub] [spark] AmplabJenkins commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
AmplabJenkins commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804601485 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136367/
[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
SparkQA commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804600544 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40961/
[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804600499 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40958/
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804599872 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40960/
[GitHub] [spark] HeartSaVioR commented on pull request #31355: [SPARK-34255][SQL] Support partitioning with static number on required distribution and ordering on V2 write
HeartSaVioR commented on pull request #31355: URL: https://github.com/apache/spark/pull/31355#issuecomment-804598821 I just removed the handling of the non-specific distribution case. Please take a look again. Thanks!
[GitHub] [spark] SparkQA removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
SparkQA removed a comment on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804454409 **[Test build #136367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136367/testReport)** for PR 31931 at commit [`08e47d2`](https://github.com/apache/spark/commit/08e47d2fc538838892bfabad3a1a93d85ec5228b).
[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
SparkQA commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804594750 **[Test build #136367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136367/testReport)** for PR 31931 at commit [`08e47d2`](https://github.com/apache/spark/commit/08e47d2fc538838892bfabad3a1a93d85ec5228b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] zhangptang commented on pull request #31925: Branch 3.1
zhangptang commented on pull request #31925: URL: https://github.com/apache/spark/pull/31925#issuecomment-804569362 OK, I have created a JIRA; here is the link: https://issues.apache.org/jira/browse/SPARK-34831. Please solve it quickly, thanks.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
AmplabJenkins removed a comment on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804560380 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40959/
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804560299 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40959/
[GitHub] [spark] AmplabJenkins commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
AmplabJenkins commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804560380 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40959/
[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804556988 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40958/
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804549509 **[Test build #136378 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136378/testReport)** for PR 29754 at commit [`f5229e6`](https://github.com/apache/spark/commit/f5229e622ce9f729050068fc65ecf55caff37978).
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804549076 **[Test build #136376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136376/testReport)** for PR 31937 at commit [`4689597`](https://github.com/apache/spark/commit/468959747e2718f15a14a3741d7671dadece429d).
[GitHub] [spark] SparkQA commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
SparkQA commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804549102 **[Test build #136377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136377/testReport)** for PR 31920 at commit [`326d4dd`](https://github.com/apache/spark/commit/326d4dd6c221b5b893118ae232ad98e4f62b8081).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins removed a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804548660 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40955/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
AmplabJenkins removed a comment on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804548658 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40957/
[GitHub] [spark] AmplabJenkins commented on pull request #31937: [SPARK-10816][SS] Support session window natively
AmplabJenkins commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804548660 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40955/
[GitHub] [spark] AmplabJenkins commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
AmplabJenkins commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804548658 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40957/
[GitHub] [spark] SparkQA commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
SparkQA commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804548623 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40959/
[GitHub] [spark] sadhen commented on a change in pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
sadhen commented on a change in pull request #31735: URL: https://github.com/apache/spark/pull/31735#discussion_r599236008

## File path: python/pyspark/sql/tests/test_pandas_udf_scalar.py

@@ -1109,6 +1109,102 @@

``` python
        self.assertEqual(expected, df1.collect())

    # SPARK-34600
    def test_user_defined_types_with_udf(self):
        """PandasUDF returns single UDT out."""

        # ExamplePointUDT uses ArrayType to present its sqlType.
        @pandas_udf(ExamplePointUDT())
        def create_vector(series: pd.Series) -> pd.Series:
            vectors = []
            for _, item in series.items():
                vectors.append(ExamplePoint(item, item + 1))
            return pd.Series(vectors)

        # ExampleBoxUDT uses StructType to present its sqlType.
        @pandas_udf(ExampleBoxUDT())
        def create_boxes(series: pd.Series) -> pd.Series:
            boxes = []
            for _, item in series.items():
                boxes.append(ExampleBox(item, item + 1, item + 2, item + 3))
            return pd.Series(boxes)

        df = self.spark.range(2)
        df = (
            df
            .withColumn("vector", create_vector(col("id")))
            .withColumn("box", create_boxes(col("id")))
        )
        df.show()
        self.assertEqual([
            Row(id=0, vector=ExamplePoint(0, 1), box=ExampleBox(0, 1, 2, 3)),
            Row(id=1, vector=ExamplePoint(1, 2), box=ExampleBox(1, 2, 3, 4))
        ], df.collect())

    # SPARK-34600
    def test_user_defined_types_in_struct(self):
        @pandas_udf(StructType([
            StructField("vec", ArrayType(ExamplePointUDT())),
            StructField("box", ArrayType(ExampleBoxUDT()))
        ]))
        def array_of_udt_structs(series: pd.Series) -> pd.DataFrame:
            vectors = []
            for _, i in series.items():
                vectors.append({
                    "vec": [ExamplePoint(i, i), ExamplePoint(i + 1, i + 1)],
                    "box": [ExampleBox(*([i] * 4)), ExampleBox(*([i + 1] * 4))],
                })
            return pd.DataFrame(vectors)

        df = self.spark.range(1, 3)
        df = df.withColumn("nested", array_of_udt_structs(df.id))
        df.show()
        self.assertEqual([
            Row(id=1, nested=Row(
                vec=[ExamplePoint(1, 1), ExamplePoint(2, 2)],
                box=[ExampleBox(1, 1, 1, 1), ExampleBox(2, 2, 2, 2)])),
            Row(id=2, nested=Row(
                vec=[ExamplePoint(2, 2), ExamplePoint(3, 3)],
                box=[ExampleBox(2, 2, 2, 2), ExampleBox(3, 3, 3, 3)]))
        ], df.collect())

    # SPARK-34600
    def test_user_defined_types_in_array(self):
```

Review comment: 1. Some unsupported types are explicitly asserted in `to_arrow_type`. For these unsupported types, just add a Python UDT and catch the assertion in the test. 2. Other unsupported types are complicated, like the one I mentioned in https://github.com/apache/spark/pull/31735#issuecomment-804539589. It is feasible to add tests for the first case. For the latter, maybe we should reject it earlier (e.g. add a more explicit assertion in `to_arrow_type`).
[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
SparkQA commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804547784 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40957/
[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
SparkQA commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804546177 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40957/
[GitHub] [spark] sadhen edited a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
sadhen edited a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589

@eddyxu I wrote a UDT with Timestamp but failed to make it work. See the demo PR: https://github.com/eddyxu/spark/pull/4

For ExampleBox, serializing to a list works fine. But for ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need to serialize it to a dict. For the following snippets, the Python part works fine, but I failed to deserialize the ExamplePointWithTime properly on the Scala side.

Do we need to make UDT with Timestamp work in this PR? How about postponing it to another JIRA ticket? @maropu What's your opinion? I do not want to make this PR too complicated and hard to review.
[GitHub] [spark] sadhen edited a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
sadhen edited a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589

@eddyxu I wrote a UDT with Timestamp but failed to make it work. See the demo PR: https://github.com/eddyxu/spark/pull/4

For ExampleBox, serializing to a list works fine. But for ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need to serialize it to a dict. For the following snippets, the Python part works fine, but I failed to deserialize the ExamplePointWithTime properly on the Scala side.

``` python
class ExamplePointWithTimeUDT(UserDefinedType):
    """
    User-defined type (UDT) for ExamplePointWithTime.
    """

    @classmethod
    def sqlType(self):
        return StructType([
            StructField("x", DoubleType(), False),
            StructField("y", DoubleType(), True),
            StructField("ts", TimestampType(), False),
        ])

    @classmethod
    def module(cls):
        return 'pyspark.sql.tests'

    @classmethod
    def scalaUDT(cls):
        return 'org.apache.spark.sql.test.ExamplePointWithTimeUDT'

    def serialize(self, obj):
        return {'x': obj.x, 'y': obj.y, 'ts': obj.ts}

    def deserialize(self, datum):
        return ExamplePointWithTime(datum['x'], datum['y'], datum['ts'])


class ExamplePointWithTime:
    """
    An example class to demonstrate UDT in Scala, Java, and Python.
    """

    __UDT__ = ExamplePointWithTimeUDT()

    def __init__(self, x, y, ts):
        self.x = x
        self.y = y
        self.ts = ts

    def __repr__(self):
        return "ExamplePointWithTime(%s,%s,%s)" % (self.x, self.y, self.ts)

    def __str__(self):
        return "(%s,%s,%s)" % (self.x, self.y, self.ts)

    def __eq__(self, other):
        return isinstance(other, self.__class__) \
            and other.x == self.x and other.y == self.y \
            and other.ts == self.ts
```

``` scala
package org.apache.spark.sql.test

import java.sql.Timestamp

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
import org.apache.spark.sql.types.{DataType, DoubleType, SQLUserDefinedType, StructField, StructType, TimestampType, UserDefinedType}

/**
 * An example class to demonstrate UDT in Scala, Java, and Python.
 * @param x x coordinate
 * @param y y coordinate
 * @param ts timestamp
 */
@SQLUserDefinedType(udt = classOf[ExamplePointUDT])
private[sql] class ExamplePointWithTime(val x: Double, val y: Double, val ts: Timestamp)
  extends Serializable {

  override def hashCode(): Int = {
    var hash = 13
    hash = hash * 31 + x.hashCode()
    hash = hash * 31 + y.hashCode()
    hash = hash * 31 + ts.hashCode()
    hash
  }

  override def equals(other: Any): Boolean = other match {
    case that: ExamplePointWithTime =>
      this.x == that.x && this.y == that.y && this.ts == that.ts
    case _ => false
  }

  override def toString(): String = s"($x, $y, ${ts.toString})"
}

/**
 * User-defined type for [[ExamplePoint]].
 */
private[sql] class ExamplePointWithTimeUDT extends UserDefinedType[ExamplePointWithTime] {

  override def sqlType: DataType = StructType(Array(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = true),
    StructField("ts", TimestampType, nullable = false)
  ))

  override def pyUDT: String = "pyspark.testing.sqlutils.ExamplePointWithTimeUDT"

  override def serialize(p: ExamplePointWithTime): ArrayBasedMapData = {
    ArrayBasedMapData(
      Array("x", "y", "ts"),
      Array(p.x, p.y, p.ts)
    )
  }

  override def deserialize(datum: Any): ExamplePointWithTime = {
    datum match {
      case row: InternalRow =>
        new ExamplePointWithTime(
          row.getDouble(0),
          row.getDouble(1),
          row.get(2, TimestampType) // .asInstanceOf[Timestamp] it is Long, cannot be casted to Timestamp
        )
    }
  }

  override def userClass: Class[ExamplePointWithTime] = classOf[ExamplePointWithTime]

  private[spark] override def asNullable: ExamplePointWithTimeUDT = this
}
```
[GitHub] [spark] sunchao commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API
sunchao commented on a change in pull request #24559: URL: https://github.com/apache/spark/pull/24559#discussion_r596497034

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/AggregateFunction.java

@@ -0,0 +1,73 @@

``` java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.connector.catalog.functions;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.types.DataType;

/**
 * Interface for a function that produces a result value by aggregating over multiple input rows.
 *
 * The JVM type of result values produced by this function must be the type used by Spark's
 * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
 *
 * Most implementations should also implement {@link PartialAggregateFunction} so that Spark can
 * partially aggregate and shuffle intermediate results, instead of shuffling all rows for an
 * aggregate. This reduces the impact of data skew and the amount of data shuffled to produce the
 * result.
 *
 * @param <S> the JVM type for the aggregation's intermediate state
 * @param <R> the JVM type of result values
 */
public interface AggregateFunction<S, R> extends BoundFunction {

  /**
   * Initialize state for an aggregation.
   *
   * This method is called one or more times for every group of values to initialize intermediate
   * aggregation state. More than one intermediate aggregation state variable may be used when the
   * aggregation is run in parallel tasks.
   *
   * The object returned may be passed to {@link #update(Object, InternalRow)}
   * and {@link #produceResult(Object)}. Implementations that return null must support null state
   * passed into all other methods.
   *
   * @return a state instance or null
   */
  S newAggregationState();

  /**
   * Update the aggregation state with a new row.
   *
   * This is called for each row in a group to update an intermediate aggregation state.
   *
   * @param state intermediate aggregation state
   * @param input an input row
   * @return updated aggregation state
   */
  S update(S state, InternalRow input);

  /**
   * Produce the aggregation result based on intermediate state.
   *
   * @param state intermediate aggregation state
   * @return a result value
   */
  R produceResult(S state);
```

Review comment: One issue I found with the `Serializable` approach is that currently in Spark the `SerializerInstance` as well as `ExpressionEncoder` all require a `ClassTag`, which is not available from Java. This makes it hard to reuse the existing machinery in Spark for the serialization/deserialization work.

Another issue, which is reflected by the CI failure, is that simple classes such as:

``` scala
class IntAverage extends AggregateFunction[(Int, Int), Int]
```

~~will not work out of the box, as `(Int, Int)` doesn't implement `Serializable`~~. Edit: sorry, `TupleN` does implement `Serializable` in Scala; the issue is (it seems) that we can't get an `AggregateFunction` from a `BoundFunction` with the `Serializable` constraint.

The `ClassTag` constraint for `SerializerInstance` was added in #700 to support Scala Pickling as one of the serializer implementations, but it seems that PR never landed in Spark, so I am not quite sure if it is still needed today, although removing it would require changing a public developer API. Thanks @viirya for having an offline discussion with me on this.

Because of this, I'm wondering if it makes sense to replace `Serializable` with something else, such as another method:

``` java
Encoder<S> encoder();
```

This can be implemented pretty easily by Spark users with [`Encoders`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala). The approach is similar to the `udaf` API today. For Scala users, we can optionally provide another version of `AggregateFunction` in Scala with implicits, so users don't need to do this. Would like to hear your opinion on this @rdblue @cloud-fan
[GitHub] [spark] HeartSaVioR edited a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
HeartSaVioR edited a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654

> Is the second one the approach we took in #31570?

Yes, the code is not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases. (Examples: aggregation having one distinct, pandas aggregation)
[GitHub] [spark] HeartSaVioR edited a comment on pull request #31937: [SPARK-10816][SS] Support session window natively
HeartSaVioR edited a comment on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654

> Is the second one the approach we took in #31570?

It's not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases. (Examples: aggregation having one distinct, pandas aggregation)
[GitHub] [spark] HeartSaVioR commented on pull request #31937: [SPARK-10816][SS] Support session window natively
HeartSaVioR commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804541654

> Is the second one the approach we took in #31570?

It's not copied from #31570 but the approach is similar. Actually, my old PR had both approaches to address all cases.
[GitHub] [spark] viirya commented on pull request #31937: [SPARK-10816][SS] Support session window natively
viirya commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804541032

> This PR leverages two different approaches on merging session windows:
> 1. merging session windows with Spark's aggregation logic (a variant of sort aggregation)
> 2. updating session window for all rows bound to the same session, and applying aggregation logic afterwards

Is the second one the approach we took in #31570?
[GitHub] [spark] sadhen commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
sadhen commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804539589

@eddyxu I wrote a UDT with Timestamp but failed to make it work.

For ExampleBox, serializing to a list works fine. But for ExamplePointWithTimeUDT, to make `pa.StructArray.from_pandas` work, we need to serialize it to a dict. For the following snippets, the Python part works fine, but I failed to deserialize the ExamplePointWithTime properly on the Scala side.

``` python
class ExamplePointWithTimeUDT(UserDefinedType):
    """
    User-defined type (UDT) for ExamplePointWithTime.
    """

    @classmethod
    def sqlType(self):
        return StructType([
            StructField("x", DoubleType(), False),
            StructField("y", DoubleType(), True),
            StructField("ts", TimestampType(), False),
        ])

    @classmethod
    def module(cls):
        return 'pyspark.sql.tests'

    @classmethod
    def scalaUDT(cls):
        return 'org.apache.spark.sql.test.ExamplePointWithTimeUDT'

    def serialize(self, obj):
        return {'x': obj.x, 'y': obj.y, 'ts': obj.ts}

    def deserialize(self, datum):
        return ExamplePointWithTime(datum['x'], datum['y'], datum['ts'])


class ExamplePointWithTime:
    """
    An example class to demonstrate UDT in Scala, Java, and Python.
    """

    __UDT__ = ExamplePointWithTimeUDT()

    def __init__(self, x, y, ts):
        self.x = x
        self.y = y
        self.ts = ts

    def __repr__(self):
        return "ExamplePointWithTime(%s,%s,%s)" % (self.x, self.y, self.ts)

    def __str__(self):
        return "(%s,%s,%s)" % (self.x, self.y, self.ts)

    def __eq__(self, other):
        return isinstance(other, self.__class__) \
            and other.x == self.x and other.y == self.y \
            and other.ts == self.ts
```

``` scala
package org.apache.spark.sql.test

import java.sql.Timestamp

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.util.ArrayBasedMapData
import org.apache.spark.sql.types.{DataType, DoubleType, SQLUserDefinedType, StructField, StructType, TimestampType, UserDefinedType}

/**
 * An example class to demonstrate UDT in Scala, Java, and Python.
 * @param x x coordinate
 * @param y y coordinate
 * @param ts timestamp
 */
@SQLUserDefinedType(udt = classOf[ExamplePointUDT])
private[sql] class ExamplePointWithTime(val x: Double, val y: Double, val ts: Timestamp)
  extends Serializable {

  override def hashCode(): Int = {
    var hash = 13
    hash = hash * 31 + x.hashCode()
    hash = hash * 31 + y.hashCode()
    hash = hash * 31 + ts.hashCode()
    hash
  }

  override def equals(other: Any): Boolean = other match {
    case that: ExamplePointWithTime =>
      this.x == that.x && this.y == that.y && this.ts == that.ts
    case _ => false
  }

  override def toString(): String = s"($x, $y, ${ts.toString})"
}

/**
 * User-defined type for [[ExamplePoint]].
 */
private[sql] class ExamplePointWithTimeUDT extends UserDefinedType[ExamplePointWithTime] {

  override def sqlType: DataType = StructType(Array(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = true),
    StructField("ts", TimestampType, nullable = false)
  ))

  override def pyUDT: String = "pyspark.testing.sqlutils.ExamplePointWithTimeUDT"

  override def serialize(p: ExamplePointWithTime): ArrayBasedMapData = {
    ArrayBasedMapData(
      Array("x", "y", "ts"),
      Array(p.x, p.y, p.ts)
    )
  }

  override def deserialize(datum: Any): ExamplePointWithTime = {
    datum match {
      case row: InternalRow =>
        new ExamplePointWithTime(
          row.getDouble(0),
          row.getDouble(1),
          row.get(2, TimestampType) // .asInstanceOf[Timestamp] it is Long, cannot be casted to Timestamp
        )
    }
  }

  override def userClass: Class[ExamplePointWithTime] = classOf[ExamplePointWithTime]

  private[spark] override def asNullable: ExamplePointWithTimeUDT = this
}
```
[GitHub] [spark] sunchao commented on a change in pull request #24559: [SPARK-27658][SQL] Add FunctionCatalog API
sunchao commented on a change in pull request #24559: URL: https://github.com/apache/spark/pull/24559#discussion_r596497034

## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/functions/AggregateFunction.java

@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connector.catalog.functions;
+
+import org.apache.spark.sql.catalyst.InternalRow;
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * Interface for a function that produces a result value by aggregating over multiple input rows.
+ *
+ * The JVM type of result values produced by this function must be the type used by Spark's
+ * InternalRow API for the {@link DataType SQL data type} returned by {@link #resultType()}.
+ *
+ * Most implementations should also implement {@link PartialAggregateFunction} so that Spark can
+ * partially aggregate and shuffle intermediate results, instead of shuffling all rows for an
+ * aggregate. This reduces the impact of data skew and the amount of data shuffled to produce the
+ * result.
+ *
+ * @param <S> the JVM type for the aggregation's intermediate state
+ * @param <R> the JVM type of result values
+ */
+public interface AggregateFunction<S, R> extends BoundFunction {
+
+  /**
+   * Initialize state for an aggregation.
+   *
+   * This method is called one or more times for every group of values to initialize intermediate
+   * aggregation state. More than one intermediate aggregation state variable may be used when the
+   * aggregation is run in parallel tasks.
+   *
+   * The object returned may be passed to {@link #update(Object, InternalRow)}
+   * and {@link #produceResult(Object)}. Implementations that return null must support null state
+   * passed into all other methods.
+   *
+   * @return a state instance or null
+   */
+  S newAggregationState();
+
+  /**
+   * Update the aggregation state with a new row.
+   *
+   * This is called for each row in a group to update an intermediate aggregation state.
+   *
+   * @param state intermediate aggregation state
+   * @param input an input row
+   * @return updated aggregation state
+   */
+  S update(S state, InternalRow input);
+
+  /**
+   * Produce the aggregation result based on intermediate state.
+   *
+   * @param state intermediate aggregation state
+   * @return a result value
+   */
+  R produceResult(S state);
+

Review comment: One issue I found with the `Serializable` approach is that currently in Spark both `SerializerInstance` and `ExpressionEncoder` require a `ClassTag`, which is not available from Java. This makes it hard to reuse the existing machinery in Spark for the serialization/deserialization work.

Another issue, which is reflected by the CI failure, is that simple classes such as:

```scala
class IntAverage extends AggregateFunction[(Int, Int), Int]
```

~~will not work out of the box, as `(Int, Int)` doesn't implement `Serializable`~~. Edit: sorry, NVM on this one - `TupleN` does implement `Serializable` and the test failure is due to something else.
The `ClassTag` constraint for `SerializerInstance` was added in #700 to support Scala Pickling as one of the serializer implementations, but it seems that PR never landed in Spark, so I'm not quite sure the constraint is still needed today. Thanks @viirya for having an offline discussion with me on this.

Because of this, I'm wondering if it makes sense to replace `Serializable` with something else, such as another method:

```java
Encoder<S> encoder();
```

This can be implemented pretty easily by Spark users with [`Encoders`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/Encoders.scala). The approach is similar to the `udaf` API today. For Scala users, we can optionally provide another version of `AggregateFunction` with an implicit, so users don't need to do this.

Would like to hear your opinion on this @rdblue @cloud-fan
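The `newAggregationState` / `update` / `produceResult` lifecycle quoted in the diff above can be illustrated outside the JVM. Below is a pure-Python sketch (not the Spark API; method names are snake_case equivalents) of the `IntAverage` example, using a `(sum, count)` tuple as the intermediate state:

```python
class IntAverage:
    """Toy aggregate mirroring AggregateFunction[(Int, Int), Int]: state is (sum, count)."""

    def new_aggregation_state(self):
        # Fresh intermediate state for a group; Spark may create several of
        # these when the aggregation runs in parallel tasks.
        return (0, 0)

    def update(self, state, value):
        # Fold one input value into the (sum, count) state.
        total, count = state
        return (total + value, count + 1)

    def produce_result(self, state):
        # Turn the intermediate state into the final (integer) average.
        total, count = state
        return total // count


agg = IntAverage()
state = agg.new_aggregation_state()
for v in [1, 2, 3, 6]:
    state = agg.update(state, v)
print(agg.produce_result(state))  # prints 3 (12 // 4)
```

In the real interface the state also has to travel across the shuffle boundary between partial and final aggregation, which is exactly where the `Serializable`-vs-`Encoder` question above comes in.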
[GitHub] [spark] beliefer commented on pull request #31920: [SPARK-33604][SQL] Group exception messages in sql/execution
beliefer commented on pull request #31920: URL: https://github.com/apache/spark/pull/31920#issuecomment-804537772 retest this please
[GitHub] [spark] SparkQA commented on pull request #31937: [SPARK-10816][SS] Support session window natively
SparkQA commented on pull request #31937: URL: https://github.com/apache/spark/pull/31937#issuecomment-804535263 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40955/
[GitHub] [spark] beliefer commented on pull request #29754: [SPARK-32875][CORE][TEST] TaskSchedulerImplSuite: For the pattern of submitTasks + resourceOffers + assert, extract the general method.
beliefer commented on pull request #29754: URL: https://github.com/apache/spark/pull/29754#issuecomment-804533704 > @beliefer , could you resolve the conflicts? @dongido001 Thanks!
[GitHub] [spark] SparkQA commented on pull request #31476: [SPARK-34366][SQL] Add interface for DS v2 metrics
SparkQA commented on pull request #31476: URL: https://github.com/apache/spark/pull/31476#issuecomment-804533230 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40954/
[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31611: [SPARK-34488][CORE] Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specifi
AngersZhuuuu commented on a change in pull request #31611: URL: https://github.com/apache/spark/pull/31611#discussion_r599197632

## File path: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala

@@ -113,10 +113,15 @@ private[spark] class AppStatusStore(
     }
   }
 
-  def stageData(stageId: Int, details: Boolean = false): Seq[v1.StageData] = {
+  def stageData(
+    stageId: Int,
+    details: Boolean = false,
+    withSummaries: Boolean = false,

Review comment: > OK I see it now, withSummaries causes more info to be returned

Yeah, it returns more summary metrics as distributions. It's very useful.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
AmplabJenkins removed a comment on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804529977 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40956/
[GitHub] [spark] AmplabJenkins commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
AmplabJenkins commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804529977 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40956/
[GitHub] [spark] SparkQA commented on pull request #31735: [SPARK-34799][PYTHON][SQL] Return User-defined types from Pandas UDF
SparkQA commented on pull request #31735: URL: https://github.com/apache/spark/pull/31735#issuecomment-804529959 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40956/
[GitHub] [spark] SparkQA commented on pull request #31102: [SPARK-34054][CORE] BlockManagerDecommissioner code cleanup
SparkQA commented on pull request #31102: URL: https://github.com/apache/spark/pull/31102#issuecomment-804529725 **[Test build #136375 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136375/testReport)** for PR 31102 at commit [`d6c682a`](https://github.com/apache/spark/commit/d6c682a315d1543e5f739b31af47c21755ab5a76).
[GitHub] [spark] SparkQA commented on pull request #31931: [SPARK-34707][SQL] Code-gen broadcast nested loop join (left outer/right outer)
SparkQA commented on pull request #31931: URL: https://github.com/apache/spark/pull/31931#issuecomment-804529467 **[Test build #136374 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136374/testReport)** for PR 31931 at commit [`8ee7536`](https://github.com/apache/spark/commit/8ee75369cb66abf85ecf6f7bde98cbdd3f1287b9).