[GitHub] [spark] AmplabJenkins commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.
AmplabJenkins commented on pull request #32738: URL: https://github.com/apache/spark/pull/32738#issuecomment-852757735 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32710: [SPARK-35574][BUILD] Add a compile arg to turn compilation warnings related to `procedure syntax` to compilation errors in Scala 2.13
SparkQA commented on pull request #32710: URL: https://github.com/apache/spark/pull/32710#issuecomment-852757960 **[Test build #139199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139199/testReport)** for PR 32710 at commit [`dcfd353`](https://github.com/apache/spark/commit/dcfd353391514296ac599e7f11fe2484ec00e36e).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
dongjoon-hyun commented on a change in pull request #32730: URL: https://github.com/apache/spark/pull/32730#discussion_r643669887
## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)
Review comment: Thank you for review. In that case, `mapId` is not the same, isn't it? We are reusing with `mapId` at [line 170](https://github.com/apache/spark/pull/32730/files#diff-a3b15298f97577c1fadcc2d76d015eebd6343e246c6717417d33f3c458847f46R170), @mridulm .
```
val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId)
```
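The lookup quoted in this comment searches the deleted statuses by `mapId` rather than by array position. A minimal Scala sketch of that idea follows; `StatusSketch`, `MapIdLookupSketch`, and `findDeleted` are hypothetical names made up for illustration and are not part of Spark, where the real entries are `MapStatus` objects.

```scala
// Hypothetical stand-in for Spark's MapStatus; only the fields needed here.
final case class StatusSketch(mapId: Long, location: String)

object MapIdLookupSketch {
  // Slot of the first non-null entry whose mapId matches, or -1 if there is none.
  // Null slots model positions where nothing has been recorded as deleted.
  def findDeleted(deleted: Array[StatusSketch], mapId: Long): Int =
    deleted.indexWhere(x => x != null && x.mapId == mapId)
}
```

Under these assumptions, an entry that does not carry the requested `mapId` is never matched, whatever slot it occupies.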
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
AmplabJenkins removed a comment on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852756658 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43717/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852756636 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43716/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
AmplabJenkins removed a comment on pull request #32736: URL: https://github.com/apache/spark/pull/32736#issuecomment-852756623 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139189/
[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852756636 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43716/
[GitHub] [spark] AmplabJenkins commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
AmplabJenkins commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852756658 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43717/
[GitHub] [spark] AmplabJenkins commented on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
AmplabJenkins commented on pull request #32736: URL: https://github.com/apache/spark/pull/32736#issuecomment-852756623 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139189/
[GitHub] [spark] SparkQA removed a comment on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
SparkQA removed a comment on pull request #32736: URL: https://github.com/apache/spark/pull/32736#issuecomment-852652347 **[Test build #139189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139189/testReport)** for PR 32736 at commit [`9aa3249`](https://github.com/apache/spark/commit/9aa3249516b0477d1830c61642b030a127d515d1).
[GitHub] [spark] SparkQA commented on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join
SparkQA commented on pull request #32736: URL: https://github.com/apache/spark/pull/32736#issuecomment-852753829 **[Test build #139189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139189/testReport)** for PR 32736 at commit [`9aa3249`](https://github.com/apache/spark/commit/9aa3249516b0477d1830c61642b030a127d515d1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
dongjoon-hyun commented on a change in pull request #32730: URL: https://github.com/apache/spark/pull/32730#discussion_r643670680
## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)
Review comment: BTW, I agree with you that we don't have a test coverage for the indeterministic stage case. Let me try to add some.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
dongjoon-hyun commented on a change in pull request #32730: URL: https://github.com/apache/spark/pull/32730#discussion_r643670680
## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)
Review comment: BTW, I agree with you that we don't have a test coverage for the indeterministic stage case.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
dongjoon-hyun commented on a change in pull request #32730: URL: https://github.com/apache/spark/pull/32730#discussion_r643669887
## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)
Review comment: In that case, `mapId` is not the same, isn't it? We are reusing with `mapId` at [line 170](https://github.com/apache/spark/pull/32730/files#diff-a3b15298f97577c1fadcc2d76d015eebd6343e246c6717417d33f3c458847f46R170), @mridulm .
```
val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId)
```
[GitHub] [spark] HyukjinKwon closed pull request #32731: [SPARK-35594] Remove duplicate installations in build_and_test.yml
HyukjinKwon closed pull request #32731: URL: https://github.com/apache/spark/pull/32731
[GitHub] [spark] dongjoon-hyun commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
dongjoon-hyun commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-852744257 Thank you so much, @maropu !
[GitHub] [spark] pingsutw opened a new pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.
pingsutw opened a new pull request #32738: URL: https://github.com/apache/spark/pull/32738
### What changes were proposed in this pull request?
Adds more type annotations in the file: `python/pyspark/pandas/spark/indexing.py` and fixes the mypy check failures.
### Why are the changes needed?
We should enable more disallow_untyped_defs mypy checks.
### Does this PR introduce _any_ user-facing change?
Yes. This PR adds more type annotations in pandas APIs on Spark module, which can impact interaction with development tools for users.
### How was this patch tested?
The mypy check with a new configuration and existing tests should pass. `./dev/lint-python`
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse
dongjoon-hyun commented on a change in pull request #32564: URL: https://github.com/apache/spark/pull/32564#discussion_r643668281
## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+    val replacedResources = mutable.ArrayBuffer[HasMetadata]()
Review comment: Ya, maybe. I didn't try to add another semantic like adding uniqueness here.
[GitHub] [spark] pingsutw commented on a change in pull request #32731: [SPARK-35594] Remove duplicate installations in build_and_test.yml
pingsutw commented on a change in pull request #32731: URL: https://github.com/apache/spark/pull/32731#discussion_r643667946
## File path: .github/workflows/build_and_test.yml ##
@@ -357,11 +357,7 @@ jobs:
         architecture: x64
     - name: Install Python linter dependencies
       run: |
-        # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
Review comment: okay. Let's close this issue. Thanks for the review.
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852742950 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43720/
[GitHub] [spark] maropu commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino
maropu commented on pull request #32716: URL: https://github.com/apache/spark/pull/32716#issuecomment-852742969 I opened a PR to fix this bug in janino: https://github.com/janino-compiler/janino/pull/148
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32288: [SPARK-35182][K8S] Support driver-owned on-demand PVC
dongjoon-hyun commented on a change in pull request #32288: URL: https://github.com/apache/spark/pull/32288#discussion_r643666101
## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala ##
@@ -85,6 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
           .withApiVersion("v1")
           .withNewMetadata()
           .withName(claimName)
+          .addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)
Review comment: Yes, `Spark Driver` pod is already launched in the K8s and the driver is building executor pod specs here.
[GitHub] [spark] dongjoon-hyun commented on pull request #32727: [SPARK-35589][CORE] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating
dongjoon-hyun commented on pull request #32727: URL: https://github.com/apache/spark/pull/32727#issuecomment-852740276 Thank you, @holdenk ! Hi, @Ngone51 . It's a very common case. Try this.
```
scala> Seq((1,2)).toDF("a", "b").repartition(10).groupBy("a").count().show()
+---+-----+
|  a|count|
+---+-----+
|  1|    1|
+---+-----+

$ ls -al blockmgr-b9389ef6-6328-4953-9d66-e6e2da21f65c/* | grep shuffle
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_9_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_10_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_7_0.index
-rw-r--r-- 1 dongjoon staff   60 Jun 1 22:35 shuffle_1_1_0.data
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_5_0.index
-rw-r--r-- 1 dongjoon staff   59 Jun 1 22:35 shuffle_0_0_0.data
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_3_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_2_0.index
-rw-r--r-- 1 dongjoon staff   88 Jun 1 22:35 shuffle_0_0_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_1_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_4_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_6_0.index
-rw-r--r-- 1 dongjoon staff 1608 Jun 1 22:35 shuffle_1_8_0.index
```
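In the listing in this comment, most `.index` files have no `.data` file next to them. A rough Scala sketch of how such index-only entries could be picked out of a list of file names is given below; `IndexOnlyShuffleSketch` and `indexOnly` are hypothetical helpers written for illustration, not Spark code, and the sketch assumes the `shuffle_<shuffleId>_<mapId>_<reduceId>.{index,data}` naming seen in the listing.

```scala
object IndexOnlyShuffleSketch {
  // Returns the .index file names that have no sibling .data file
  // with the same stem, preserving the input order.
  def indexOnly(names: Seq[String]): Seq[String] = {
    val dataStems = names
      .filter(_.endsWith(".data"))
      .map(_.stripSuffix(".data"))
      .toSet
    names.filter { n =>
      n.endsWith(".index") && !dataStems.contains(n.stripSuffix(".index"))
    }
  }
}
```

Applied to the names in the listing, this would keep every `.index` file except `shuffle_1_1_0.index` and `shuffle_0_0_0.index`, the only two with a matching `.data` file.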
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852735448 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43717/
[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852733530 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43716/
[GitHub] [spark] venkata91 commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
venkata91 commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r643661092
## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##
@@ -737,25 +737,26 @@ private[spark] class MapOutputTrackerMaster(
   }
   /**
-   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapId
-   * is specified, it will only unregister the merge result if the mapId is part of that merge
+   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapIndex
+   * is specified, it will only unregister the merge result if the mapIndex is part of that merge
    * result.
    *
    * @param shuffleId the shuffleId.
    * @param reduceId the reduceId.
    * @param bmAddress block manager address.
-   * @param mapId the optional mapId which should be checked to see it was part of the merge
-   * result.
+   * @param mapIndex the optional mapIndex which should be checked to see it was part of the
+   * merge result.
    */
   def unregisterMergeResult(
       shuffleId: Int,
       reduceId: Int,
       bmAddress: BlockManagerId,
-      mapId: Option[Int] = None) {
+      mapIndex: Option[Int] = None) {
     shuffleStatuses.get(shuffleId) match {
       case Some(shuffleStatus) =>
         val mergeStatus = shuffleStatus.mergeStatuses(reduceId)
-        if (mergeStatus != null && (mapId.isEmpty || mergeStatus.tracker.contains(mapId.get))) {
+        if (mergeStatus != null &&
+          (mapIndex.isEmpty || mergeStatus.tracker.contains(mapIndex.get))) {
           shuffleStatus.removeMergeResult(reduceId, bmAddress)
Review comment: [SPARK-32923](https://issues.apache.org/jira/browse/SPARK-32923) would handle non deterministic stage retries right? Do you mean we should remove the `mapOutputTracker.unregisterMergeResult` call in `DAGScheduler`? This change is already added as part of [SPARK-32921](https://issues.apache.org/jira/browse/SPARK-32921)
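The guard in the diff of this comment unregisters a merge result either when no `mapIndex` is given or when the given `mapIndex` is part of that result. A small Scala sketch of just that condition follows; `MergeResultGuardSketch` and `shouldUnregister` are hypothetical names, and a plain `Set[Int]` stands in for the merge status tracker purely as an assumption for illustration.

```scala
object MergeResultGuardSketch {
  // `tracker` models the set of map indices merged into one result.
  // With no mapIndex the result is always unregistered; with one,
  // only if that index contributed to the merged result.
  def shouldUnregister(tracker: Set[Int], mapIndex: Option[Int] = None): Boolean =
    mapIndex.isEmpty || tracker.contains(mapIndex.get)
}
```

The same condition can also be written as `mapIndex.forall(tracker.contains)`, which avoids the explicit `.get`.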
[GitHub] [spark] HyukjinKwon closed pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
HyukjinKwon closed pull request #32723: URL: https://github.com/apache/spark/pull/32723
[GitHub] [spark] HyukjinKwon commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
HyukjinKwon commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852732001 Merged to master.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852727869 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139187/
[GitHub] [spark] AmplabJenkins commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852727869 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139187/
[GitHub] [spark] SparkQA removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA removed a comment on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852625338 **[Test build #139187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139187/testReport)** for PR 32723 at commit [`13f9f86`](https://github.com/apache/spark/commit/13f9f8669511850f6592ae4eb5a202ec3491ee0f).
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852726882 **[Test build #139187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139187/testReport)** for PR 32723 at commit [`13f9f86`](https://github.com/apache/spark/commit/13f9f8669511850f6592ae4eb5a202ec3491ee0f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins removed a comment on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852724526 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43719/
[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852724506 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43719/
[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852724526 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43719/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
AmplabJenkins removed a comment on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852707740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43713/
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852722499 **[Test build #139198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139198/testReport)** for PR 32723 at commit [`bd91a48`](https://github.com/apache/spark/commit/bd91a48a90fe1647a45f4ccb1ab7ec395d2ea4eb).
[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852722290 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43719/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852721570 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139188/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
AmplabJenkins removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852721821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139196/
[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
SparkQA removed a comment on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852699735 **[Test build #139196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
AmplabJenkins removed a comment on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852721567 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43712/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins removed a comment on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852721568 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43715/
[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
AmplabJenkins commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852721821 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139196/
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852721610 **[Test build #139196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852721568 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43715/
[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852721570 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139188/
[GitHub] [spark] AmplabJenkins commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
AmplabJenkins commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852721567 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43712/
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852720873 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43718/
[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852720355 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43716/
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852720338 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43717/
[GitHub] [spark] otterc commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
otterc commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852719608 Looks good to me. Thanks @venkata91 for working on this.
[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852717464 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43715/
[GitHub] [spark] SparkQA removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852652322 **[Test build #139188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139188/testReport)** for PR 32737 at commit [`703d903`](https://github.com/apache/spark/commit/703d903dc9848cefa6e98e9e190bc6f99c9db6bb).
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852712631 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43712/
[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852710021 **[Test build #139188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139188/testReport)** for PR 32737 at commit [`703d903`](https://github.com/apache/spark/commit/703d903dc9848cefa6e98e9e190bc6f99c9db6bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #32061: URL: https://github.com/apache/spark/pull/32061#issuecomment-852708997 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43714/
[GitHub] [spark] AmplabJenkins commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down
AmplabJenkins commented on pull request #32061: URL: https://github.com/apache/spark/pull/32061#issuecomment-852709012 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43714/
[GitHub] [spark] AmplabJenkins commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
AmplabJenkins commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852707740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43713/
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852707718 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43713/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
HyukjinKwon commented on a change in pull request #32737: URL: https://github.com/apache/spark/pull/32737#discussion_r643639455 ## File path: .github/workflows/build_and_test.yml ## @@ -217,6 +217,9 @@ jobs: run: | python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 python3.6 -m pip list +- name: List Python packages (Python 3.9) + run: | +python3.9 -m pip list Review comment: Reading the update to the Docker image (https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/commit/e6e1d1a62e1db1b3db878a16002a4704f2c65535), it seems like it should have `pip` ... weird.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user function
AmplabJenkins removed a comment on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-852705831 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43711/
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user function
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-852705791 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43711/
[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user function
AmplabJenkins commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-852705831 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43711/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
HyukjinKwon commented on a change in pull request #32737: URL: https://github.com/apache/spark/pull/32737#discussion_r643637973 ## File path: .github/workflows/build_and_test.yml ## @@ -217,6 +217,9 @@ jobs: run: | python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 python3.6 -m pip list +- name: List Python packages (Python 3.9) + run: | +python3.9 -m pip list Review comment: @dongjoon-hyun did you use conda to install these packages in the docker image?
[GitHub] [spark] mridulm commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs
mridulm commented on a change in pull request #32730: URL: https://github.com/apache/spark/pull/32730#discussion_r643637623 ## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ## @@ -234,6 +252,7 @@ private class ShuffleStatus( for (mapIndex <- mapStatuses.indices) { if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) { _numAvailableMapOutputs -= 1 +mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex) Review comment: This has the potential to interact badly with the correctness changes, right? See `DAGScheduler.submitMissingTasks` when the stage is indeterminate.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
HyukjinKwon commented on a change in pull request #32737: URL: https://github.com/apache/spark/pull/32737#discussion_r643637374 ## File path: .github/workflows/build_and_test.yml ## @@ -217,6 +217,9 @@ jobs: run: | python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 python3.6 -m pip list +- name: List Python packages (Python 3.9) + run: | +python3.9 -m pip list Review comment: Oh, interesting: https://github.com/xinrong-databricks/spark/runs/2724401328?check_suite_focus=true ``` /usr/bin/python3.9: No module named pip ``` It complains that `python3.9` doesn't have `pip`, but judging by the skipped-test checks (https://github.com/apache/spark/runs/2707956326 for https://github.com/apache/spark/commit/c225196be0d18975a3d290b0b9d7283d764be322), it seems all packages are installed properly for Python 3.9.
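Since the comment above reports `No module named pip` for `python3.9` while the packages themselves appear to be installed, one way to list them without going through pip is the standard-library `importlib.metadata` module. The following is only a minimal sketch, not what the PR does: it assumes the interpreter is Python 3.8 or newer, and the helper names `has_module` and `list_installed` are hypothetical.

```python
import importlib.metadata
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def list_installed():
    """Return (name, version) pairs for installed distributions, sorted by name.

    This reads distribution metadata directly, so it works even when
    `python -m pip` fails with "No module named pip".
    """
    pairs = set()
    for dist in importlib.metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is missing a name
            pairs.add((name, dist.version))
    return sorted(pairs, key=lambda p: p[0].lower())


if __name__ == "__main__":
    print("pip importable:", has_module("pip"))
    for name, version in list_installed():
        print(f"{name}=={version}")
```

Saved under a hypothetical name such as `list_packages.py`, this could be run as `python3.9 list_packages.py` in place of `python3.9 -m pip list`; whether that is preferable to installing `pip` in the image is a separate question for the reviewers.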
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based
AmplabJenkins removed a comment on pull request #32592: URL: https://github.com/apache/spark/pull/32592#issuecomment-852703612 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43710/
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
HyukjinKwon commented on a change in pull request #32737: URL: https://github.com/apache/spark/pull/32737#discussion_r643637374 ## File path: .github/workflows/build_and_test.yml ## @@ -217,6 +217,9 @@ jobs: run: | python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8 python3.6 -m pip list +- name: List Python packages (Python 3.9) + run: | +python3.9 -m pip list Review comment: Oh, interesting: https://github.com/xinrong-databricks/spark/runs/2724401328?check_suite_focus=true ``` /usr/bin/python3.9: No module named pip ``` It complains that `python3.9` doesn't have `pip`, but judging by the skipped-test checks (https://github.com/apache/spark/runs/2707956326 for https://github.com/apache/spark/commit/c225196be0d18975a3d290b0b9d7283d764be322), it seems all packages are installed properly.
[GitHub] [spark] SparkQA commented on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based
SparkQA commented on pull request #32592: URL: https://github.com/apache/spark/pull/32592#issuecomment-852703589 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43710/
[GitHub] [spark] AmplabJenkins commented on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based
AmplabJenkins commented on pull request #32592: URL: https://github.com/apache/spark/pull/32592#issuecomment-852703612 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43710/
[GitHub] [spark] mridulm commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse
mridulm commented on a change in pull request #32564: URL: https://github.com/apache/spark/pull/32564#discussion_r643636239 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ## @@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator( } } + private def replacePVCsIfNeeded( + pod: Pod, + resources: Seq[HasMetadata], + reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = { +val replacedResources = mutable.ArrayBuffer[HasMetadata]() Review comment: Use a `Set` instead?
[GitHub] [spark] mridulm commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse
mridulm commented on a change in pull request #32564: URL: https://github.com/apache/spark/pull/32564#discussion_r643635908

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala ##

@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {

Review comment: nit: Add return type
[GitHub] [spark] SparkQA removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852699423 **[Test build #139194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852701130 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139194/
[GitHub] [spark] SparkQA removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA removed a comment on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852700292 **[Test build #139197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins removed a comment on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852701114 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139197/
[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852701130 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139194/
[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852701114 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139197/
[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852701107 **[Test build #139194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852701099 **[Test build #139197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
SparkQA commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852700292 **[Test build #139197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
AmplabJenkins removed a comment on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-742037234 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command
SparkQA commented on pull request #32513: URL: https://github.com/apache/spark/pull/32513#issuecomment-852699735 **[Test build #139196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0).
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852699502 **[Test build #139195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139195/testReport)** for PR 32724 at commit [`c7996bb`](https://github.com/apache/spark/commit/c7996bbdfc7af288979ea65102e7637d4c5da1a7).
[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
SparkQA commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852699423 **[Test build #139194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins removed a comment on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852698377 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43708/
[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow
AmplabJenkins commented on pull request #32737: URL: https://github.com/apache/spark/pull/32737#issuecomment-852698377 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43708/
[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter
SparkQA commented on pull request #32724: URL: https://github.com/apache/spark/pull/32724#issuecomment-852697807 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43712/
[GitHub] [spark] mridulm commented on a change in pull request #32288: [SPARK-35182][K8S] Support driver-owned on-demand PVC
mridulm commented on a change in pull request #32288: URL: https://github.com/apache/spark/pull/32288#discussion_r643631948

## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala ##

@@ -85,6 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
     .withApiVersion("v1")
     .withNewMetadata()
     .withName(claimName)
+    .addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)

Review comment: I am trying to understand this ... will `sparkConf.getAppId` be available at this point?
[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down
SparkQA commented on pull request #32061: URL: https://github.com/apache/spark/pull/32061#issuecomment-852694791 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43714/
[GitHub] [spark] gengliangwang closed pull request #32721: [SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
gengliangwang closed pull request #32721: URL: https://github.com/apache/spark/pull/32721
[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page
SparkQA commented on pull request #32723: URL: https://github.com/apache/spark/pull/32723#issuecomment-852694342 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43713/
[GitHub] [spark] gengliangwang commented on pull request #32721: [SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules
gengliangwang commented on pull request #32721: URL: https://github.com/apache/spark/pull/32721#issuecomment-852693943 Merging to master
[GitHub] [spark] mridulm commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
mridulm commented on a change in pull request #30691: URL: https://github.com/apache/spark/pull/30691#discussion_r643630037

## File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ##

@@ -737,25 +737,26 @@ private[spark] class MapOutputTrackerMaster(
   }
   /**
-   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapId
-   * is specified, it will only unregister the merge result if the mapId is part of that merge
+   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapIndex
+   * is specified, it will only unregister the merge result if the mapIndex is part of that merge
    * result.
    *
    * @param shuffleId the shuffleId.
    * @param reduceId the reduceId.
    * @param bmAddress block manager address.
-   * @param mapId the optional mapId which should be checked to see it was part of the merge
-   * result.
+   * @param mapIndex the optional mapIndex which should be checked to see it was part of the
+   * merge result.
    */
   def unregisterMergeResult(
       shuffleId: Int,
       reduceId: Int,
       bmAddress: BlockManagerId,
-      mapId: Option[Int] = None) {
+      mapIndex: Option[Int] = None) {
     shuffleStatuses.get(shuffleId) match {
       case Some(shuffleStatus) =>
         val mergeStatus = shuffleStatus.mergeStatuses(reduceId)
-        if (mergeStatus != null && (mapId.isEmpty || mergeStatus.tracker.contains(mapId.get))) {
+        if (mergeStatus != null &&
+          (mapIndex.isEmpty || mergeStatus.tracker.contains(mapIndex.get))) {
           shuffleStatus.removeMergeResult(reduceId, bmAddress)

Review comment: Is this more logically part of SPARK-32923? If so, should it be moved out of this PR?
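The guard in the hunk above, `mapIndex.isEmpty || mergeStatus.tracker.contains(mapIndex.get)`, can be looked at in isolation. The following is only a minimal sketch, not Spark's code: `MergeGuard`, `shouldUnregister`, and the plain `Set[Int]` standing in for the merge status tracker are assumptions made for illustration, and the `mergeStatus != null` check is left out.

```scala
object MergeGuard {
  // `tracker` is a hypothetical stand-in for the set of map indices
  // that contributed to a merge result.
  def shouldUnregister(tracker: Set[Int], mapIndex: Option[Int]): Boolean =
    // Same truth table as `mapIndex.isEmpty || tracker.contains(mapIndex.get)`:
    // None always passes; Some(i) passes only if i is in the tracker.
    mapIndex.forall(tracker.contains)

  def main(args: Array[String]): Unit = {
    val tracker = Set(1, 2)
    println(shouldUnregister(tracker, None))    // true
    println(shouldUnregister(tracker, Some(2))) // true
    println(shouldUnregister(tracker, Some(3))) // false
  }
}
```

`Option.forall` avoids the explicit `.get`, but it is a stylistic alternative rather than something the PR uses.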
[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user function
SparkQA commented on pull request #32718: URL: https://github.com/apache/spark/pull/32718#issuecomment-852692892 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43711/
[GitHub] [spark] mridulm commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage
mridulm commented on pull request #30691: URL: https://github.com/apache/spark/pull/30691#issuecomment-852692138 Jenkins test this please
[GitHub] [spark] cloud-fan commented on a change in pull request #31908: [SPARK-34808][SQL] Removes outer join if it only has DISTINCT on streamed side
cloud-fan commented on a change in pull request #31908: URL: https://github.com/apache/spark/pull/31908#discussion_r643628995

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ##

@@ -165,6 +170,23 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _, _)) =>
       val newJoinType = buildNewJoinType(f, j)
       if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
+
+    case a @ Aggregate(_, _, join @ Join(left, _, LeftOuter, _, _))
+        if a.isDistinct && a.references.subsetOf(AttributeSet(left.output)) &&
+          !canPlanAsBroadcastHashJoin(join, conf) =>

Review comment: The aggregate should still be there. I mean we can remove this `canPlanAsBroadcastHashJoin` check.
[GitHub] [spark] wangyum commented on a change in pull request #31908: [SPARK-34808][SQL] Removes outer join if it only has DISTINCT on streamed side
wangyum commented on a change in pull request #31908: URL: https://github.com/apache/spark/pull/31908#discussion_r643628381

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala ##

@@ -165,6 +170,23 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
     case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _, _)) =>
       val newJoinType = buildNewJoinType(f, j)
       if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
+
+    case a @ Aggregate(_, _, join @ Join(left, _, LeftOuter, _, _))
+        if a.isDistinct && a.references.subsetOf(AttributeSet(left.output)) &&
+          !canPlanAsBroadcastHashJoin(join, conf) =>

Review comment: The result may be incorrect if we always remove the join. For example:

```
0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> create table test11.t1 using parquet as select id % 3 as a, id as b from range(10);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.611 seconds)
0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> create table test11.t2 using parquet as select id % 3 as x, id as y from range(5);
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (1.043 seconds)
0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> select t1.a from t1 left join t2 on a = x;
+----+--+
| a  |
+----+--+
| 0  |
| 0  |
| 1  |
| 1  |
| 0  |
| 0  |
| 1  |
| 1  |
| 2  |
| 0  |
| 0  |
| 1  |
| 1  |
| 2  |
| 0  |
| 0  |
| 2  |
+----+--+
17 rows selected (1.409 seconds)
```
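The row counts in the example above can be reproduced without Spark. The following is a minimal sketch using plain Scala collections; `LeftJoinSketch` and its members are hypothetical names, and the hand-written join only mimics `t1 left join t2 on a = x` for these two small tables. It shows why dropping the join changes the result when no DISTINCT aggregate sits on top (17 rows versus 10), while the distinct values of `a` are the same either way.

```scala
object LeftJoinSketch {
  // t1(a, b) built like `select id % 3 as a, id as b from range(10)`.
  val t1: Seq[(Int, Int)] = (0 until 10).map(id => (id % 3, id))
  // t2(x, y) built like `select id % 3 as x, id as y from range(5)`.
  val t2: Seq[(Int, Int)] = (0 until 5).map(id => (id % 3, id))

  // Left outer join on a = x, projecting only t1.a: each t1 row is emitted
  // once per matching t2 row, or once if there is no match.
  val joinedA: Seq[Int] = t1.flatMap { case (a, _) =>
    val matches = t2.count { case (x, _) => x == a }
    Seq.fill(math.max(matches, 1))(a)
  }

  def main(args: Array[String]): Unit = {
    println(joinedA.size)       // 17, matching "17 rows selected"
    println(t1.map(_._1).size)  // 10 rows if the join were simply dropped
    // With DISTINCT on top, the join no longer affects the result.
    println(joinedA.distinct.sorted == t1.map(_._1).distinct.sorted) // true
  }
}
```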
[GitHub] [spark] mridulm commented on a change in pull request #32733: [SPARK-35596][CORE] HighlyCompressedMapStatus should record accurately the size of skewed shuffle blocks
mridulm commented on a change in pull request #32733: URL: https://github.com/apache/spark/pull/32733#discussion_r643627409

## File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ##

@@ -258,18 +258,27 @@ private[spark] object HighlyCompressedMapStatus {
     val threshold = Option(SparkEnv.get)
       .map(_.conf.get(config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD))
       .getOrElse(config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD.defaultValue.get)
+    val minThreshold = Option(SparkEnv.get)
+      .map(_.conf.get(config.SHUFFLE_ACCURATE_SKEWED_BLOCK_THRESHOLD))
+      .getOrElse(config.SHUFFLE_ACCURATE_SKEWED_BLOCK_THRESHOLD.defaultValue.get)
     val hugeBlockSizes = mutable.Map.empty[Int, Byte]
+    val nonEmptyUncompressedSizes = uncompressedSizes.filter(_ > 0)
+    val overallNonEmptyAvgSize = if (nonEmptyUncompressedSizes.nonEmpty) {
+      nonEmptyUncompressedSizes.sum / nonEmptyUncompressedSizes.length
+    } else {
+      0
+    }
     while (i < totalNumBlocks) {
       val size = uncompressedSizes(i)
       if (size > 0) {
         numNonEmptyBlocks += 1
         // Huge blocks are not included in the calculation for average size, thus size for smaller
         // blocks is more accurate.
-        if (size < threshold) {
+        if ((size >= 5 * overallNonEmptyAvgSize && size >= minThreshold) || size >= threshold) {

Review comment: Echoing @dongjoon-hyun's comment above: what is the background of this change?
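To make the proposed condition easier to discuss, here is a minimal sketch of the check from the added line, pulled out into a standalone function. `SkewSketch`, `isRecordedAccurately`, and `avgNonEmpty` are hypothetical names, and the sizes and thresholds in `main` are made-up numbers, not Spark defaults; only the shape of the condition and the factor 5 come from the diff.

```scala
object SkewSketch {
  // Average over non-empty blocks, mirroring `overallNonEmptyAvgSize` in the diff.
  def avgNonEmpty(sizes: Array[Long]): Long = {
    val nonEmpty = sizes.filter(_ > 0)
    if (nonEmpty.nonEmpty) nonEmpty.sum / nonEmpty.length else 0L
  }

  // Condition from the added line: a block counts as huge if it is at least
  // 5x the non-empty average and at least `minThreshold`, or if it reaches
  // the existing `threshold` on its own.
  def isRecordedAccurately(
      size: Long,
      avg: Long,
      minThreshold: Long,
      threshold: Long): Boolean =
    (size >= 5 * avg && size >= minThreshold) || size >= threshold

  def main(args: Array[String]): Unit = {
    // Nine small blocks and one skewed block (illustrative values only).
    val sizes = Array.fill(9)(10L) :+ 1000L
    val avg = avgNonEmpty(sizes) // 1090 / 10 = 109
    // The skewed block is below `threshold` but still qualifies via the skew rule.
    println(isRecordedAccurately(1000L, avg, minThreshold = 500L, threshold = 100000L)) // true
    println(isRecordedAccurately(10L, avg, minThreshold = 500L, threshold = 100000L))   // false
  }
}
```

Under the previous rule, which compared only against `threshold`, the 1000-byte block in this made-up example would not have been treated as huge.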