[GitHub] [spark] AmplabJenkins commented on pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32738:
URL: https://github.com/apache/spark/pull/32738#issuecomment-852757735


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on pull request #32710: [SPARK-35574][BUILD] Add a compile arg to turn compilation warnings related to `procedure syntax` to compilation errors in Scala 2.13

2021-06-01 Thread GitBox


SparkQA commented on pull request #32710:
URL: https://github.com/apache/spark/pull/32710#issuecomment-852757960


   **[Test build #139199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139199/testReport)** for PR 32710 at commit [`dcfd353`](https://github.com/apache/spark/commit/dcfd353391514296ac599e7f11fe2484ec00e36e).





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32730:
URL: https://github.com/apache/spark/pull/32730#discussion_r643669887



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)

Review comment:
   Thank you for the review. In that case, `mapId` is not the same, is it? We are reusing with `mapId` at [line 170](https://github.com/apache/spark/pull/32730/files#diff-a3b15298f97577c1fadcc2d76d015eebd6343e246c6717417d33f3c458847f46R170), @mridulm.
   ```
   val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId)
   ```
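
As a rough illustration of the lookup quoted above, the following Python sketch mirrors what Scala's `indexWhere` does: it returns the position of the first non-null entry whose `mapId` matches, or `-1` if there is none. The `Status` class and the `index_where_map_id` helper are hypothetical names used only for this sketch; the real `MapStatus` carries more fields.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    """Hypothetical stand-in for a map status entry (illustration only)."""
    map_id: int
    location: str


def index_where_map_id(deleted: List[Optional[Status]], map_id: int) -> int:
    """Return the first index whose entry is not None and has the given
    map_id, or -1 when nothing matches (the same contract as indexWhere)."""
    for i, status in enumerate(deleted):
        if status is not None and status.map_id == map_id:
            return i
    return -1


# The array position (mapIndex) and the stored mapId are different values.
deleted = [None, Status(map_id=42, location="exec-1"), Status(map_id=7, location="exec-2")]
found = index_where_map_id(deleted, 7)
missing = index_where_map_id(deleted, 99)
```

Here the entry with `map_id=7` sits at array position 2, which is why the lookup has to go through `mapId` rather than assume the position.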







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852756658


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43717/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852756636


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43716/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32736:
URL: https://github.com/apache/spark/pull/32736#issuecomment-852756623


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139189/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852756636


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43716/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852756658


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43717/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32736:
URL: https://github.com/apache/spark/pull/32736#issuecomment-852756623


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139189/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #32736:
URL: https://github.com/apache/spark/pull/32736#issuecomment-852652347


   **[Test build #139189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139189/testReport)** for PR 32736 at commit [`9aa3249`](https://github.com/apache/spark/commit/9aa3249516b0477d1830c61642b030a127d515d1).





[GitHub] [spark] SparkQA commented on pull request #32736: [SPARK-35604][SQL] Fix condition check for FULL OUTER sort merge join

2021-06-01 Thread GitBox


SparkQA commented on pull request #32736:
URL: https://github.com/apache/spark/pull/32736#issuecomment-852753829


   **[Test build #139189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139189/testReport)** for PR 32736 at commit [`9aa3249`](https://github.com/apache/spark/commit/9aa3249516b0477d1830c61642b030a127d515d1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32730:
URL: https://github.com/apache/spark/pull/32730#discussion_r643670680



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)

Review comment:
   BTW, I agree with you that we don't have test coverage for the indeterminate stage case. Let me try to add some.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32730:
URL: https://github.com/apache/spark/pull/32730#discussion_r643670680



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)

Review comment:
   BTW, I agree with you that we don't have test coverage for the indeterminate stage case.







[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32730:
URL: https://github.com/apache/spark/pull/32730#discussion_r643669887



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
     for (mapIndex <- mapStatuses.indices) {
       if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
         _numAvailableMapOutputs -= 1
+        mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)

Review comment:
   In that case, `mapId` is not the same, is it? We are reusing with `mapId` at [line 170](https://github.com/apache/spark/pull/32730/files#diff-a3b15298f97577c1fadcc2d76d015eebd6343e246c6717417d33f3c458847f46R170), @mridulm.
   ```
   val index = mapStatusesDeleted.indexWhere(x => x != null && x.mapId == mapId)
   ```







[GitHub] [spark] HyukjinKwon closed pull request #32731: [SPARK-35594] Remove duplicate installations in build_and_test.yml

2021-06-01 Thread GitBox


HyukjinKwon closed pull request #32731:
URL: https://github.com/apache/spark/pull/32731


   





[GitHub] [spark] dongjoon-hyun commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-06-01 Thread GitBox


dongjoon-hyun commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-852744257


   Thank you so much, @maropu !





[GitHub] [spark] pingsutw opened a new pull request #32738: [SPARK-35474] Enable disallow_untyped_defs mypy check for pyspark.pandas.indexing.

2021-06-01 Thread GitBox


pingsutw opened a new pull request #32738:
URL: https://github.com/apache/spark/pull/32738


   
   
   ### What changes were proposed in this pull request?
   
   Adds more type annotations to the file
   `python/pyspark/pandas/indexing.py`
   and fixes the mypy check failures.
   
   ### Why are the changes needed?
   
   We should enable more disallow_untyped_defs mypy checks.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes.
   This PR adds more type annotations to the pandas API on Spark module, which can affect how users' development tools interact with it.
   
   ### How was this patch tested?
   
   The mypy check with a new configuration and existing tests should pass.
   `./dev/lint-python`
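
For readers unfamiliar with the flag, here is a minimal sketch of the kind of definition that `disallow_untyped_defs` rejects and the annotated form it accepts. The function names are hypothetical and are not taken from `indexing.py`. The `is_fully_annotated` helper is only a rough runtime approximation of the rule, written for illustration; mypy itself performs this check statically and handles cases (such as `self`) that this helper does not.

```python
import inspect
from typing import Any, Callable, List


def locate_untyped(row):  # no annotations: mypy would report this def under the flag
    return row


def locate_typed(row: int) -> int:  # parameters and return type are annotated
    return row


def is_fully_annotated(func: Callable[..., Any]) -> bool:
    """Approximate the rule at runtime: every parameter and the return
    value must carry an annotation. Illustration only, not mypy's logic."""
    sig = inspect.signature(func)
    params_ok = all(
        p.annotation is not inspect.Parameter.empty
        for p in sig.parameters.values()
    )
    return params_ok and sig.return_annotation is not inspect.Signature.empty


def untyped_names(funcs: List[Callable[..., Any]]) -> List[str]:
    """Return the names of functions that are missing annotations."""
    return [f.__name__ for f in funcs if not is_fully_annotated(f)]


flagged = untyped_names([locate_untyped, locate_typed])
```

In this sketch only `locate_untyped` ends up in `flagged`, which matches the intent of the PR: every definition in the module gets complete annotations so the stricter check can be switched on.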





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r643668281



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
     }
   }
 
+  private def replacePVCsIfNeeded(
+      pod: Pod,
+      resources: Seq[HasMetadata],
+      reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+    val replacedResources = mutable.ArrayBuffer[HasMetadata]()

Review comment:
   Yes, maybe. I didn't try to add extra semantics, such as uniqueness, here.







[GitHub] [spark] pingsutw commented on a change in pull request #32731: [SPARK-35594] Remove duplicate installations in build_and_test.yml

2021-06-01 Thread GitBox


pingsutw commented on a change in pull request #32731:
URL: https://github.com/apache/spark/pull/32731#discussion_r643667946



##
File path: .github/workflows/build_and_test.yml
##
@@ -357,11 +357,7 @@ jobs:
         architecture: x64
     - name: Install Python linter dependencies
       run: |
-        # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.

Review comment:
   Okay. Let's close this issue. Thanks for the review.







[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852742950


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43720/
   





[GitHub] [spark] maropu commented on pull request #32716: [SPARK-35578][SQL][TEST] Add a test case for a bug in janino

2021-06-01 Thread GitBox


maropu commented on pull request #32716:
URL: https://github.com/apache/spark/pull/32716#issuecomment-852742969


   I opened a PR to fix this bug in janino: 
https://github.com/janino-compiler/janino/pull/148





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32288: [SPARK-35182][K8S] Support driver-owned on-demand PVC

2021-06-01 Thread GitBox


dongjoon-hyun commented on a change in pull request #32288:
URL: https://github.com/apache/spark/pull/32288#discussion_r643666101



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
##
@@ -85,6 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
           .withApiVersion("v1")
           .withNewMetadata()
             .withName(claimName)
+            .addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)

Review comment:
   Yes, the Spark driver pod is already launched in K8s, and the driver is building executor pod specs here.







[GitHub] [spark] dongjoon-hyun commented on pull request #32727: [SPARK-35589][CORE] BlockManagerMasterEndpoint should not ignore index-only shuffle file during updating

2021-06-01 Thread GitBox


dongjoon-hyun commented on pull request #32727:
URL: https://github.com/apache/spark/pull/32727#issuecomment-852740276


   Thank you, @holdenk !
   
   Hi, @Ngone51 . It's a very common case. Try this.
   
   ```
   scala> Seq((1,2)).toDF("a", "b").repartition(10).groupBy("a").count().show()
   +---+-----+
   |  a|count|
   +---+-----+
   |  1|    1|
   +---+-----+
   
   $ ls -al blockmgr-b9389ef6-6328-4953-9d66-e6e2da21f65c/* | grep shuffle
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_9_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_10_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_7_0.index
   -rw-r--r--   1 dongjoon  staff    60 Jun  1 22:35 shuffle_1_1_0.data
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_5_0.index
   -rw-r--r--   1 dongjoon  staff    59 Jun  1 22:35 shuffle_0_0_0.data
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_3_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_2_0.index
   -rw-r--r--   1 dongjoon  staff    88 Jun  1 22:35 shuffle_0_0_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_1_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_4_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_6_0.index
   -rw-r--r--   1 dongjoon  staff  1608 Jun  1 22:35 shuffle_1_8_0.index
   ```
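
To make the point of the listing explicit, the following Python sketch picks out the index-only shuffle files, meaning `.index` files with no matching `.data` file. The helper name `index_only_shuffle_files` is hypothetical and the file names are copied from the listing above; this is an illustration of the pattern, not code from Spark.

```python
from typing import Iterable, List


def index_only_shuffle_files(names: Iterable[str]) -> List[str]:
    """Return shuffle .index files that have no matching .data file."""
    present = set(names)
    result = []
    for name in sorted(present):
        if name.startswith("shuffle_") and name.endswith(".index"):
            data_name = name[: -len(".index")] + ".data"
            if data_name not in present:
                result.append(name)
    return result


# File names taken from the `ls` output above.
listing = [
    "shuffle_1_9_0.index", "shuffle_1_10_0.index", "shuffle_1_7_0.index",
    "shuffle_1_1_0.data", "shuffle_1_5_0.index", "shuffle_0_0_0.data",
    "shuffle_1_3_0.index", "shuffle_1_2_0.index", "shuffle_0_0_0.index",
    "shuffle_1_1_0.index", "shuffle_1_4_0.index", "shuffle_1_6_0.index",
    "shuffle_1_8_0.index",
]
index_only = index_only_shuffle_files(listing)
```

For this listing, nine of the ten map outputs of shuffle 1 have an index file but no data file, which is the common case the comment refers to.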





[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852735448


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43717/
   





[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852733530


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43716/
   





[GitHub] [spark] venkata91 commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


venkata91 commented on a change in pull request #30691:
URL: https://github.com/apache/spark/pull/30691#discussion_r643661092



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,25 +737,26 @@ private[spark] class MapOutputTrackerMaster(
   }
 
   /**
-   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapId
-   * is specified, it will only unregister the merge result if the mapId is part of that merge
+   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapIndex
+   * is specified, it will only unregister the merge result if the mapIndex is part of that merge
    * result.
    *
    * @param shuffleId the shuffleId.
    * @param reduceId  the reduceId.
    * @param bmAddress block manager address.
-   * @param mapId     the optional mapId which should be checked to see it was part of the merge
-   *                  result.
+   * @param mapIndex  the optional mapIndex which should be checked to see it was part of the
+   *                  merge result.
    */
   def unregisterMergeResult(
       shuffleId: Int,
       reduceId: Int,
       bmAddress: BlockManagerId,
-      mapId: Option[Int] = None) {
+      mapIndex: Option[Int] = None) {
     shuffleStatuses.get(shuffleId) match {
       case Some(shuffleStatus) =>
         val mergeStatus = shuffleStatus.mergeStatuses(reduceId)
-        if (mergeStatus != null && (mapId.isEmpty || mergeStatus.tracker.contains(mapId.get))) {
+        if (mergeStatus != null &&
+          (mapIndex.isEmpty || mergeStatus.tracker.contains(mapIndex.get))) {
           shuffleStatus.removeMergeResult(reduceId, bmAddress)

Review comment:
   [SPARK-32923](https://issues.apache.org/jira/browse/SPARK-32923) would handle non-deterministic stage retries, right? Do you mean we should remove the `mapOutputTracker.unregisterMergeResult` call in `DAGScheduler`? This change was already added as part of [SPARK-32921](https://issues.apache.org/jira/browse/SPARK-32921).







[GitHub] [spark] HyukjinKwon closed pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon closed pull request #32723:
URL: https://github.com/apache/spark/pull/32723


   





[GitHub] [spark] HyukjinKwon commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


HyukjinKwon commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852732001


   Merged to master.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852727869


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139187/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852727869


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139187/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852625338


   **[Test build #139187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139187/testReport)** for PR 32723 at commit [`13f9f86`](https://github.com/apache/spark/commit/13f9f8669511850f6592ae4eb5a202ec3491ee0f).





[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852726882


   **[Test build #139187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139187/testReport)** for PR 32723 at commit [`13f9f86`](https://github.com/apache/spark/commit/13f9f8669511850f6592ae4eb5a202ec3491ee0f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852724526


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43719/
   





[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852724506


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43719/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852724526


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43719/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852707740


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43713/
   





[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852722499


   **[Test build #139198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139198/testReport)** for PR 32723 at commit [`bd91a48`](https://github.com/apache/spark/commit/bd91a48a90fe1647a45f4ccb1ab7ec395d2ea4eb).





[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852722290


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43719/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852721570


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139188/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852721821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139196/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852699735


   **[Test build #139196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852721567


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43712/
   





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852721568


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43715/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852721821


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139196/
   





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852721610


   **[Test build #139196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852721568


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43715/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852721570


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139188/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852721567


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43712/
   





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852720873


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43718/
   





[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852720355


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43716/
   





[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852720338


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43717/
   





[GitHub] [spark] otterc commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


otterc commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852719608


   Looks good to me. Thanks @venkata91 for working on this.





[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852717464


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43715/
   





[GitHub] [spark] SparkQA removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852652322


   **[Test build #139188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139188/testReport)** for PR 32737 at commit [`703d903`](https://github.com/apache/spark/commit/703d903dc9848cefa6e98e9e190bc6f99c9db6bb).





[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852712631


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43712/
   





[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852710021


   **[Test build #139188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139188/testReport)** for PR 32737 at commit [`703d903`](https://github.com/apache/spark/commit/703d903dc9848cefa6e98e9e190bc6f99c9db6bb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down

2021-06-01 Thread GitBox


SparkQA commented on pull request #32061:
URL: https://github.com/apache/spark/pull/32061#issuecomment-852708997


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43714/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32061:
URL: https://github.com/apache/spark/pull/32061#issuecomment-852709012


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43714/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852707740


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43713/
   





[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852707718


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43713/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32737:
URL: https://github.com/apache/spark/pull/32737#discussion_r643639455



##
File path: .github/workflows/build_and_test.yml
##
@@ -217,6 +217,9 @@ jobs:
   run: |
 python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8
 python3.6 -m pip list
+- name: List Python packages (Python 3.9)
+  run: |
+python3.9 -m pip list

Review comment:
   Reading the update for the Docker image (https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage/commit/e6e1d1a62e1db1b3db878a16002a4704f2c65535), it seems like it should have `pip`... weird.







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32718: [SPARK-21957][SQL] Support current_user function

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-852705831


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43711/
   





[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user function

2021-06-01 Thread GitBox


SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-852705791


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43711/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32718: [SPARK-21957][SQL] Support current_user function

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-852705831


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43711/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32737:
URL: https://github.com/apache/spark/pull/32737#discussion_r643637973



##
File path: .github/workflows/build_and_test.yml
##
@@ -217,6 +217,9 @@ jobs:
   run: |
 python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8
 python3.6 -m pip list
+- name: List Python packages (Python 3.9)
+  run: |
+python3.9 -m pip list

Review comment:
   @dongjoon-hyun, did you use conda to install these packages in the Docker image?







[GitHub] [spark] mridulm commented on a change in pull request #32730: [SPARK-35593][K8S][CORE] Support shuffle data recovery on the reused PVCs

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #32730:
URL: https://github.com/apache/spark/pull/32730#discussion_r643637623



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -234,6 +252,7 @@ private class ShuffleStatus(
 for (mapIndex <- mapStatuses.indices) {
   if (mapStatuses(mapIndex) != null && f(mapStatuses(mapIndex).location)) {
 _numAvailableMapOutputs -= 1
+mapStatusesDeleted(mapIndex) = mapStatuses(mapIndex)

Review comment:
   This has the potential to interact badly with the correctness changes, right?
   See `DAGScheduler.submitMissingTasks` for the case where the stage is indeterminate.







[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32737:
URL: https://github.com/apache/spark/pull/32737#discussion_r643637374



##
File path: .github/workflows/build_and_test.yml
##
@@ -217,6 +217,9 @@ jobs:
   run: |
 python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8
 python3.6 -m pip list
+- name: List Python packages (Python 3.9)
+  run: |
+python3.9 -m pip list

Review comment:
   Oh, interesting: https://github.com/xinrong-databricks/spark/runs/2724401328?check_suite_focus=true
   
   ```
   /usr/bin/python3.9: No module named pip
   ```
   
   It complains that `python3.9` doesn't have `pip`, but judging from the skipped-tests checks (https://github.com/apache/spark/runs/2707956326 for https://github.com/apache/spark/commit/c225196be0d18975a3d290b0b9d7283d764be322), it seems all packages are installed properly for Python 3.9.
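
   When an interpreter reports `No module named pip` yet its packages still import, the installed distributions can be inspected without `pip`. The following is only a minimal sketch, assuming Python 3.8+ (where the standard library ships `importlib.metadata`); the helper names `has_module` and `list_installed` are hypothetical and are not part of the workflow or the Docker image discussed here.

```python
import importlib.metadata
import importlib.util
from typing import List, Tuple


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def list_installed() -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs for distributions on sys.path."""
    pairs = set()
    for dist in importlib.metadata.distributions():
        # Skip entries whose metadata lacks a usable name.
        name = dist.metadata.get("Name")
        if name:
            pairs.add((name, dist.metadata.get("Version") or "unknown"))
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    # Reports whether pip itself is importable, then lists what is installed.
    print("pip importable:", has_module("pip"))
    for name, version in list_installed():
        print(f"{name}=={version}")
```

   Run with the interpreter under test (for example `python3.9 script.py`, where `script.py` is a placeholder file name), this would show whether `pip` is merely missing while packages such as numpy or pandas are still present.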







[GitHub] [spark] AmplabJenkins removed a comment on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32592:
URL: https://github.com/apache/spark/pull/32592#issuecomment-852703612


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43710/
   





[GitHub] [spark] HyukjinKwon commented on a change in pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


HyukjinKwon commented on a change in pull request #32737:
URL: https://github.com/apache/spark/pull/32737#discussion_r643637374



##
File path: .github/workflows/build_and_test.yml
##
@@ -217,6 +217,9 @@ jobs:
   run: |
 python3.6 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner plotly>=4.8
 python3.6 -m pip list
+- name: List Python packages (Python 3.9)
+  run: |
+python3.9 -m pip list

Review comment:
   oh, interesting .. https://github.com/xinrong-databricks/spark/runs/2724401328?check_suite_focus=true
   
   ```
   /usr/bin/python3.9: No module named pip
   ```
   
   it complains that `python3.9` doesn't have `pip` .. but assuming the skip tests checks (https://github.com/apache/spark/runs/2707956326 for https://github.com/apache/spark/commit/c225196be0d18975a3d290b0b9d7283d764be322), seems all packages are installed properly.







[GitHub] [spark] SparkQA commented on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based

2021-06-01 Thread GitBox


SparkQA commented on pull request #32592:
URL: https://github.com/apache/spark/pull/32592#issuecomment-852703589


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43710/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32592: [WIP][SPARK-35343][PYTHON] Make conversion from/to pandas data-type-based

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32592:
URL: https://github.com/apache/spark/pull/32592#issuecomment-852703612


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43710/
   





[GitHub] [spark] mridulm commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r643636239



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
 }
   }
 
+  private def replacePVCsIfNeeded(
+  pod: Pod,
+  resources: Seq[HasMetadata],
+  reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {
+val replacedResources = mutable.ArrayBuffer[HasMetadata]()

Review comment:
   Use a `Set` instead?
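
   A short sketch of why a set can be the better fit here, under the assumption (not visible in the excerpt) that `replacedResources` is only used afterwards to filter the replaced entries out of `resources`. The function `drop_replaced` and the sample names are hypothetical.

```python
def drop_replaced(resources, replaced):
    """Return the resources not listed in `replaced`, preserving order."""
    # A set gives constant-time membership checks and ignores duplicates,
    # whereas a buffer would be scanned linearly for every lookup.
    replaced_set = set(replaced)
    return [r for r in resources if r not in replaced_set]


remaining = drop_replaced(["pvc-a", "configmap-1", "pvc-b"], ["pvc-a", "pvc-a"])
print(remaining)
```

   If the insertion order of the replaced entries mattered later on, a buffer would still be the right choice; the sketch only covers the membership-filter case.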







[GitHub] [spark] mridulm commented on a change in pull request #32564: [SPARK-35416][K8S] Support PersistentVolumeClaim Reuse

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #32564:
URL: https://github.com/apache/spark/pull/32564#discussion_r643635908



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsAllocator.scala
##
@@ -357,6 +387,36 @@ private[spark] class ExecutorPodsAllocator(
 }
   }
 
+  private def replacePVCsIfNeeded(
+  pod: Pod,
+  resources: Seq[HasMetadata],
+  reusablePVCs: mutable.Buffer[PersistentVolumeClaim]) = {

Review comment:
   nit: Add return type







[GitHub] [spark] SparkQA removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852699423


   **[Test build #139194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852701130


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139194/
   





[GitHub] [spark] SparkQA removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA removed a comment on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852700292


   **[Test build #139197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852701114


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139197/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852701130


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139194/
   





[GitHub] [spark] AmplabJenkins commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852701114


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139197/
   





[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852701107


   **[Test build #139194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852701099


   **[Test build #139197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] SparkQA commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


SparkQA commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852700292


   **[Test build #139197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139197/testReport)** for PR 30691 at commit [`19b0b64`](https://github.com/apache/spark/commit/19b0b64e3d3e30591c22053854ce04fcba936757).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-742037234


   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on pull request #32513: [SPARK-35378][SQL] Eagerly execute non-root Command

2021-06-01 Thread GitBox


SparkQA commented on pull request #32513:
URL: https://github.com/apache/spark/pull/32513#issuecomment-852699735


   **[Test build #139196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139196/testReport)** for PR 32513 at commit [`6011bbe`](https://github.com/apache/spark/commit/6011bbe40948f3e46f5e40d01ed7c98064b354f0).





[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852699502


   **[Test build #139195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139195/testReport)** for PR 32724 at commit [`c7996bb`](https://github.com/apache/spark/commit/c7996bbdfc7af288979ea65102e7637d4c5da1a7).





[GitHub] [spark] SparkQA commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


SparkQA commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852699423


   **[Test build #139194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139194/testReport)** for PR 32737 at commit [`c77a184`](https://github.com/apache/spark/commit/c77a184ec981a592e3b2f72d2b35c3c32b36302e).





[GitHub] [spark] AmplabJenkins removed a comment on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins removed a comment on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852698377


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43708/
   





[GitHub] [spark] AmplabJenkins commented on pull request #32737: [SPARK-35606][PYTHON][INFRA] List Python 3.9 installed libraries in build_and_test workflow

2021-06-01 Thread GitBox


AmplabJenkins commented on pull request #32737:
URL: https://github.com/apache/spark/pull/32737#issuecomment-852698377


   
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43708/
   





[GitHub] [spark] SparkQA commented on pull request #32724: [SPARK-35585][SQL] Support propagate empty relation through project/filter

2021-06-01 Thread GitBox


SparkQA commented on pull request #32724:
URL: https://github.com/apache/spark/pull/32724#issuecomment-852697807


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43712/
   





[GitHub] [spark] mridulm commented on a change in pull request #32288: [SPARK-35182][K8S] Support driver-owned on-demand PVC

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #32288:
URL: https://github.com/apache/spark/pull/32288#discussion_r643631948



##
File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/MountVolumesFeatureStep.scala
##
@@ -85,6 +85,7 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
   .withApiVersion("v1")
   .withNewMetadata()
 .withName(claimName)
+.addToLabels(SPARK_APP_ID_LABEL, conf.sparkConf.getAppId)

Review comment:
   I am trying to understand this ... will `sparkConf.getAppId` be available at this point?







[GitHub] [spark] SparkQA commented on pull request #32061: [WIP][SPARK-32833][SQL] JDBC V2 Datasource aggregate push down

2021-06-01 Thread GitBox


SparkQA commented on pull request #32061:
URL: https://github.com/apache/spark/pull/32061#issuecomment-852694791


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43714/
   





[GitHub] [spark] gengliangwang closed pull request #32721: [SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-06-01 Thread GitBox


gengliangwang closed pull request #32721:
URL: https://github.com/apache/spark/pull/32721


   





[GitHub] [spark] SparkQA commented on pull request #32723: [SPARK-35583][DOCS] Move JDBC data source options from Python and Scala into a single page

2021-06-01 Thread GitBox


SparkQA commented on pull request #32723:
URL: https://github.com/apache/spark/pull/32723#issuecomment-852694342


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43713/
   





[GitHub] [spark] gengliangwang commented on pull request #32721: [SPARK-35077][SQL] Migrate to transformWithPruning for leftover optimizer rules

2021-06-01 Thread GitBox


gengliangwang commented on pull request #32721:
URL: https://github.com/apache/spark/pull/32721#issuecomment-852693943


   Merging to master





[GitHub] [spark] mridulm commented on a change in pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #30691:
URL: https://github.com/apache/spark/pull/30691#discussion_r643630037



##
File path: core/src/main/scala/org/apache/spark/MapOutputTracker.scala
##
@@ -737,25 +737,26 @@ private[spark] class MapOutputTrackerMaster(
   }
 
   /**
-   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapId
-   * is specified, it will only unregister the merge result if the mapId is part of that merge
+   * Unregisters a merge result corresponding to the reduceId if present. If the optional mapIndex
+   * is specified, it will only unregister the merge result if the mapIndex is part of that merge
* result.
*
* @param shuffleId the shuffleId.
* @param reduceId  the reduceId.
* @param bmAddress block manager address.
-   * @param mapId the optional mapId which should be checked to see it was part of the merge
-   *  result.
+   * @param mapIndex  the optional mapIndex which should be checked to see it was part of the
+   *  merge result.
*/
   def unregisterMergeResult(
 shuffleId: Int,
 reduceId: Int,
 bmAddress: BlockManagerId,
-mapId: Option[Int] = None) {
+mapIndex: Option[Int] = None) {
 shuffleStatuses.get(shuffleId) match {
   case Some(shuffleStatus) =>
 val mergeStatus = shuffleStatus.mergeStatuses(reduceId)
-if (mergeStatus != null && (mapId.isEmpty || mergeStatus.tracker.contains(mapId.get))) {
+if (mergeStatus != null &&
+  (mapIndex.isEmpty || mergeStatus.tracker.contains(mapIndex.get))) {
   shuffleStatus.removeMergeResult(reduceId, bmAddress)

Review comment:
   Is this more logically part of SPARK-32923? Should we move it out of this PR?
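
   For readers following the rename, the condition in the diff can be sketched in a few lines of Python. This is an illustrative model only: the dictionary `merge_statuses` (reduce id to the set of map indexes merged into it) is a hypothetical stand-in for `shuffleStatus.mergeStatuses` and its `tracker`, and the block manager address is left out.

```python
from typing import Dict, Optional, Set


def unregister_merge_result(
    merge_statuses: Dict[int, Set[int]],
    reduce_id: int,
    map_index: Optional[int] = None,
) -> bool:
    """Remove the merge result for reduce_id; return True if it was removed.

    With no map_index the result is always removed when present. With a
    map_index it is removed only if that index took part in the merge.
    """
    tracker = merge_statuses.get(reduce_id)
    if tracker is not None and (map_index is None or map_index in tracker):
        del merge_statuses[reduce_id]
        return True
    return False


statuses = {0: {1, 2}, 1: {3}}
print(unregister_merge_result(statuses, 0, map_index=5))  # not part of merge
print(unregister_merge_result(statuses, 0, map_index=2))  # part of merge
```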







[GitHub] [spark] SparkQA commented on pull request #32718: [SPARK-21957][SQL] Support current_user function

2021-06-01 Thread GitBox


SparkQA commented on pull request #32718:
URL: https://github.com/apache/spark/pull/32718#issuecomment-852692892


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43711/
   





[GitHub] [spark] mridulm commented on pull request #30691: [SPARK-32920][SHUFFLE] Finalization of Shuffle push/merge with Push based shuffle and preparation step for the reduce stage

2021-06-01 Thread GitBox


mridulm commented on pull request #30691:
URL: https://github.com/apache/spark/pull/30691#issuecomment-852692138


   Jenkins test this please





[GitHub] [spark] cloud-fan commented on a change in pull request #31908: [SPARK-34808][SQL] Removes outer join if it only has DISTINCT on streamed side

2021-06-01 Thread GitBox


cloud-fan commented on a change in pull request #31908:
URL: https://github.com/apache/spark/pull/31908#discussion_r643628995



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##
@@ -165,6 +170,23 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
 case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _, _)) =>
   val newJoinType = buildNewJoinType(f, j)
   if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
+
+case a @ Aggregate(_, _, join @ Join(left, _, LeftOuter, _, _))
+if a.isDistinct && a.references.subsetOf(AttributeSet(left.output)) &&
+  !canPlanAsBroadcastHashJoin(join, conf) =>

Review comment:
   The aggregate should still be there. I mean we can remove this `canPlanAsBroadcastHashJoin` check







[GitHub] [spark] wangyum commented on a change in pull request #31908: [SPARK-34808][SQL] Removes outer join if it only has DISTINCT on streamed side

2021-06-01 Thread GitBox


wangyum commented on a change in pull request #31908:
URL: https://github.com/apache/spark/pull/31908#discussion_r643628381



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##
@@ -165,6 +170,23 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
 case f @ Filter(condition, j @ Join(_, _, RightOuter | LeftOuter | FullOuter, _, _)) =>
   val newJoinType = buildNewJoinType(f, j)
   if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
+
+case a @ Aggregate(_, _, join @ Join(left, _, LeftOuter, _, _))
+if a.isDistinct && a.references.subsetOf(AttributeSet(left.output)) &&
+  !canPlanAsBroadcastHashJoin(join, conf) =>

Review comment:
   The result may be incorrect if we always remove the join. For example:
   ```
   0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> create table test11.t1 using parquet as select id % 3 as a, id as b from range(10);
   +---------+--+
   | Result  |
   +---------+--+
   +---------+--+
   No rows selected (1.611 seconds)
   0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> create table test11.t2 using parquet as select id % 3 as x, id as y from range(5);
   +---------+--+
   | Result  |
   +---------+--+
   +---------+--+
   No rows selected (1.043 seconds)
   0: jdbc:hive2://hdc49-mcc10-01-0510-2005-006-> select t1.a from t1 left join t2 on a = x;
   +----+--+
   | a  |
   +----+--+
   | 0  |
   | 0  |
   | 1  |
   | 1  |
   | 0  |
   | 0  |
   | 1  |
   | 1  |
   | 2  |
   | 0  |
   | 0  |
   | 1  |
   | 1  |
   | 2  |
   | 0  |
   | 0  |
   | 2  |
   +----+--+
   17 rows selected (1.409 seconds)
   ```
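
The row counts in the example above can be reproduced without Spark. The following is a minimal sketch using plain Scala collections; the object and method names (`LeftJoinRowCount`, `leftJoinProjectA`) are hypothetical and only illustrate left outer join multiplicity, not the optimizer rule itself. It suggests why dropping the join changes the result of a plain projection, while a DISTINCT over the left-side column would be unaffected.

```scala
object LeftJoinRowCount {
  // t1.a = id % 3 for id in range(10); t2.x = id % 3 for id in range(5),
  // matching the two CREATE TABLE statements in the example.
  val t1A: Seq[Long] = (0L until 10L).map(_ % 3)
  val t2X: Seq[Long] = (0L until 5L).map(_ % 3)

  // Left outer join on a = x, projecting only t1.a: each left row is emitted
  // once per matching right row, or exactly once when nothing matches.
  def leftJoinProjectA(left: Seq[Long], right: Seq[Long]): Seq[Long] =
    left.flatMap { a =>
      val matches = right.count(_ == a)
      Seq.fill(math.max(matches, 1))(a)
    }

  val joined: Seq[Long] = leftJoinProjectA(t1A, t2X)

  def main(args: Array[String]): Unit = {
    println(s"rows with join:    ${joined.size}")
    println(s"rows without join: ${t1A.size}")
    // Duplicates differ, but the set of distinct values does not.
    println(s"same distinct set: ${joined.distinct.sorted == t1A.distinct.sorted}")
  }
}
```

With the join there are 17 rows, as in the output above; scanning `t1` alone yields 10. The distinct values are the same in both cases.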







[GitHub] [spark] mridulm commented on a change in pull request #32733: [SPARK-35596][CORE] HighlyCompressedMapStatus should record accurately the size of skewed shuffle blocks

2021-06-01 Thread GitBox


mridulm commented on a change in pull request #32733:
URL: https://github.com/apache/spark/pull/32733#discussion_r643627409



##
File path: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala
##
@@ -258,18 +258,27 @@ private[spark] object HighlyCompressedMapStatus {
 val threshold = Option(SparkEnv.get)
   .map(_.conf.get(config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD))
   .getOrElse(config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD.defaultValue.get)
+val minThreshold = Option(SparkEnv.get)
+  .map(_.conf.get(config.SHUFFLE_ACCURATE_SKEWED_BLOCK_THRESHOLD))
+  .getOrElse(config.SHUFFLE_ACCURATE_SKEWED_BLOCK_THRESHOLD.defaultValue.get)
 val hugeBlockSizes = mutable.Map.empty[Int, Byte]
+val nonEmptyUncompressedSizes = uncompressedSizes.filter(_ > 0)
+val overallNonEmptyAvgSize = if (nonEmptyUncompressedSizes.nonEmpty) {
+  nonEmptyUncompressedSizes.sum / nonEmptyUncompressedSizes.length
+} else {
+  0
+}
 while (i < totalNumBlocks) {
   val size = uncompressedSizes(i)
   if (size > 0) {
 numNonEmptyBlocks += 1
 // Huge blocks are not included in the calculation for average size, thus size for smaller
 // blocks is more accurate.
-if (size < threshold) {
+if ((size >= 5 * overallNonEmptyAvgSize && size >= minThreshold) || size >= threshold) {

Review comment:
   Echoing @dongjoon-hyun's comment above: what is the background of this change?
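
For readers following the diff, here is a minimal standalone sketch of the condition on the added line. It assumes that the condition selects blocks whose sizes are recorded individually; the excerpt does not show the branch bodies, so that reading is an assumption. The helper names (`nonEmptyAvg`, `isRecordedAccurately`) and all threshold values are hypothetical and are not Spark defaults.

```scala
object SkewedBlockCheck {
  // Average over non-empty blocks, as in the diff: integer division, 0 if none.
  def nonEmptyAvg(sizes: Array[Long]): Long = {
    val nonEmpty = sizes.filter(_ > 0)
    if (nonEmpty.nonEmpty) nonEmpty.sum / nonEmpty.length else 0L
  }

  // Hypothetical helper mirroring the condition on the added line:
  // a block qualifies if it is at least 5x the non-empty average and at least
  // minThreshold, or if it reaches the existing threshold on its own.
  def isRecordedAccurately(
      size: Long,
      avg: Long,
      minThreshold: Long,
      threshold: Long): Boolean =
    (size >= 5 * avg && size >= minThreshold) || size >= threshold

  // Nine small blocks, one empty block, and one skewed block (made-up sizes).
  val sizes: Array[Long] = Array.fill(9)(10L) ++ Array(0L, 1000L)
  val avg: Long = nonEmptyAvg(sizes)

  def main(args: Array[String]): Unit = {
    // Threshold values below are illustrative only.
    println(isRecordedAccurately(1000L, avg, minThreshold = 500L, threshold = 100000L))
    println(isRecordedAccurately(10L, avg, minThreshold = 500L, threshold = 100000L))
  }
}
```

In this sketch the 1000-byte block qualifies through the skew branch even though it is far below the hypothetical `threshold`, while the 10-byte blocks do not qualify.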






