[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-69256 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-67485 Ah, it's branch-3.0 only. Let me just hotfix in branch-3.0 only.

[GitHub] [spark] maropu edited a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-30 Thread GitBox
maropu edited a comment on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-66387 @beliefer could you do follow-up? It seems we just need to update the golden file for 3.0.

[GitHub] [spark] dongjoon-hyun commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-67189 Thanks!

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-65266 Hi, @HyukjinKwon . Could you fix Python linter error on `branch-3.0`? - https://github.com/apache/spark/commits/branch-3.0

[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-66989 Hm, weird. It was a clean backport. Let me make a fix in the master through branch-3.0 to reduce the diff. Seems it's legitimate anyway.

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-65266 Hi, @HyukjinKwon . Could you fix Python linter error on `branch-3.0`? ``` ./python/pyspark/tests/test_rdd.py:787:5: E303 too many blank lines (2) ```

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-66073 Hi, guys. This seems to break `branch-3.0` UT. All Jenkins jobs on `branch-3.0` are failing. Could you take a look? -

[GitHub] [spark] maropu commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-30 Thread GitBox
maropu commented on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-66387 @beliefer could you do follow-up?

[GitHub] [spark] dongjoon-hyun commented on pull request #28194: [SPARK-31372][SQL][TEST] Display expression schema for double check.

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28194: URL: https://github.com/apache/spark/pull/28194#issuecomment-66073 Hi, guys. This seems to break `branch-3.0` UT. Could you take a look? - org.apache.spark.sql.ExpressionsSchemaSuite.Check schemas for expression examples

[GitHub] [spark] dongjoon-hyun commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-65266 Hi, @HyukjinKwon . Could you fix Python linter error? ``` ./python/pyspark/tests/test_rdd.py:787:5: E303 too many blank lines (2) ```

[GitHub] [spark] HyukjinKwon commented on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-62433 Merged to master and branch-3.0.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622219929

[GitHub] [spark] AmplabJenkins commented on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622219929

[GitHub] [spark] SparkQA removed a comment on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
SparkQA removed a comment on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622165096 **[Test build #122150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122150/testReport)** for PR 28426 at commit

[GitHub] [spark] SparkQA commented on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
SparkQA commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622219323 **[Test build #122150 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122150/testReport)** for PR 28426 at commit

[GitHub] [spark] maropu commented on a change in pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28351: URL: https://github.com/apache/spark/pull/28351#discussion_r418388591 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -530,6 +530,26 @@ class DataFrameAggregateSuite extends

[GitHub] [spark] maropu commented on a change in pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28351: URL: https://github.com/apache/spark/pull/28351#discussion_r418388508 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -530,6 +530,26 @@ class DataFrameAggregateSuite extends

[GitHub] [spark] maropu commented on a change in pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28351: URL: https://github.com/apache/spark/pull/28351#discussion_r418385983 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -530,6 +530,26 @@ class DataFrameAggregateSuite extends

[GitHub] [spark] maropu commented on a change in pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28351: URL: https://github.com/apache/spark/pull/28351#discussion_r418385825 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala ## @@ -139,6 +148,32 @@ case class

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622213617 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] dilipbiswal commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-04-30 Thread GitBox
dilipbiswal commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-622213789 cc @gatorsmile @cloud-fan @maropu

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622213609 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622213609

[GitHub] [spark] SparkQA removed a comment on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
SparkQA removed a comment on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622115029 **[Test build #122148 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122148/testReport)** for PR 28370 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #26987: [SPARK-30334][SQL] Introduce as_json for marking a column as JSON data

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #26987: URL: https://github.com/apache/spark/pull/26987#issuecomment-622213294

[GitHub] [spark] AmplabJenkins commented on pull request #26987: [SPARK-30334][SQL] Introduce as_json for marking a column as JSON data

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #26987: URL: https://github.com/apache/spark/pull/26987#issuecomment-622213294

[GitHub] [spark] SparkQA commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
SparkQA commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622213331 **[Test build #122148 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122148/testReport)** for PR 28370 at commit

[GitHub] [spark] HyukjinKwon edited a comment on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-04-30 Thread GitBox
HyukjinKwon edited a comment on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-622212235 cc @vanzin FYI

[GitHub] [spark] HyukjinKwon commented on pull request #28412: [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28412: URL: https://github.com/apache/spark/pull/28412#issuecomment-622212235 cc @vanzin

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28164: [SPARK-31393][SQL] Show the correct alias in a more elegant way that override the prettyName

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-622211577

[GitHub] [spark] AmplabJenkins commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in a more elegant way that override the prettyName

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-622211577

[GitHub] [spark] SparkQA commented on pull request #28164: [SPARK-31393][SQL] Show the correct alias in a more elegant way that override the prettyName

2020-04-30 Thread GitBox
SparkQA commented on pull request #28164: URL: https://github.com/apache/spark/pull/28164#issuecomment-622211320 **[Test build #122154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122154/testReport)** for PR 28164 at commit

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-04-30 Thread GitBox
HyukjinKwon commented on a change in pull request #28422: URL: https://github.com/apache/spark/pull/28422#discussion_r418381154 ## File path: docs/structured-streaming-programming-guide.md ## @@ -542,6 +542,12 @@ Here are the details of all the sources in Spark.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #28422: [SPARK-17604][SS] FileStreamSource: provide a new option to have retention on input files

2020-04-30 Thread GitBox
HyukjinKwon commented on a change in pull request #28422: URL: https://github.com/apache/spark/pull/28422#discussion_r418379475 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ## @@ -71,8 +71,13 @@ class FileStreamSource(

[GitHub] [spark] HyukjinKwon commented on pull request #28424: [SPARK-31618][SQL] Distinct pushdown in Intersect Distinct based on stats

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28424: URL: https://github.com/apache/spark/pull/28424#issuecomment-622207384 cc @wzhfy

[GitHub] [spark] HyukjinKwon commented on pull request #28418: [SPARK-28424][TESTS][FOLLOW-UP] Add test cases for all interval units

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28418: URL: https://github.com/apache/spark/pull/28418#issuecomment-622206363 Merged to master, and branch-3.0.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28331: [WIP][SPARK-20629][CORE] Copy shuffle data when nodes are being shutdown

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28331: URL: https://github.com/apache/spark/pull/28331#issuecomment-622204811

[GitHub] [spark] AmplabJenkins commented on pull request #28331: [WIP][SPARK-20629][CORE] Copy shuffle data when nodes are being shutdown

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28331: URL: https://github.com/apache/spark/pull/28331#issuecomment-622204811

[GitHub] [spark] SparkQA commented on pull request #28331: [WIP][SPARK-20629][CORE] Copy shuffle data when nodes are being shutdown

2020-04-30 Thread GitBox
SparkQA commented on pull request #28331: URL: https://github.com/apache/spark/pull/28331#issuecomment-622204517 **[Test build #122153 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122153/testReport)** for PR 28331 at commit

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622203530 So I tried it out locally, and if instead of disabling the entire stop (we want to keep the interrupt call), we call interrupt & inside of the catch block with interrupt set

[GitHub] [spark] HyukjinKwon commented on pull request #28395: [SPARK-31549][PYSPARK] Add a develop API invoking collect on Python RDD with user-specified job group

2020-04-30 Thread GitBox
HyukjinKwon commented on pull request #28395: URL: https://github.com/apache/spark/pull/28395#issuecomment-622199424 Merged to master and branch-3.0. Thanks @mengxr, @WeichenXu123 and @dongjoon-hyun.

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-622197638 Oh yeah to be clear the line numbers inside of block manager are different because I was playing with some debugging, but the rest of it should be fairly direct.

[GitHub] [spark] AmplabJenkins commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-622197058

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-622197046

[GitHub] [spark] AmplabJenkins commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-622197046

[GitHub] [spark] AmplabJenkins removed a comment on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-622197058

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-622196608

[GitHub] [spark] SparkQA commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-04-30 Thread GitBox
SparkQA commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-622196761 **[Test build #122151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122151/testReport)** for PR 27649 at commit

[GitHub] [spark] SparkQA commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-04-30 Thread GitBox
SparkQA commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-622196758 **[Test build #122152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122152/testReport)** for PR 24173 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-622196608

[GitHub] [spark] HeartSaVioR commented on pull request #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch

2020-04-30 Thread GitBox
HeartSaVioR commented on pull request #27649: URL: https://github.com/apache/spark/pull/27649#issuecomment-622196442 retest this, please

[GitHub] [spark] SparkQA removed a comment on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-04-30 Thread GitBox
SparkQA removed a comment on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-622083646 **[Test build #122147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122147/testReport)** for PR 28425 at commit

[GitHub] [spark] HeartSaVioR commented on pull request #24173: [SPARK-27237][SS] Introduce State schema validation among query restart

2020-04-30 Thread GitBox
HeartSaVioR commented on pull request #24173: URL: https://github.com/apache/spark/pull/24173#issuecomment-622196282 retest this, please

[GitHub] [spark] SparkQA commented on pull request #28425: [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node

2020-04-30 Thread GitBox
SparkQA commented on pull request #28425: URL: https://github.com/apache/spark/pull/28425#issuecomment-622196134 **[Test build #122147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122147/testReport)** for PR 28425 at commit

[GitHub] [spark] holdenk commented on pull request #28370: [SPARK-20732][CORE] Decommission cache blocks to other executors when an executor is decommissioned

2020-04-30 Thread GitBox
holdenk commented on pull request #28370: URL: https://github.com/apache/spark/pull/28370#issuecomment-62219 Looks like the tests are passing but we're still seeing the executor hang. I did a jstack dump on a local run and I got: > 2020-04-30 17:44:40 > Full thread dump

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418363468 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418362998 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public

[GitHub] [spark] dongjoon-hyun commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-622190075 cc @holdenk since this is a correctness issue in Apache Spark 2.0.2 ~ 2.4.5 at least.

[GitHub] [spark] manuzhang commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
manuzhang commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418360218 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-622186915 Hi, @hvanhovell . What about your comment? Are we going to merge this AS-IS or do you want to revise the comment more? -

[GitHub] [spark] dongjoon-hyun commented on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-622186915 Hi, @hvanhovell . What about your comment? - https://github.com/apache/spark/pull/28351/files#r417924185

[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28351: [SPARK-31500][SQL] collect_set() of BinaryType returns duplicate elements

2020-04-30 Thread GitBox
dongjoon-hyun edited a comment on pull request #28351: URL: https://github.com/apache/spark/pull/28351#issuecomment-622186915 Hi, @hvanhovell . What about your comment? Are we going to merge this AS-IS or do you want to change it? -

[GitHub] [spark] manuzhang commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
manuzhang commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418358980 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -198,6 +198,7 @@ protected void

[GitHub] [spark] manuzhang commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
manuzhang commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418357988 ## File path: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockHandler.java ## @@ -242,7 +240,6 @@ public

[GitHub] [spark] github-actions[bot] commented on pull request #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle Project nodes

2020-04-30 Thread GitBox
github-actions[bot] commented on pull request #20345: URL: https://github.com/apache/spark/pull/20345#issuecomment-622184903 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622183917 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622183912 Merged build finished. Test FAILed.

[GitHub] [spark] AmplabJenkins commented on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622183912

[GitHub] [spark] SparkQA removed a comment on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
SparkQA removed a comment on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622133810 **[Test build #122149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122149/testReport)** for PR 28416 at commit

[GitHub] [spark] SparkQA commented on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
SparkQA commented on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622183603 **[Test build #122149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122149/testReport)** for PR 28416 at commit

[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r418353477 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsSuite.scala ## @@ -0,0 +1,97 @@ +/* + * Licensed to the

[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r418353133 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala ## @@ -221,3 +223,22 @@ object

[GitHub] [spark] jiangxb1987 commented on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
jiangxb1987 commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622180070 Thank you @dongjoon-hyun !

[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r418352221 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBuckets.scala ## @@ -0,0 +1,34 @@ +/* + * Licensed to the Apache

[GitHub] [spark] dongjoon-hyun commented on pull request #28421: [SPARK-31616][SQL] Add partition event listener in ExternalCatalogWithListener

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28421: URL: https://github.com/apache/spark/pull/28421#issuecomment-622179937 Thank you for making a PR, @wankunde . However this seems to be a duplicate of the existing PR. - https://github.com/apache/spark/pull/27030

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.time

2020-04-30 Thread GitBox
dongjoon-hyun commented on a change in pull request #28426: URL: https://github.com/apache/spark/pull/28426#discussion_r418351194 ## File path: core/src/main/scala/org/apache/spark/internal/config/package.scala ## @@ -521,14 +521,14 @@ package object config {

[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r418351808 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,109 @@ +/* + * Licensed to the

[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28123: URL: https://github.com/apache/spark/pull/28123#discussion_r418350938 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketsInJoin.scala ## @@ -0,0 +1,109 @@ +/* + * Licensed to the

[GitHub] [spark] maropu commented on a change in pull request #28420: [SPARK-31615][SQL] Pretty string output for sql method of RuntimeReplaceable expressions

2020-04-30 Thread GitBox
maropu commented on a change in pull request #28420: URL: https://github.com/apache/spark/pull/28420#discussion_r418346518 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ## @@ -323,6 +323,20 @@ trait RuntimeReplaceable

[GitHub] [spark] dongjoon-hyun commented on pull request #28426: [SPARK-31619][CORE] Rename config "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
dongjoon-hyun commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622175537 Thank you for pinging me, @jiangxb1987 .

[GitHub] [spark] maropu edited a comment on pull request #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector

2020-04-30 Thread GitBox
maropu edited a comment on pull request #24421: URL: https://github.com/apache/spark/pull/24421#issuecomment-622172116 I'll close this based on the current status of the Jira.

[GitHub] [spark] maropu edited a comment on pull request #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector

2020-04-30 Thread GitBox
maropu edited a comment on pull request #24421: URL: https://github.com/apache/spark/pull/24421#issuecomment-622172116 I'll close this based on the talk above.

[GitHub] [spark] maropu commented on pull request #24421: [SPARK-12312][SQL]Support Kerberos login in JDBC connector

2020-04-30 Thread GitBox
maropu commented on pull request #24421: URL: https://github.com/apache/spark/pull/24421#issuecomment-622172116 I'll close this based on

[GitHub] [spark] AmplabJenkins removed a comment on pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #27617: URL: https://github.com/apache/spark/pull/27617#issuecomment-622170483

[GitHub] [spark] AmplabJenkins commented on pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #27617: URL: https://github.com/apache/spark/pull/27617#issuecomment-622170483

[GitHub] [spark] SparkQA removed a comment on pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-04-30 Thread GitBox
SparkQA removed a comment on pull request #27617: URL: https://github.com/apache/spark/pull/27617#issuecomment-622012562 **[Test build #122145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122145/testReport)** for PR 27617 at commit

[GitHub] [spark] SparkQA commented on pull request #27617: [SPARK-30865][SQL] Refactor DateTimeUtils

2020-04-30 Thread GitBox
SparkQA commented on pull request #27617: URL: https://github.com/apache/spark/pull/27617#issuecomment-622169274 **[Test build #122145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122145/testReport)** for PR 27617 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #28426: [SPARK-31619][CORE] Rename config name "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
AmplabJenkins commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622165554

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28426: [SPARK-31619][CORE] Rename config name "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.tim

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622165554

[GitHub] [spark] SparkQA commented on pull request #28426: [SPARK-31619][CORE] Rename config name "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
SparkQA commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622165096 **[Test build #122150 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122150/testReport)** for PR 28426 at commit

[GitHub] [spark] jiangxb1987 commented on pull request #28426: [SPARK-31619][CORE] Rename config name "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
jiangxb1987 commented on pull request #28426: URL: https://github.com/apache/spark/pull/28426#issuecomment-622163334 cc @vanzin @squito @tgravescs @skonto @dongjoon-hyun @gatorsmile @Ngone51

[GitHub] [spark] jiangxb1987 opened a new pull request #28426: [SPARK-31619][CORE] Rename config name "spark.dynamicAllocation.shuffleTimeout" to "spark.dynamicAllocation.shuffleTracking.timeout"

2020-04-30 Thread GitBox
jiangxb1987 opened a new pull request #28426: URL: https://github.com/apache/spark/pull/28426 ### What changes were proposed in this pull request? The "spark.dynamicAllocation.shuffleTimeout" configuration only takes effect if "spark.dynamicAllocation.shuffleTracking.enabled" is true,

[GitHub] [spark] huaxingao commented on pull request #28417: [SPARK-31612][SQL][DOCS] SQL Reference clean up

2020-04-30 Thread GitBox
huaxingao commented on pull request #28417: URL: https://github.com/apache/spark/pull/28417#issuecomment-622140306 Updated.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418313334 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -198,6 +198,7 @@ protected void

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
dongjoon-hyun commented on a change in pull request #28416: URL: https://github.com/apache/spark/pull/28416#discussion_r418310844 ## File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java ## @@ -198,6 +198,7 @@ protected void

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28416: [SPARK-31611][YARN] Register NettyMemoryMetrics into Node Manager's metrics system

2020-04-30 Thread GitBox
AmplabJenkins removed a comment on pull request #28416: URL: https://github.com/apache/spark/pull/28416#issuecomment-622134476
