[GitHub] [spark] SparkQA commented on pull request #33456: [SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals

2021-07-21 Thread GitBox
SparkQA commented on pull request #33456: URL: https://github.com/apache/spark/pull/33456#issuecomment-884690800 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45987/ -- This

[GitHub] [spark] itholic commented on a change in pull request #33379: [SPARK-35810][SPARK-35807][PYTHON] Deprecate ps.broadcast API and the `num_files` argument

2021-07-21 Thread GitBox
itholic commented on a change in pull request #33379: URL: https://github.com/apache/spark/pull/33379#discussion_r674538556 ## File path: python/pyspark/pandas/namespace.py ## @@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame: """ Marks a DataFrame as s

[GitHub] [spark] itholic opened a new pull request #33479: [SPARK-35810][PYTHON][FOLLWUP] Deprecate ps.broadcast API and the num…

2021-07-21 Thread GitBox
itholic opened a new pull request #33479: URL: https://github.com/apache/spark/pull/33479 …_files argument ### What changes were proposed in this pull request? This PR follows up #33379 to fix build error in Sphinx ### Why are the changes needed? The Sphinx build is f

[GitHub] [spark] SparkQA commented on pull request #33461: [SPARK-36243][SQL][PYTHON][DOCS] Fixing pyspark tableExists issue with temporary views

2021-07-21 Thread GitBox
SparkQA commented on pull request #33461: URL: https://github.com/apache/spark/pull/33461#issuecomment-884689395 **[Test build #141474 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141474/testReport)** for PR 33461 at commit [`ffaae01`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #33450: [SPARK-35809][PYTHON] Add `index_col` argument for ps.sql

2021-07-21 Thread GitBox
SparkQA commented on pull request #33450: URL: https://github.com/apache/spark/pull/33450#issuecomment-884689269 **[Test build #141470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141470/testReport)** for PR 33450 at commit [`9bac117`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
SparkQA commented on pull request #33474: URL: https://github.com/apache/spark/pull/33474#issuecomment-884689325 **[Test build #141468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141468/testReport)** for PR 33474 at commit [`f8f8884`](https://github.co

[GitHub] [spark] gengliangwang commented on pull request #33478: [SPARK-36257][SQL] Updated the version of TimestampNTZ related changes as 3.3.0

2021-07-21 Thread GitBox
gengliangwang commented on pull request #33478: URL: https://github.com/apache/spark/pull/33478#issuecomment-884685770 cc @beliefer as well

[GitHub] [spark] gengliangwang opened a new pull request #33478: [SPARK-36257][SQL] Updated the version of TimestampNTZ related changes as 3.3.0

2021-07-21 Thread GitBox
gengliangwang opened a new pull request #33478: URL: https://github.com/apache/spark/pull/33478 ### What changes were proposed in this pull request? As we decided to release TimestampNTZ type in Spark 3.3, we should update the versions of TimestampNTZ related changes as 3.3.0

[GitHub] [spark] viirya edited a comment on pull request #33475: [SPARK-xxxxx][BUILD] Test with only SERIAL_SBT_TESTS and parallelism change

2021-07-21 Thread GitBox
viirya edited a comment on pull request #33475: URL: https://github.com/apache/spark/pull/33475#issuecomment-884684178 Seems memory setting change is needed. `Run TPC-DS queries with SF=1 ` and `sparkr ` already fail with `exit code 137`...

[GitHub] [spark] viirya commented on pull request #33475: [SPARK-xxxxx][BUILD] Test with only SERIAL_SBT_TESTS and parallelism change

2021-07-21 Thread GitBox
viirya commented on pull request #33475: URL: https://github.com/apache/spark/pull/33475#issuecomment-884684178 Seems memory setting change is needed. `Run TPC-DS queries with SF=1 ` already fails with `exit code 137`...

[GitHub] [spark] otterc commented on pull request #33477: [SPARK-36255][SHUFFLE][CORE] Stop pushing and retrying on FileNotFound exceptions

2021-07-21 Thread GitBox
otterc commented on pull request #33477: URL: https://github.com/apache/spark/pull/33477#issuecomment-884683878 @Ngone51 @mridulm Please help review this bug fix.

[GitHub] [spark] otterc opened a new pull request #33477: [SPARK-36255][SHUFFLE][CORE] Stop pushing and retrying on FileNotFound exceptions

2021-07-21 Thread GitBox
otterc opened a new pull request #33477: URL: https://github.com/apache/spark/pull/33477 ### What changes were proposed in this pull request? Once a job in a Spark application completes, the shuffle files are cleaned up by the executors. This is because the driver instructs the executors

[GitHub] [spark] sarutak opened a new pull request #33476: [SPARK-36256][BUILD] Upgrade lz4-java to 1.8.0

2021-07-21 Thread GitBox
sarutak opened a new pull request #33476: URL: https://github.com/apache/spark/pull/33476 ### What changes were proposed in this pull request? This PR upgrades `lz4-java` to `1.8.0`, which includes not only performance improvement but also Darwin aarch64 support. https://github.c

[GitHub] [spark] HyukjinKwon commented on pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
HyukjinKwon commented on pull request #33467: URL: https://github.com/apache/spark/pull/33467#issuecomment-884681263 Merged to master, branch-3.2 and branch-3.1.

[GitHub] [spark] SparkQA commented on pull request #33461: [SPARK-36243][SQL][PYTHON][DOCS] Fixing pyspark tableExists issue with temporary views

2021-07-21 Thread GitBox
SparkQA commented on pull request #33461: URL: https://github.com/apache/spark/pull/33461#issuecomment-884681107 **[Test build #141474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141474/testReport)** for PR 33461 at commit [`ffaae01`](https://github.com

[GitHub] [spark] HyukjinKwon closed pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
HyukjinKwon closed pull request #33467: URL: https://github.com/apache/spark/pull/33467

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884679953 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45985/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33363: [SPARK-36156][SQL] SCRIPT TRANSFORM ROW FORMAT DELIMITED should respect `NULL DEFINED AS` and default value should be `\N`

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33363: URL: https://github.com/apache/spark/pull/33363#issuecomment-884679790 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45983/

[GitHub] [spark] dominikgehl commented on a change in pull request #33461: [SPARK-36243][SQL][PYTHON][DOCS] Fixing pyspark tableExists issue with temporary views

2021-07-21 Thread GitBox
dominikgehl commented on a change in pull request #33461: URL: https://github.com/apache/spark/pull/33461#discussion_r674525636 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala ## @@ -381,6 +381,21 @@ class LocalTempViewTestSuite exte

[GitHub] [spark] SparkQA commented on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
SparkQA commented on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884679934 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45985/

[GitHub] [spark] AmplabJenkins commented on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884679953 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45985/

[GitHub] [spark] AmplabJenkins commented on pull request #33363: [SPARK-36156][SQL] SCRIPT TRANSFORM ROW FORMAT DELIMITED should respect `NULL DEFINED AS` and default value should be `\N`

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33363: URL: https://github.com/apache/spark/pull/33363#issuecomment-884679790 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45983/

[GitHub] [spark] SparkQA commented on pull request #33363: [SPARK-36156][SQL] SCRIPT TRANSFORM ROW FORMAT DELIMITED should respect `NULL DEFINED AS` and default value should be `\N`

2021-07-21 Thread GitBox
SparkQA commented on pull request #33363: URL: https://github.com/apache/spark/pull/33363#issuecomment-884679769 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45983/

[GitHub] [spark] SparkQA commented on pull request #33461: [SPARK-36243][SQL][PYTHON][DOCS] Fixing pyspark tableExists issue with temporary views

2021-07-21 Thread GitBox
SparkQA commented on pull request #33461: URL: https://github.com/apache/spark/pull/33461#issuecomment-884679671 **[Test build #141473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141473/testReport)** for PR 33461 at commit [`0d03776`](https://github.com

[GitHub] [spark] viirya commented on a change in pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
viirya commented on a change in pull request #33447: URL: https://github.com/apache/spark/pull/33447#discussion_r674522297 ## File path: project/SparkBuild.scala ## @@ -1197,7 +1197,7 @@ object TestSettings { (Global / concurrentRestrictions) := { // The number of c

[GitHub] [spark] SparkQA commented on pull request #33475: [SPARK-xxxxx][BUILD] Test with only SERIAL_SBT_TESTS and parallelism change

2021-07-21 Thread GitBox
SparkQA commented on pull request #33475: URL: https://github.com/apache/spark/pull/33475#issuecomment-884676642 **[Test build #141472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141472/testReport)** for PR 33475 at commit [`2e93ced`](https://github.com

[GitHub] [spark] beliefer commented on pull request #33462: [SPARK-35088][SQL][FOLLOWUP] Add test case for TimestampNTZ sequence with default step

2021-07-21 Thread GitBox
beliefer commented on pull request #33462: URL: https://github.com/apache/spark/pull/33462#issuecomment-884676435 ping @gengliangwang @MaxGekk

[GitHub] [spark] viirya commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
viirya commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884676248 Created #33475 with only SERIAL_SBT_TESTS and parallelism change.

[GitHub] [spark] viirya opened a new pull request #33475: [SPARK-xxxxx][BUILD] Test with only SERIAL_SBT_TESTS and parallelism change

2021-07-21 Thread GitBox
viirya opened a new pull request #33475: URL: https://github.com/apache/spark/pull/33475 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33467: URL: https://github.com/apache/spark/pull/33467#issuecomment-884675039 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141455/

[GitHub] [spark] SparkQA commented on pull request #33450: [SPARK-35809][PYTHON] Add `index_col` argument for ps.sql

2021-07-21 Thread GitBox
SparkQA commented on pull request #33450: URL: https://github.com/apache/spark/pull/33450#issuecomment-884675093 **[Test build #141470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141470/testReport)** for PR 33450 at commit [`9bac117`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #33437: [SPARK-36224][SQL] Use Void as the type name of NullType

2021-07-21 Thread GitBox
SparkQA commented on pull request #33437: URL: https://github.com/apache/spark/pull/33437#issuecomment-884675117 **[Test build #141471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141471/testReport)** for PR 33437 at commit [`db290d9`](https://github.com

[GitHub] [spark] SparkQA commented on pull request #33456: [SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals

2021-07-21 Thread GitBox
SparkQA commented on pull request #33456: URL: https://github.com/apache/spark/pull/33456#issuecomment-884675063 **[Test build #141469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141469/testReport)** for PR 33456 at commit [`8cfbe5d`](https://github.com

[GitHub] [spark] AmplabJenkins commented on pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33467: URL: https://github.com/apache/spark/pull/33467#issuecomment-884675039 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141455/

[GitHub] [spark] SparkQA commented on pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
SparkQA commented on pull request #33474: URL: https://github.com/apache/spark/pull/33474#issuecomment-884675033 **[Test build #141468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141468/testReport)** for PR 33474 at commit [`f8f8884`](https://github.com

[GitHub] [spark] SparkQA removed a comment on pull request #33472: [SPARK-36251][INFRA][BUILD][3.2] Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33472: URL: https://github.com/apache/spark/pull/33472#issuecomment-884596100 **[Test build #141454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141454/testReport)** for PR 33472 at commit [`5cd4cd5`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884673807 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45984/

[GitHub] [spark] SparkQA removed a comment on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-884573121 **[Test build #141450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141450/testReport)** for PR 33352 at commit [`1746fa3`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33437: [SPARK-36224][SQL] Use Void as the type name of NullType

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33437: URL: https://github.com/apache/spark/pull/33437#issuecomment-884673812 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45982/

[GitHub] [spark] SparkQA removed a comment on pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33467: URL: https://github.com/apache/spark/pull/33467#issuecomment-884608518 **[Test build #141455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141455/testReport)** for PR 33467 at commit [`9bf5e84`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-884673806 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141450/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33472: [SPARK-36251][INFRA][BUILD][3.2] Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33472: URL: https://github.com/apache/spark/pull/33472#issuecomment-884673805 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141454/

[GitHub] [spark] AmplabJenkins commented on pull request #33472: [SPARK-36251][INFRA][BUILD][3.2] Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33472: URL: https://github.com/apache/spark/pull/33472#issuecomment-884673805 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141454/

[GitHub] [spark] SparkQA commented on pull request #33467: [SPARK-36246][CORE][TEST] GHA WorkerDecommissionExtended flake

2021-07-21 Thread GitBox
SparkQA commented on pull request #33467: URL: https://github.com/apache/spark/pull/33467#issuecomment-884673842 **[Test build #141455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141455/testReport)** for PR 33467 at commit [`9bf5e84`](https://github.co

[GitHub] [spark] AmplabJenkins commented on pull request #33437: [SPARK-36224][SQL] Use Void as the type name of NullType

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33437: URL: https://github.com/apache/spark/pull/33437#issuecomment-884673812 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45982/

[GitHub] [spark] AmplabJenkins commented on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884673807 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45984/

[GitHub] [spark] AmplabJenkins commented on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-884673806 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141450/

[GitHub] [spark] SparkQA commented on pull request #33352: [SPARK-34952][SQL] DSv2 Aggregate push down APIs

2021-07-21 Thread GitBox
SparkQA commented on pull request #33352: URL: https://github.com/apache/spark/pull/33352#issuecomment-884672397 **[Test build #141450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141450/testReport)** for PR 33352 at commit [`1746fa3`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #33437: [SPARK-36224][SQL] Use Void as the type name of NullType

2021-07-21 Thread GitBox
SparkQA commented on pull request #33437: URL: https://github.com/apache/spark/pull/33437#issuecomment-884671956 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45982/

[GitHub] [spark] viirya commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
viirya commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884671310 Sure. Will create a new one and see the result.

[GitHub] [spark] ueshin commented on a change in pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
ueshin commented on a change in pull request #33474: URL: https://github.com/apache/spark/pull/33474#discussion_r674514929 ## File path: python/pyspark/pandas/categorical.py ## @@ -348,8 +348,96 @@ def as_unordered(self, inplace: bool = False) -> Optional["ps.Series"]:

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
HyukjinKwon commented on a change in pull request #33474: URL: https://github.com/apache/spark/pull/33474#discussion_r674514629 ## File path: python/pyspark/pandas/categorical.py ## @@ -348,8 +348,96 @@ def as_unordered(self, inplace: bool = False) -> Optional["ps.Series"]:

[GitHub] [spark] SparkQA commented on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
SparkQA commented on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884668672 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45985/

[GitHub] [spark] SparkQA commented on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
SparkQA commented on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884668382 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45984/

[GitHub] [spark] ueshin commented on pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
ueshin commented on pull request #33474: URL: https://github.com/apache/spark/pull/33474#issuecomment-884667942 cc @HyukjinKwon @itholic @xinrong-databricks

[GitHub] [spark] ueshin opened a new pull request #33474: [SPARK-36249][PYTHON] Add remove_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
ueshin opened a new pull request #33474: URL: https://github.com/apache/spark/pull/33474 ### What changes were proposed in this pull request? Add `remove_categories` to `CategoricalAccessor` and `CategoricalIndex`. ### Why are the changes needed? We should implement `rem

[GitHub] [spark] gengliangwang commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
gengliangwang commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884667440 > Are everything required? For me, --parallelism 1 and SERIAL_SBT_TESTS=1 looks like the most significant ones. +1 with @dongjoon-hyun. Shall we create a new PR with

[GitHub] [spark] ueshin closed pull request #33470: [SPARK-36214][PYTHON] Add add_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
ueshin closed pull request #33470: URL: https://github.com/apache/spark/pull/33470

[GitHub] [spark] ueshin commented on pull request #33470: [SPARK-36214][PYTHON] Add add_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
ueshin commented on pull request #33470: URL: https://github.com/apache/spark/pull/33470#issuecomment-884666157 Thanks! merging to master/3.2.

[GitHub] [spark] dongjoon-hyun commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
dongjoon-hyun commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884665799 BTW, the optimized (minimized) values looks good to me. I'm just wondering if there is something which is effectively not required here.

[GitHub] [spark] itholic commented on pull request #33450: [SPARK-35809][PYTHON] Add `index_col` argument for ps.sql

2021-07-21 Thread GitBox
itholic commented on pull request #33450: URL: https://github.com/apache/spark/pull/33450#issuecomment-884665377 Let me fix the doctest

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
dongjoon-hyun commented on a change in pull request #33447: URL: https://github.com/apache/spark/pull/33447#discussion_r674509888 ## File path: project/SparkBuild.scala ## @@ -1120,9 +1120,9 @@ object TestSettings { .map { case (k,v) => s"-D$k=$v" }.toSeq, (Test / j

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
dongjoon-hyun commented on a change in pull request #33447: URL: https://github.com/apache/spark/pull/33447#discussion_r674509759 ## File path: pom.xml ## @@ -2713,7 +2713,7 @@ ${project.build.directory}/surefire-reports . SparkTestSuite.

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
dongjoon-hyun commented on a change in pull request #33447: URL: https://github.com/apache/spark/pull/33447#discussion_r674509703 ## File path: pom.xml ## @@ -2662,7 +2662,7 @@ **/*Suite.java ${project.build.directory}/surefire-reports

[GitHub] [spark] SparkQA commented on pull request #33472: [SPARK-36251][INFRA][BUILD][3.2] Cover GitHub Actions runs without SHA in testing script

2021-07-21 Thread GitBox
SparkQA commented on pull request #33472: URL: https://github.com/apache/spark/pull/33472#issuecomment-884664996 **[Test build #141454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141454/testReport)** for PR 33472 at commit [`5cd4cd5`](https://github.co

[GitHub] [spark] SparkQA commented on pull request #33363: [SPARK-36156][SQL] SCRIPT TRANSFORM ROW FORMAT DELIMITED should respect `NULL DEFINED AS` and default value should be `\N`

2021-07-21 Thread GitBox
SparkQA commented on pull request #33363: URL: https://github.com/apache/spark/pull/33363#issuecomment-884664867 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45983/

[GitHub] [spark] LuciferYang commented on pull request #33460: [SPARK-36242][CORE] Ensure spill file closed before set `success = true` in `ExternalSorter.spillMemoryIteratorToDisk` method

2021-07-21 Thread GitBox
LuciferYang commented on pull request #33460: URL: https://github.com/apache/spark/pull/33460#issuecomment-884663892 > Mock blockManager.getDiskWriter ? See UnsafeShuffleWriterSuite for an example Thx ~

[GitHub] [spark] wangyum commented on a change in pull request #33465: [SPARK-36245][SQL] Deduplicate the right side of left semi/anti join

2021-07-21 Thread GitBox
wangyum commented on a change in pull request #33465: URL: https://github.com/apache/spark/pull/33465#discussion_r674508245 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DeduplicateRightSideOfLeftSemiAntiJoin.scala ## @@ -0,0 +1,37 @@ +/* +

[GitHub] [spark] HyukjinKwon closed pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
HyukjinKwon closed pull request #33473: URL: https://github.com/apache/spark/pull/33473

[GitHub] [spark] HyukjinKwon commented on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
HyukjinKwon commented on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884662877 Merged to master and branch-3.2.

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33465: [SPARK-36245][SQL] Deduplicate the right side of left semi/anti join

2021-07-21 Thread GitBox
HyukjinKwon commented on a change in pull request #33465: URL: https://github.com/apache/spark/pull/33465#discussion_r674506804 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DeduplicateRightSideOfLeftSemiAntiJoin.scala ## @@ -0,0 +1,37 @@ +/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33465: [SPARK-36245][SQL] Deduplicate the right side of left semi/anti join

2021-07-21 Thread GitBox
HyukjinKwon commented on a change in pull request #33465: URL: https://github.com/apache/spark/pull/33465#discussion_r674506621 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DeduplicateRightSideOfLeftSemiAntiJoin.scala ## @@ -0,0 +1,37 @@ +/

[GitHub] [spark] gengliangwang commented on a change in pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
gengliangwang commented on a change in pull request #33447: URL: https://github.com/apache/spark/pull/33447#discussion_r674506580 ## File path: project/SparkBuild.scala ## @@ -1197,7 +1197,7 @@ object TestSettings { (Global / concurrentRestrictions) := { // The numb

[GitHub] [spark] MaxGekk commented on a change in pull request #33456: [SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals

2021-07-21 Thread GitBox
MaxGekk commented on a change in pull request #33456: URL: https://github.com/apache/spark/pull/33456#discussion_r674505943 ## File path: common/unsafe/src/main/java/org/apache/spark/sql/catalyst/util/DateTimeConstants.java ## @@ -21,6 +21,8 @@ public static final int MON

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33458: [SPARK-36239][PYTHON][DOCS] Remove some APIs from documentation.

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33458: URL: https://github.com/apache/spark/pull/33458#issuecomment-884660129 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45979/

[GitHub] [spark] SparkQA commented on pull request #33437: [SPARK-36224][SQL] Use Void as the type name of NullType

2021-07-21 Thread GitBox
SparkQA commented on pull request #33437: URL: https://github.com/apache/spark/pull/33437#issuecomment-884660291 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45982/

[GitHub] [spark] AmplabJenkins commented on pull request #33458: [SPARK-36239][PYTHON][DOCS] Remove some APIs from documentation.

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33458: URL: https://github.com/apache/spark/pull/33458#issuecomment-884660129 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45979/

[GitHub] [spark] SparkQA commented on pull request #33458: [SPARK-36239][PYTHON][DOCS] Remove some APIs from documentation.

2021-07-21 Thread GitBox
SparkQA commented on pull request #33458: URL: https://github.com/apache/spark/pull/33458#issuecomment-884660113 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45979/

[GitHub] [spark] viirya commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
viirya commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884659563 Looks like `sql - other tests` failed due to `org.apache.spark.sql.execution.metric.SQLMetricsSuite`, not a memory issue.

[GitHub] [spark] gengliangwang commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
gengliangwang commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884659444 @viirya Thank you, this is great!

[GitHub] [spark] viirya commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
viirya commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884659011 Yea, seems so. :)

[GitHub] [spark] HyukjinKwon commented on pull request #33456: [SPARK-35815][SQL] Allow delayThreshold for watermark to be represented as ANSI interval literals

2021-07-21 Thread GitBox
HyukjinKwon commented on pull request #33456: URL: https://github.com/apache/spark/pull/33456#issuecomment-884658072 cc @HeartSaVioR and @gengliangwang FYI

[GitHub] [spark] HyukjinKwon commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
HyukjinKwon commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884657570 Looks like we're almost there!

[GitHub] [spark] SparkQA removed a comment on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884645776 **[Test build #141467 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141467/testReport)** for PR 33473 at commit [`8697697`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884656494

[GitHub] [spark] SparkQA removed a comment on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884642865

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884656501 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141467/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33450: [SPARK-35809][PYTHON] Add `index_col` argument for ps.sql

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33450: URL: https://github.com/apache/spark/pull/33450#issuecomment-884656495 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45980/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33471: [SPARK-36248][PYTHON] Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33471: URL: https://github.com/apache/spark/pull/33471#issuecomment-884656499 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45976/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33443: [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33443: URL: https://github.com/apache/spark/pull/33443#issuecomment-884656498 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141448/

[GitHub] [spark] SparkQA removed a comment on pull request #33443: [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations

2021-07-21 Thread GitBox
SparkQA removed a comment on pull request #33443: URL: https://github.com/apache/spark/pull/33443#issuecomment-884552566 **[Test build #141448 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/141448/testReport)** for PR 33443 at commit [`0231adf`](https://gi

[GitHub] [spark] AmplabJenkins removed a comment on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
AmplabJenkins removed a comment on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884656496 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45981/

[GitHub] [spark] AmplabJenkins commented on pull request #33443: [SPARK-35848][MLLIB] Optimize some treeAggregates in MLlib by delaying allocations

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33443: URL: https://github.com/apache/spark/pull/33443#issuecomment-884656498 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141448/

[GitHub] [spark] AmplabJenkins commented on pull request #33473: [SPARK-36253][PYTHON][DOCS] Add versionadded to the top of pandas-on-Spark package

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33473: URL: https://github.com/apache/spark/pull/33473#issuecomment-884656501 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/141467/

[GitHub] [spark] AmplabJenkins commented on pull request #33471: [SPARK-36248][PYTHON] Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33471: URL: https://github.com/apache/spark/pull/33471#issuecomment-884656499 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45976/

[GitHub] [spark] AmplabJenkins commented on pull request #33466: [SPARK-36143][PYTHON] Adjust `astype` of Series with missing values to follow pandas

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33466: URL: https://github.com/apache/spark/pull/33466#issuecomment-884656497

[GitHub] [spark] AmplabJenkins commented on pull request #33450: [SPARK-35809][PYTHON] Add `index_col` argument for ps.sql

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33450: URL: https://github.com/apache/spark/pull/33450#issuecomment-884656495 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45980/

[GitHub] [spark] AmplabJenkins commented on pull request #33447: [SPARK-xxxxx][BUILD] Change memory settings for enabling GA

2021-07-21 Thread GitBox
AmplabJenkins commented on pull request #33447: URL: https://github.com/apache/spark/pull/33447#issuecomment-884656496 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/45981/

[GitHub] [spark] HyukjinKwon commented on a change in pull request #33379: [SPARK-35810][SPARK-35807][PYTHON] Deprecate ps.broadcast API and the `num_files` argument

2021-07-21 Thread GitBox
HyukjinKwon commented on a change in pull request #33379: URL: https://github.com/apache/spark/pull/33379#discussion_r674501175 ## File path: python/pyspark/pandas/namespace.py ## @@ -2822,6 +2823,8 @@ def broadcast(obj: DataFrame) -> DataFrame: """ Marks a DataFrame

[GitHub] [spark] SparkQA commented on pull request #33471: [SPARK-36248][PYTHON] Add rename_categories to CategoricalAccessor and CategoricalIndex

2021-07-21 Thread GitBox
SparkQA commented on pull request #33471: URL: https://github.com/apache/spark/pull/33471#issuecomment-884656209 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45976/
