[GitHub] [spark] cloud-fan closed pull request #34758: [SPARK-37501][SQL] CREATE/REPLACE TABLE should qualify location for v2 command

2021-12-01 Thread GitBox
cloud-fan closed pull request #34758: URL: https://github.com/apache/spark/pull/34758 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] SparkQA commented on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
SparkQA commented on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983709555 **[Test build #145816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145816/testReport)** for PR 34599 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983450419 **[Test build #145810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145810/testReport)** for PR 34741 at commit

[GitHub] [spark] tgravescs commented on a change in pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
tgravescs commented on a change in pull request #34767: URL: https://github.com/apache/spark/pull/34767#discussion_r760269375 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ## @@ -910,7 +913,7 @@ private[spark] class Client(

[GitHub] [spark] cloud-fan commented on pull request #34718: [SPARK-37460][DOCS] Add the description of ALTER DATABASE SET LOCATION

2021-12-01 Thread GitBox
cloud-fan commented on pull request #34718: URL: https://github.com/apache/spark/pull/34718#issuecomment-983583155 thanks, merging to master!

[GitHub] [spark] beliefer commented on pull request #34769: [SPARK-37463][SQL] Read/Write Timestamp ntz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
beliefer commented on pull request #34769: URL: https://github.com/apache/spark/pull/34769#issuecomment-983586891 Because I rebased incorrectly, I created this PR to replace https://github.com/apache/spark/pull/34712

[GitHub] [spark] beliefer edited a comment on pull request #34769: [SPARK-37463][SQL] Read/Write Timestamp ntz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
beliefer edited a comment on pull request #34769: URL: https://github.com/apache/spark/pull/34769#issuecomment-983586891 Because I rebased incorrectly, I created this PR to replace https://github.com/apache/spark/pull/34712 ping @cloud-fan @bersprockets

[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34738: URL: https://github.com/apache/spark/pull/34738#discussion_r760139088 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ## @@ -149,6 +149,7 @@ case class RowDataSourceScanExec(

[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34738: URL: https://github.com/apache/spark/pull/34738#discussion_r760144783 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -255,7 +256,20 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34738: URL: https://github.com/apache/spark/pull/34738#discussion_r760144434 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -249,13 +248,25 @@ object

[GitHub] [spark] SparkQA commented on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
SparkQA commented on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983626546 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50287/

[GitHub] [spark] AmplabJenkins commented on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983639466 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50287/

[GitHub] [spark] AmplabJenkins commented on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-983639467 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145806/

[GitHub] [spark] tanelk commented on a change in pull request #34702: [SPARK-37455][SQL] Replace hash with sort aggregate if child is already sorted

2021-12-01 Thread GitBox
tanelk commented on a change in pull request #34702: URL: https://github.com/apache/spark/pull/34702#discussion_r760180464 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-983639467 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145806/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983639466 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50287/

[GitHub] [spark] kevincmchen commented on pull request #34742: [SPARK-37486][SQL][HIVE] set the ContextClassLoader before using the `addJars` in `HiveClient`

2021-12-01 Thread GitBox
kevincmchen commented on pull request #34742: URL: https://github.com/apache/spark/pull/34742#issuecomment-983645625 > Thank you for your explanation. If you are sure it will not affect the old case, I think this change is OK ok, thx

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983655989 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50288/

[GitHub] [spark] cloud-fan edited a comment on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
cloud-fan edited a comment on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983660633 @bersprockets After reading more ORC code, I feel the timestamp implementation is quite messy in ORC. Not only the reader side, but also the writer side shifts the

[GitHub] [spark] cloud-fan commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r760221164 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -442,17 +442,22 @@ object DateTimeUtils {

[GitHub] [spark] cloud-fan commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r760221751 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala ## @@ -66,10 +68,23 @@ sealed trait

[GitHub] [spark] tgravescs commented on pull request #34672: [SPARK-37394][CORE] Skip registering with ESS if a customized shuffle manager is configured

2021-12-01 Thread GitBox
tgravescs commented on pull request #34672: URL: https://github.com/apache/spark/pull/34672#issuecomment-983734969 yeah I understand there are a bunch of people that override it, including myself, but it's not really a public pluggable API which it should be so that users don't have to

[GitHub] [spark] sarutak opened a new pull request #34771: [SPARK-37326][SQL][FOLLOWUP] Fix the test for Java 11

2021-12-01 Thread GitBox
sarutak opened a new pull request #34771: URL: https://github.com/apache/spark/pull/34771 ### What changes were proposed in this pull request? This PR fixes an issue that the test added in SPARK-37326 (#34596) fails with Java 11.

[GitHub] [spark] SparkQA commented on pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-01 Thread GitBox
SparkQA commented on pull request #34738: URL: https://github.com/apache/spark/pull/34738#issuecomment-983793874 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50292/

[GitHub] [spark] SparkQA commented on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
SparkQA commented on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983793703 **[Test build #145813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145813/testReport)** for PR 34712 at commit

[GitHub] [spark] SparkQA commented on pull request #34758: [SPARK-37501][SQL] CREATE/REPLACE TABLE should qualify location for v2 command

2021-12-01 Thread GitBox
SparkQA commented on pull request #34758: URL: https://github.com/apache/spark/pull/34758#issuecomment-983593261 **[Test build #145802 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145802/testReport)** for PR 34758 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #34768: [SPARK-11150][SQL][FOLLOWUP] We should drop all tables after testing dynamic partition pruning.

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34768: URL: https://github.com/apache/spark/pull/34768#issuecomment-983592651 Can one of the admins verify this patch?

[GitHub] [spark] SparkQA commented on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-01 Thread GitBox
SparkQA commented on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-983597809 **[Test build #145801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145801/testReport)** for PR 34673 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-983363812 **[Test build #145801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145801/testReport)** for PR 34673 at commit

[GitHub] [spark] SparkQA commented on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
SparkQA commented on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983605763 **[Test build #145815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145815/testReport)** for PR 34741 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34758: [SPARK-37501][SQL] CREATE/REPLACE TABLE should qualify location for v2 command

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34758: URL: https://github.com/apache/spark/pull/34758#issuecomment-983652224 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145804/

[GitHub] [spark] AmplabJenkins commented on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983709929 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145816/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983709929 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145816/

[GitHub] [spark] SparkQA removed a comment on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983695873 **[Test build #145816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145816/testReport)** for PR 34599 at commit

[GitHub] [spark] SparkQA commented on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
SparkQA commented on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983713248 **[Test build #145810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145810/testReport)** for PR 34741 at commit

[GitHub] [spark] SparkQA commented on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
SparkQA commented on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983742260 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50291/

[GitHub] [spark] SparkQA removed a comment on pull request #34712: [SPARK-37463][SQL] Read/Write Timestamp ntz or ltz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #34712: URL: https://github.com/apache/spark/pull/34712#issuecomment-983548980 **[Test build #145813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145813/testReport)** for PR 34712 at commit

[GitHub] [spark] SparkQA commented on pull request #34770: [SPARK-37480][K8S][DOC][3.2] Sync Kubernetes configuration to latest in running-on-k8s.md

2021-12-01 Thread GitBox
SparkQA commented on pull request #34770: URL: https://github.com/apache/spark/pull/34770#issuecomment-983802883 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50293/

[GitHub] [spark] SparkQA commented on pull request #34599: [SPARK-37331][K8S] Add the ability to create resources before driverPod creating

2021-12-01 Thread GitBox
SparkQA commented on pull request #34599: URL: https://github.com/apache/spark/pull/34599#issuecomment-983803447 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50291/

[GitHub] [spark] SparkQA commented on pull request #34753: [SPARK-37494][SQL] Unify v1 and v2 options output of `SHOW CREATE TABLE` command

2021-12-01 Thread GitBox
SparkQA commented on pull request #34753: URL: https://github.com/apache/spark/pull/34753#issuecomment-983562325 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50283/

[GitHub] [spark] cloud-fan commented on a change in pull request #34667: [SPARK-36902][SQL] Migrate CreateTableAsSelectStatement to v2 command

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34667: URL: https://github.com/apache/spark/pull/34667#discussion_r760132755 ## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ## @@ -323,18 +323,25 @@ final class DataFrameWriter[T]

[GitHub] [spark] AmplabJenkins commented on pull request #34673: [SPARK-37343][SQL] Implement createIndex, IndexExists and dropIndex in JDBC (Postgres dialect)

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34673: URL: https://github.com/apache/spark/pull/34673#issuecomment-983599243 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145801/

[GitHub] [spark] cloud-fan commented on a change in pull request #34738: [SPARK-37483][SQL] Support push down top N to JDBC data source V2

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34738: URL: https://github.com/apache/spark/pull/34738#discussion_r760142801 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/PushedDownOperators.scala ## @@ -25,4 +26,5 @@ import

[GitHub] [spark] zero323 commented on a change in pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2021-12-01 Thread GitBox
zero323 commented on a change in pull request #34363: URL: https://github.com/apache/spark/pull/34363#discussion_r760175944 ## File path: python/pyspark/accumulators.py ## @@ -176,44 +193,44 @@ class AccumulatorParam(object): [7.0, 8.0, 9.0] """ -def zero(self,

[GitHub] [spark] zero323 commented on a change in pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2021-12-01 Thread GitBox
zero323 commented on a change in pull request #34363: URL: https://github.com/apache/spark/pull/34363#discussion_r760186335 ## File path: python/pyspark/_typing.pyi ## @@ -21,11 +21,14 @@ from typing_extensions import Protocol F = TypeVar("F", bound=Callable) T_co =

[GitHub] [spark] SparkQA removed a comment on pull request #34758: [SPARK-37501][SQL] CREATE/REPLACE TABLE should qualify location for v2 command

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #34758: URL: https://github.com/apache/spark/pull/34758#issuecomment-983403373 **[Test build #145804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145804/testReport)** for PR 34758 at commit

[GitHub] [spark] SparkQA commented on pull request #34758: [SPARK-37501][SQL] CREATE/REPLACE TABLE should qualify location for v2 command

2021-12-01 Thread GitBox
SparkQA commented on pull request #34758: URL: https://github.com/apache/spark/pull/34758#issuecomment-983650601 **[Test build #145804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145804/testReport)** for PR 34758 at commit

[GitHub] [spark] kevincmchen commented on pull request #34742: [SPARK-37486][SQL][HIVE] set the ContextClassLoader before using the `addJars` in `HiveClient`

2021-12-01 Thread GitBox
kevincmchen commented on pull request #34742: URL: https://github.com/apache/spark/pull/34742#issuecomment-983650416 @HyukjinKwon Could you please review the code and merge it?

[GitHub] [spark] zero323 commented on a change in pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2021-12-01 Thread GitBox
zero323 commented on a change in pull request #34363: URL: https://github.com/apache/spark/pull/34363#discussion_r760208707 ## File path: python/pyspark/accumulators.py ## @@ -176,44 +193,44 @@ class AccumulatorParam(object): [7.0, 8.0, 9.0] """ -def zero(self,

[GitHub] [spark] SparkQA commented on pull request #34769: [SPARK-37463][SQL] Read/Write Timestamp ntz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
SparkQA commented on pull request #34769: URL: https://github.com/apache/spark/pull/34769#issuecomment-983665823 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50289/

[GitHub] [spark] SparkQA commented on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
SparkQA commented on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983680629 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50290/

[GitHub] [spark] cloud-fan commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r760219127 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala ## @@ -164,6 +164,10 @@ class CSVOptions(

[GitHub] [spark] cloud-fan commented on a change in pull request #34596: [SPARK-37326][SQL] Support TimestampNTZ in CSV data source

2021-12-01 Thread GitBox
cloud-fan commented on a change in pull request #34596: URL: https://github.com/apache/spark/pull/34596#discussion_r760228758 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ## @@ -2489,10 +2693,6 @@ abstract class CSVSuite

[GitHub] [spark] AmplabJenkins commented on pull request #34769: [SPARK-37463][SQL] Read/Write Timestamp ntz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34769: URL: https://github.com/apache/spark/pull/34769#issuecomment-983693572 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50289/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34769: [SPARK-37463][SQL] Read/Write Timestamp ntz to Orc uses UTC timestamp

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34769: URL: https://github.com/apache/spark/pull/34769#issuecomment-983693572 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50289/

[GitHub] [spark] Yikun opened a new pull request #34770: [SPARK-37480][K8S][DOC][3.2] Sync Kubernetes configuration to latest in running-on-k8s.md

2021-12-01 Thread GitBox
Yikun opened a new pull request #34770: URL: https://github.com/apache/spark/pull/34770 ### What changes were proposed in this pull request? Sync Kubernetes configurations to 3.2.0 in doc ### Why are the changes needed? Configurations in docs/running-on-kubernetes.md are

[GitHub] [spark] AmplabJenkins commented on pull request #34741: [SPARK-37463][SQL] Read/Write Timestamp ntz from/to Orc uses UTC time zone

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34741: URL: https://github.com/apache/spark/pull/34741#issuecomment-983751502

[GitHub] [spark] SparkQA commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
SparkQA commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984022729 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50298/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984027464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50298/

[GitHub] [spark] AmplabJenkins commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984027464 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50298/

[GitHub] [spark] AmplabJenkins commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984031566 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145823/

[GitHub] [spark] SparkQA commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
SparkQA commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984031343 **[Test build #145823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145823/testReport)** for PR 32875 at commit

[GitHub] [spark] SparkQA removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
SparkQA removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-983948021 **[Test build #145824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145824/testReport)** for PR 32875 at commit

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984040123 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145824/

[GitHub] [spark] SparkQA commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
SparkQA commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984039881 **[Test build #145824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145824/testReport)** for PR 32875 at commit

[GitHub] [spark] AmplabJenkins commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984040123 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145824/

[GitHub] [spark] AmplabJenkins commented on pull request #34771: [SPARK-37326][SQL][FOLLOWUP] Fix the test for Java 11

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34771: URL: https://github.com/apache/spark/pull/34771#issuecomment-984068952 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145819/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34771: [SPARK-37326][SQL][FOLLOWUP] Fix the test for Java 11

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34771: URL: https://github.com/apache/spark/pull/34771#issuecomment-984068952 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145819/

[GitHub] [spark] sathiyapk commented on a change in pull request #34729: [SPARK-37475][SQL] Add scale parameter to floor and ceil functions

2021-12-01 Thread GitBox
sathiyapk commented on a change in pull request #34729: URL: https://github.com/apache/spark/pull/34729#discussion_r760536604 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala ## @@ -243,40 +243,26 @@ case class

[GitHub] [spark] zero323 commented on pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2021-12-01 Thread GitBox
zero323 commented on pull request #34363: URL: https://github.com/apache/spark/pull/34363#issuecomment-984127373 Unrelated to this specific PR ‒ we all use force pushes from time to time, especially when linear history is preferred (and we `rebase` to the `HEAD` of the main branch), but

[GitHub] [spark] ueshin commented on a change in pull request #34509: [SPARK-34521][PYTHON][SQL] Fix spark.createDataFrame when using pandas with StringDtype

2021-12-01 Thread GitBox
ueshin commented on a change in pull request #34509: URL: https://github.com/apache/spark/pull/34509#discussion_r760657918 ## File path: python/pyspark/sql/pandas/serializers.py ## @@ -169,6 +169,8 @@ def create_array(s, t): elif is_categorical_dtype(s.dtype):

[GitHub] [spark] SparkQA commented on pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
SparkQA commented on pull request #34767: URL: https://github.com/apache/spark/pull/34767#issuecomment-984168413 **[Test build #145827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145827/testReport)** for PR 34767 at commit

[GitHub] [spark] ueshin opened a new pull request #34772: [SPARK-37514][PYTHON] Remove workarounds due to older pandas

2021-12-01 Thread GitBox
ueshin opened a new pull request #34772: URL: https://github.com/apache/spark/pull/34772 ### What changes were proposed in this pull request? Removes workarounds due to older pandas. ### Why are the changes needed? Now that we upgraded the minimum version of pandas to

[GitHub] [spark] SparkQA commented on pull request #34701: [SPARK-37450][SQL] Prune unnecessary fields from Generate

2021-12-01 Thread GitBox
SparkQA commented on pull request #34701: URL: https://github.com/apache/spark/pull/34701#issuecomment-984173819 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50301/ -- This is an automated message from the Apache

[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760669979 ## File path: python/pyspark/sql/session.py ## @@ -304,8 +304,11 @@ def __init__( and not

[GitHub] [spark] HyukjinKwon commented on a change in pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox
HyukjinKwon commented on a change in pull request #34757: URL: https://github.com/apache/spark/pull/34757#discussion_r760670200 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -1074,6 +1058,28 @@ object SparkSession extends Logging {

[GitHub] [spark] AmplabJenkins commented on pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34767: URL: https://github.com/apache/spark/pull/34767#issuecomment-984180170 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145827/ -- This

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34767: URL: https://github.com/apache/spark/pull/34767#issuecomment-984180170 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145827/

[GitHub] [spark] SparkQA commented on pull request #34772: [SPARK-37514][PYTHON] Remove workarounds due to older pandas

2021-12-01 Thread GitBox
SparkQA commented on pull request #34772: URL: https://github.com/apache/spark/pull/34772#issuecomment-984205285 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50304/ -- This is an automated message from the Apache

[GitHub] [spark] SparkQA commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox
SparkQA commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984205583 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50305/ -- This is an automated message from the Apache

[GitHub] [spark] AmplabJenkins removed a comment on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #32875: URL: https://github.com/apache/spark/pull/32875#issuecomment-984028912 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50299/

[GitHub] [spark] AmplabJenkins removed a comment on pull request #34754: [SPARK-37496][SQL] Migrate ReplaceTableAsSelectStatement to v2 command

2021-12-01 Thread GitBox
AmplabJenkins removed a comment on pull request #34754: URL: https://github.com/apache/spark/pull/34754#issuecomment-984077421 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145820/

[GitHub] [spark] sunchao commented on a change in pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on a change in pull request #34659: URL: https://github.com/apache/spark/pull/34659#discussion_r760636401 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ## @@ -227,30 +230,340 @@ private

[GitHub] [spark] c21 commented on a change in pull request #34702: [SPARK-37455][SQL] Replace hash with sort aggregate if child is already sorted

2021-12-01 Thread GitBox
c21 commented on a change in pull request #34702: URL: https://github.com/apache/spark/pull/34702#discussion_r760644578 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache

[GitHub] [spark] nicolasazrak commented on a change in pull request #34509: [SPARK-34521][PYTHON][SQL] Fix spark.createDataFrame when using pandas with StringDtype

2021-12-01 Thread GitBox
nicolasazrak commented on a change in pull request #34509: URL: https://github.com/apache/spark/pull/34509#discussion_r760648330 ## File path: python/pyspark/sql/pandas/serializers.py ## @@ -169,6 +169,8 @@ def create_array(s, t): elif

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
AngersZhuuuu commented on a change in pull request #34767: URL: https://github.com/apache/spark/pull/34767#discussion_r760653861 ## File path: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala ## @@ -74,7 +74,7 @@

[GitHub] [spark] sunchao commented on a change in pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on a change in pull request #34659: URL: https://github.com/apache/spark/pull/34659#discussion_r760653817 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ## @@ -227,30 +230,340 @@ private

[GitHub] [spark] github-actions[bot] closed pull request #33706: [SPARK-36477][SQL] Inferring schema from JSON file shall handle CharConversionException/MalformedInputException

2021-12-01 Thread GitBox
github-actions[bot] closed pull request #33706: URL: https://github.com/apache/spark/pull/33706 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] github-actions[bot] closed pull request #33740: [SPARK-36510][DOCS] Add spark.redaction.string.regex property to the docs

2021-12-01 Thread GitBox
github-actions[bot] closed pull request #33740: URL: https://github.com/apache/spark/pull/33740 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [spark] SparkQA commented on pull request #34757: [SPARK-37504][PYTHON] Pyspark create SparkSession with existed session should not pass static conf

2021-12-01 Thread GitBox
SparkQA commented on pull request #34757: URL: https://github.com/apache/spark/pull/34757#issuecomment-984177089 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50302/ -- This is an automated message from the Apache

[GitHub] [spark] sunchao commented on pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on pull request #34659: URL: https://github.com/apache/spark/pull/34659#issuecomment-984184680 Thanks @sadikovi and @agrawaldevesh ! I've addressed most of your comments, while some left to be answered. Please let me know what you think. > On a more style-like

[GitHub] [spark] SparkQA commented on pull request #34767: [SPARK-37461][YARN][FOLLOWUP] Refactor YARN Client code to avoid add unnecessary parameter of `appId`

2021-12-01 Thread GitBox
SparkQA commented on pull request #34767: URL: https://github.com/apache/spark/pull/34767#issuecomment-984206473 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50303/ -- This is an automated message from the

[GitHub] [spark] sunchao commented on a change in pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on a change in pull request #34659: URL: https://github.com/apache/spark/pull/34659#discussion_r760568934 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java ## @@ -176,8 +176,7 @@ public void

[GitHub] [spark] AmplabJenkins commented on pull request #34754: [SPARK-37496][SQL] Migrate ReplaceTableAsSelectStatement to v2 command

2021-12-01 Thread GitBox
AmplabJenkins commented on pull request #34754: URL: https://github.com/apache/spark/pull/34754#issuecomment-984077421 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145820/ -- This

[GitHub] [spark] zero323 commented on a change in pull request #34363: [SPARK-37083][PYTHON] Inline type hints for python/pyspark/accumulators.py

2021-12-01 Thread GitBox
zero323 commented on a change in pull request #34363: URL: https://github.com/apache/spark/pull/34363#discussion_r760622451 ## File path: python/pyspark/accumulators.py ## @@ -176,44 +193,44 @@ class AccumulatorParam(object): [7.0, 8.0, 9.0] """ -def zero(self,

[GitHub] [spark] sunchao commented on a change in pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on a change in pull request #34659: URL: https://github.com/apache/spark/pull/34659#discussion_r760632213 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ## @@ -227,30 +230,340 @@ private

[GitHub] [spark] sunchao commented on a change in pull request #34659: [SPARK-34863][SQL] Support complex types for Parquet vectorized reader

2021-12-01 Thread GitBox
sunchao commented on a change in pull request #34659: URL: https://github.com/apache/spark/pull/34659#discussion_r760635579 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java ## @@ -227,30 +230,340 @@ private

[GitHub] [spark] c21 commented on a change in pull request #34702: [SPARK-37455][SQL] Replace hash with sort aggregate if child is already sorted

2021-12-01 Thread GitBox
c21 commented on a change in pull request #34702: URL: https://github.com/apache/spark/pull/34702#discussion_r760643746 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ReplaceHashWithSortAgg.scala ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache
