[GitHub] [spark] smallzhongfeng commented on pull request #39558: [SPARK-41982][SQL] Partitions of type string should not be treated as numeric types

2023-01-13 Thread GitBox
smallzhongfeng commented on PR #39558: URL: https://github.com/apache/spark/pull/39558#issuecomment-1382684612 cc @AngersZh @cloud-fan @maropu Hope to get your reply! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] dongjoon-hyun commented on pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-13 Thread GitBox
dongjoon-hyun commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1382683704 I converted the configuration from public to internal and adjusted the indentation. ``` SPARK-34454: configs from the legacy namespace should be internal *** FAILED *** (6

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-13 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1070230669 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -366,8 +359,35 @@ class BlockManagerMasterEndpoint( }

[GitHub] [spark] wankunde commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-13 Thread GitBox
wankunde commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1070229836 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -366,8 +359,35 @@ class BlockManagerMasterEndpoint( }

[GitHub] [spark] imhunterand opened a new pull request, #39566: Patched()Fix Protobuf Java vulnerable to Uncontrolled Resource Consumption

2023-01-13 Thread GitBox
imhunterand opened a new pull request, #39566: URL: https://github.com/apache/spark/pull/39566 ## Changes: Affected versions of this `apache-spark` are vulnerable to Denial of Service (DoS) when providing inputs containing multiple instances of non-repeated embedded messages, with repeated or

[GitHub] [spark] cloud-fan commented on a diff in pull request #39564: [SPARK-41990][SQL] Filrering by composite field name doesn't work

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39564: URL: https://github.com/apache/spark/pull/39564#discussion_r1070226875 ## sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala: ## @@ -91,7 +91,7 @@ case class EqualTo(attribute: String, value: Any) extends Filter {

[GitHub] [spark] cloud-fan commented on a diff in pull request #39564: [SPARK-41990][SQL] Filrering by composite field name doesn't work

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39564: URL: https://github.com/apache/spark/pull/39564#discussion_r1070226778 ## sql/catalyst/src/main/scala/org/apache/spark/sql/sources/filters.scala: ## @@ -91,7 +91,7 @@ case class EqualTo(attribute: String, value: Any) extends Filter {

[GitHub] [spark] HyukjinKwon commented on pull request #39559: [SPARK-42011][CONNECT][PYTHON] Implement DataFrameReader.csv

2023-01-13 Thread GitBox
HyukjinKwon commented on PR #39559: URL: https://github.com/apache/spark/pull/39559#issuecomment-1382676645 That PR is merged. Mind retriggering the test?

[GitHub] [spark] HyukjinKwon closed pull request #39553: [SPARK-42041][SPARK-42013][CONNECT][PYTHON] DataFrameReader should support list of paths

2023-01-13 Thread GitBox
HyukjinKwon closed pull request #39553: [SPARK-42041][SPARK-42013][CONNECT][PYTHON] DataFrameReader should support list of paths URL: https://github.com/apache/spark/pull/39553

[GitHub] [spark] HyukjinKwon commented on pull request #39553: [SPARK-42041][SPARK-42013][CONNECT][PYTHON] DataFrameReader should support list of paths

2023-01-13 Thread GitBox
HyukjinKwon commented on PR #39553: URL: https://github.com/apache/spark/pull/39553#issuecomment-1382676595 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #39562: [SPARK-41964][CONNECT][PYTHON][FOLLOW-UP] Fix the jdbc writer not implemented Test

2023-01-13 Thread GitBox
HyukjinKwon closed pull request #39562: [SPARK-41964][CONNECT][PYTHON][FOLLOW-UP] Fix the jdbc writer not implemented Test URL: https://github.com/apache/spark/pull/39562

[GitHub] [spark] HyukjinKwon commented on pull request #39562: [SPARK-41964][CONNECT][PYTHON][FOLLOW-UP] Fix the jdbc writer not implemented Test

2023-01-13 Thread GitBox
HyukjinKwon commented on PR #39562: URL: https://github.com/apache/spark/pull/39562#issuecomment-1382676328 Merged to master.

[GitHub] [spark] dongjoon-hyun closed pull request #39563: [SPARK-42060][K8S][WIP] add new config to override driver/executor k8s containers names

2023-01-13 Thread GitBox
dongjoon-hyun closed pull request #39563: [SPARK-42060][K8S][WIP] add new config to override driver/executor k8s containers names URL: https://github.com/apache/spark/pull/39563

[GitHub] [spark] dongjoon-hyun closed pull request #39565: [SPARK-42062][CONNECT][TESTS] Enforce scalafmt for connect-common

2023-01-13 Thread GitBox
dongjoon-hyun closed pull request #39565: [SPARK-42062][CONNECT][TESTS] Enforce scalafmt for connect-common URL: https://github.com/apache/spark/pull/39565

[GitHub] [spark] ulysses-you commented on a diff in pull request #39556: [SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214949 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala: ## @@ -16,48 +16,45 @@ */ package org.apache.spark.sql.execution

[GitHub] [spark] cloud-fan commented on a diff in pull request #39131: [SPARK-41162][SQL] Fix anti- and semi-join for self-join with aggregations

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39131: URL: https://github.com/apache/spark/pull/39131#discussion_r1070214768 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushDownLeftSemiAntiJoin.scala: ## @@ -61,9 +61,10 @@ object PushDownLeftSemiAntiJoin extends

[GitHub] [spark] ulysses-you commented on a diff in pull request #39556: [SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -205,8 +200,10 @@ trait UnaryNode extends LogicalPlan with

[GitHub] [spark] ulysses-you commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214616 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,160 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214607 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -205,8 +200,10 @@ trait UnaryNode extends LogicalPlan with

[GitHub] [spark] ulysses-you commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214567 ## sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala: ## @@ -1314,6 +1314,78 @@ class PlannerSuite extends SharedSparkSession with

[GitHub] [spark] cloud-fan commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214537 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang closed pull request #39557: [SPARK-42045][SQL] ANSI SQL mode: Round/Bround should return an error on tiny/small/big integer overflow

2023-01-13 Thread GitBox
gengliangwang closed pull request #39557: [SPARK-42045][SQL] ANSI SQL mode: Round/Bround should return an error on tiny/small/big integer overflow URL: https://github.com/apache/spark/pull/39557

[GitHub] [spark] ulysses-you commented on pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on PR #39556: URL: https://github.com/apache/spark/pull/39556#issuecomment-1382658442 @cloud-fan addressed all comments, thank you @peter-toth

[GitHub] [spark] gengliangwang commented on pull request #39557: [SPARK-42045][SQL] ANSI SQL mode: Round/Bround should return an error on tiny/small/big integer overflow

2023-01-13 Thread GitBox
gengliangwang commented on PR #39557: URL: https://github.com/apache/spark/pull/39557#issuecomment-1382658485 Merging to master
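#39557 makes `round`/`bround` fail in ANSI mode when rounding a tiny/small/big integer value to a negative scale pushes the result outside the type's range. Below is a pure-Python model of that check, assuming `round` rounds HALF_UP and `bround` rounds HALF_EVEN; `round_integral` and its `bits` parameter are hypothetical stand-ins, not Spark's implementation:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_integral(value: int, scale: int, bits: int, half_even: bool = False) -> int:
    """Hypothetical model: round an integer to `scale` decimal places and
    raise if the result no longer fits in a signed `bits`-wide integer."""
    if scale >= 0:
        return value  # an integer has no fractional digits, so nothing changes
    mode = ROUND_HALF_EVEN if half_even else ROUND_HALF_UP
    quantum = Decimal(1).scaleb(-scale)  # scale=-1 -> 1E+1, i.e. round to tens
    result = int(Decimal(value).quantize(quantum, rounding=mode))
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= result <= hi:
        # ANSI-style behavior: report the overflow instead of wrapping around
        raise ArithmeticError(f"{result} does not fit in a signed {bits}-bit integer")
    return result


print(round_integral(125, -1, 16))                  # 130: HALF_UP, as in round
print(round_integral(125, -1, 16, half_even=True))  # 120: HALF_EVEN, as in bround
try:
    round_integral(127, -1, 8)  # 130 is outside the 8-bit range [-128, 127]
except ArithmeticError as err:
    print(err)  # 130 does not fit in a signed 8-bit integer
```

Note that Python's `ROUND_HALF_UP` breaks ties away from zero, so `round_integral(-125, -1, 16)` gives -130.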

[GitHub] [spark] gengliangwang closed pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
gengliangwang closed pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting. URL: https://github.com/apache/spark/pull/39536

[GitHub] [spark] ulysses-you commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
ulysses-you commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070214330 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] gengliangwang commented on pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
gengliangwang commented on PR #39536: URL: https://github.com/apache/spark/pull/39536#issuecomment-1382658385 Thanks, merging to master

[GitHub] [spark] cloud-fan closed pull request #39408: [SPARK-41896][SQL] Filtering by row index returns empty results

2023-01-13 Thread GitBox
cloud-fan closed pull request #39408: [SPARK-41896][SQL] Filtering by row index returns empty results URL: https://github.com/apache/spark/pull/39408

[GitHub] [spark] cloud-fan commented on pull request #39408: [SPARK-41896][SQL] Filtering by row index returns empty results

2023-01-13 Thread GitBox
cloud-fan commented on PR #39408: URL: https://github.com/apache/spark/pull/39408#issuecomment-1382658324 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #38034: URL: https://github.com/apache/spark/pull/38034#discussion_r1070213329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala: ## @@ -618,6 +618,165 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]

[GitHub] [spark] cloud-fan commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #38034: URL: https://github.com/apache/spark/pull/38034#discussion_r1070212796 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala: ## @@ -618,6 +618,165 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]

[GitHub] [spark] mridulm commented on pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-13 Thread GitBox
mridulm commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1382655538 Thanks for the review @dongjoon-hyun ! Once the tests pass we can merge it.

[GitHub] [spark] mridulm commented on a diff in pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-13 Thread GitBox
mridulm commented on code in PR #37922: URL: https://github.com/apache/spark/pull/37922#discussion_r1070212293 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -366,8 +359,35 @@ class BlockManagerMasterEndpoint( }

[GitHub] [spark] cloud-fan commented on a diff in pull request #37525: [WIP][SPARK-40086][SQL] Improve AliasAwareOutputPartitioning to take all aliases into account

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1070212013 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala: ## @@ -16,24 +16,53 @@ */ package org.apache.spark.sql.execution

[GitHub] [spark] cloud-fan commented on a diff in pull request #37525: [WIP][SPARK-40086][SQL] Improve AliasAwareOutputPartitioning to take all aliases into account

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #37525: URL: https://github.com/apache/spark/pull/37525#discussion_r1070211621 ## sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputExpression.scala: ## @@ -16,24 +16,53 @@ */ package org.apache.spark.sql.execution
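#37525 and #39556 aim to make output partitioning aware of every alias of a partitioning key, and #38034 adds `multiTransform` to generate such alternatives. The toy model below uses plain strings in place of Catalyst expressions and only handles direct attribute aliases; `alias_alternatives` is hypothetical and not Spark's API:

```python
from itertools import product
from typing import Dict, List


def alias_alternatives(keys: List[str], aliases: Dict[str, List[str]]) -> List[List[str]]:
    """Hypothetical sketch: list every way to express the child's partitioning
    keys using the projection's output names.

    `aliases` maps a child attribute to all output names that carry it
    (including the attribute itself when it is passed through unchanged).
    """
    choices = []
    for key in keys:
        names = aliases.get(key, [])
        if not names:
            return []  # a key is projected away, so the partitioning is lost
        choices.append(names)
    # One alternative per combination of output names (cartesian product).
    return [list(combo) for combo in product(*choices)]


# Child hash-partitioned by (a, b); projection: SELECT a AS x, a AS y, b
print(alias_alternatives(["a", "b"], {"a": ["x", "y"], "b": ["b"]}))
# [['x', 'b'], ['y', 'b']]
```

The number of alternatives grows multiplicatively with the number of aliases per key, which is why generating them lazily (as a `multiTransform`-style API would) matters.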

[GitHub] [spark] mridulm commented on pull request #37922: [SPARK-40480][SHUFFLE] Remove push-based shuffle data after query finished

2023-01-13 Thread GitBox
mridulm commented on PR #37922: URL: https://github.com/apache/spark/pull/37922#issuecomment-1382653651 The tests are failing @wankunde, though I don't think it is due to your PR. Can you please take a look? And retrigger it if it is unrelated? Thanks!

[GitHub] [spark] thejdeep commented on pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-13 Thread GitBox
thejdeep commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1382653643 Thanks for reviewing @dongjoon-hyun. I have updated the branch and addressed the comments. Will wait for the build to run.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #35969: URL: https://github.com/apache/spark/pull/35969#discussion_r1070211213 ## sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala: ## @@ -154,6 +154,19 @@ class FileBasedDataSourceSuite extends QueryTest }

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35969: [SPARK-38651][SQL] Add `spark.sql.legacy.allowEmptySchemaWrite`

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #35969: URL: https://github.com/apache/spark/pull/35969#discussion_r1070211096 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2896,6 +2896,14 @@ object SQLConf { .stringConf

[GitHub] [spark] dongjoon-hyun commented on pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2023-01-13 Thread GitBox
dongjoon-hyun commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1382652905 Please re-trigger the tests to make sure. I believe we can have this patch in Apache Spark 3.4.0.

[GitHub] [spark] thejdeep commented on a diff in pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2023-01-13 Thread GitBox
thejdeep commented on code in PR #35969: URL: https://github.com/apache/spark/pull/35969#discussion_r1070209814 ## sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala: ## @@ -154,6 +156,19 @@ class FileBasedDataSourceSuite extends QueryTest } }

[GitHub] [spark] zhengruifeng opened a new pull request, #39565: [SPARK-42062][CONNECT][TESTS] Enforce scalafmt for connect-common

2023-01-13 Thread GitBox
zhengruifeng opened a new pull request, #39565: URL: https://github.com/apache/spark/pull/39565 ### What changes were proposed in this pull request? Enforce scalafmt for connect-common ### Why are the changes needed? since we started adding scala source files in it

[GitHub] [spark] mridulm commented on pull request #35969: [SPARK-38651][SQL] Add configuration to support writing out empty schemas in supported filebased datasources

2023-01-13 Thread GitBox
mridulm commented on PR #35969: URL: https://github.com/apache/spark/pull/35969#issuecomment-1382649080 Can you look at the test failures @thejdeep ?

[GitHub] [spark] cloud-fan commented on a diff in pull request #39556: [WIP][SPARK-42049][SQL] Improve AliasAwareOutputExpression

2023-01-13 Thread GitBox
cloud-fan commented on code in PR #39556: URL: https://github.com/apache/spark/pull/39556#discussion_r1070207815 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/AliasAwareOutputExpression.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] zhengruifeng commented on pull request #39552: [SPARK-42001][CONNECT][PYTHON][TESTS] Update the related JIRA tickets of two DataFrameReader tests

2023-01-13 Thread GitBox
zhengruifeng commented on PR #39552: URL: https://github.com/apache/spark/pull/39552#issuecomment-1382647448 merged into master

[GitHub] [spark] zhengruifeng closed pull request #39552: [SPARK-42001][CONNECT][PYTHON][TESTS] Update the related JIRA tickets of two DataFrameReader tests

2023-01-13 Thread GitBox
zhengruifeng closed pull request #39552: [SPARK-42001][CONNECT][PYTHON][TESTS] Update the related JIRA tickets of two DataFrameReader tests URL: https://github.com/apache/spark/pull/39552

[GitHub] [spark] zhengruifeng commented on pull request #39551: [SPARK-42047][SPARK-41900][CONNECT][PYTHON] Literal should support Numpy datatypes

2023-01-13 Thread GitBox
zhengruifeng commented on PR #39551: URL: https://github.com/apache/spark/pull/39551#issuecomment-1382647022 merged into master

[GitHub] [spark] zhengruifeng closed pull request #39551: [SPARK-42047][SPARK-41900][CONNECT][PYTHON] Literal should support Numpy datatypes

2023-01-13 Thread GitBox
zhengruifeng closed pull request #39551: [SPARK-42047][SPARK-41900][CONNECT][PYTHON] Literal should support Numpy datatypes URL: https://github.com/apache/spark/pull/39551

[GitHub] [spark] zhengruifeng commented on pull request #39493: [SPARK-41965][PYTHON][DOCS][WIP] Add DataFrameWriterV2 to PySpark API references

2023-01-13 Thread GitBox
zhengruifeng commented on PR #39493: URL: https://github.com/apache/spark/pull/39493#issuecomment-1382646582 merged into master

[GitHub] [spark] zhengruifeng closed pull request #39493: [SPARK-41965][PYTHON][DOCS][WIP] Add DataFrameWriterV2 to PySpark API references

2023-01-13 Thread GitBox
zhengruifeng closed pull request #39493: [SPARK-41965][PYTHON][DOCS][WIP] Add DataFrameWriterV2 to PySpark API references URL: https://github.com/apache/spark/pull/39493

[GitHub] [spark] cloud-fan commented on pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-13 Thread GitBox
cloud-fan commented on PR #39479: URL: https://github.com/apache/spark/pull/39479#issuecomment-1382646010 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL

2023-01-13 Thread GitBox
cloud-fan closed pull request #39479: [SPARK-41961][SQL] Support table-valued functions with LATERAL URL: https://github.com/apache/spark/pull/39479

[GitHub] [spark] dongjoon-hyun commented on pull request #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2

2023-01-13 Thread GitBox
dongjoon-hyun commented on PR #39561: URL: https://github.com/apache/spark/pull/39561#issuecomment-1382643846 Merged to master for Apache Spark 3.4. Thank you, @williamhyun and @huaxingao .

[GitHub] [spark] dongjoon-hyun closed pull request #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2

2023-01-13 Thread GitBox
dongjoon-hyun closed pull request #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2 URL: https://github.com/apache/spark/pull/39561

[GitHub] [spark] huaxingao commented on pull request #39564: [SPARK-41990][SQL] Filrering by composite field name doesn't work

2023-01-13 Thread GitBox
huaxingao commented on PR #39564: URL: https://github.com/apache/spark/pull/39564#issuecomment-1382640160 cc @cloud-fan @sunchao @LuciferYang @panbingkun

[GitHub] [spark] huaxingao commented on pull request #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2

2023-01-13 Thread GitBox
huaxingao commented on PR #39561: URL: https://github.com/apache/spark/pull/39561#issuecomment-1382635782 +1 Thanks @williamhyun

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39552: [SPARK-42001][CONNECT][PYTHON][TESTS] Update the related JIRA tickets of two DataFrameReader tests

2023-01-13 Thread GitBox
zhengruifeng commented on code in PR #39552: URL: https://github.com/apache/spark/pull/39552#discussion_r1070192156 ## python/pyspark/sql/tests/test_readwriter.py: ## @@ -53,16 +53,23 @@ def test_save_and_load(self): ) self.assertEqual(sorted(df.collect()),

[GitHub] [spark] huaxingao opened a new pull request, #39564: [SPARK-41990][SQL] Filrering by composite field name doesn't work

2023-01-13 Thread GitBox
huaxingao opened a new pull request, #39564: URL: https://github.com/apache/spark/pull/39564 ### What changes were proposed in this pull request? Use `FieldReference.column` instead of `FieldReference.apply` in V1 to V2 filter conversion ### Why are the changes needed?
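As we understand it, the difference in #39564 is that `FieldReference.apply` parses its string argument as a multi-part name (splitting on unquoted dots), while `FieldReference.column` keeps the whole string as a single name part, so a top-level column literally named `a.b` is no longer mistaken for field `b` of struct `a`. A simplified Python model, assuming only backtick quoting and `.` separators; `parse_multipart` and `single_column` are hypothetical stand-ins, not Spark's parser:

```python
from typing import List


def parse_multipart(name: str) -> List[str]:
    """Split on unquoted dots; backticks quote a part and `` escapes a backtick."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_quotes:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    buf.append("`")  # doubled backtick inside quotes
                    i += 2
                    continue
                in_quotes = False  # closing backtick
            else:
                buf.append(ch)
        elif ch == "`":
            in_quotes = True  # opening backtick
        elif ch == ".":
            parts.append("".join(buf))  # unquoted dot ends a name part
            buf = []
        else:
            buf.append(ch)
        i += 1
    if in_quotes:
        raise ValueError(f"unclosed backtick in {name!r}")
    parts.append("".join(buf))
    return parts


def single_column(name: str) -> List[str]:
    """Treat the whole string as one column name, dots included."""
    return [name]


# A top-level column literally named "a.b":
print(parse_multipart("a.b"))    # ['a', 'b'], read as nested field b of struct a
print(single_column("a.b"))      # ['a.b'], the actual column
print(parse_multipart("`a.b`"))  # ['a.b']
```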

[GitHub] [spark] akpatnam25 commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
akpatnam25 commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070184068 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +245,71 @@ public void

[GitHub] [spark] hussein-awala opened a new pull request, #39563: [SPARK-42060] add new config to override driver/executor k8s containers names

2023-01-13 Thread GitBox
hussein-awala opened a new pull request, #39563: URL: https://github.com/apache/spark/pull/39563 ### What changes were proposed in this pull request? Adding two new config `spark.kubernetes.driver.container.name` and `spark.kubernetes.executor.container.name` to override the

[GitHub] [spark] techaddict commented on pull request #39559: [SPARK-42011][CONNECT][PYTHON] Implement DataFrameReader.csv

2023-01-13 Thread GitBox
techaddict commented on PR #39559: URL: https://github.com/apache/spark/pull/39559#issuecomment-1382603008 Updated the PR, tests will start passing once #39553 is merged, as it's using `path: PathOrPaths`
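#39553 lets the Connect `DataFrameReader` accept either a single path or a list of paths (`path: PathOrPaths`). A minimal sketch of that normalization step, assuming `PathOrPaths` means `Union[str, List[str]]` (it is defined locally here); `normalize_paths` is a hypothetical helper, not the PySpark implementation:

```python
from typing import List, Union

# Local stand-in for the `PathOrPaths` annotation: one path or a list of paths.
PathOrPaths = Union[str, List[str]]


def normalize_paths(path: PathOrPaths) -> List[str]:
    """Hypothetical helper: always hand the reader a list of path strings."""
    if isinstance(path, str):
        return [path]
    if isinstance(path, list) and all(isinstance(p, str) for p in path):
        return list(path)  # copy so later mutation of the caller's list is harmless
    raise TypeError(f"path should be a str or a list of str, got {type(path).__name__}")


print(normalize_paths("/data/a.csv"))                   # ['/data/a.csv']
print(normalize_paths(["/data/a.csv", "/data/b.csv"]))  # ['/data/a.csv', '/data/b.csv']
```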

[GitHub] [spark] github-actions[bot] commented on pull request #38039: [SPARK-40603][SQL] Throw the original error from catalog implementations

2023-01-13 Thread GitBox
github-actions[bot] commented on PR #38039: URL: https://github.com/apache/spark/pull/38039#issuecomment-1382601655 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #38024: [SPARK-40591][CORE][SQL] Fix data loss caused by ignoreCorruptFiles

2023-01-13 Thread GitBox
github-actions[bot] commented on PR #38024: URL: https://github.com/apache/spark/pull/38024#issuecomment-1382601668 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] techaddict commented on a diff in pull request #39559: [SPARK-42011][CONNECT][PYTHON] Implement DataFrameReader.csv

2023-01-13 Thread GitBox
techaddict commented on code in PR #39559: URL: https://github.com/apache/spark/pull/39559#discussion_r1070173868 ## python/pyspark/sql/connect/readwriter.py: ## @@ -514,6 +586,7 @@ def _test() -> None: del pyspark.sql.connect.readwriter.DataFrameReader.load.__doc__

[GitHub] [spark] techaddict opened a new pull request, #39562: [SPARK-41964][CONNECT][PYTHON][FOLLOW-UP] Fix the jdbc writer not implemented Test

2023-01-13 Thread GitBox
techaddict opened a new pull request, #39562: URL: https://github.com/apache/spark/pull/39562 ### What changes were proposed in this pull request? Fix the `jdbc` Writer function not implemented test ### Why are the changes needed? Fixing a test ### Does this PR

[GitHub] [spark] akpatnam25 commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
akpatnam25 commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070170721 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +245,71 @@ public void

[GitHub] [spark] williamhyun commented on pull request #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2

2023-01-13 Thread GitBox
williamhyun commented on PR #39561: URL: https://github.com/apache/spark/pull/39561#issuecomment-1382596613 cc: @dongjoon-hyun

[GitHub] [spark] akpatnam25 commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
akpatnam25 commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070170721 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +245,71 @@ public void

[GitHub] [spark] williamhyun opened a new pull request, #39561: [SPARK-42059][BUILD] Update ORC to 1.8.2

2023-01-13 Thread GitBox
williamhyun opened a new pull request, #39561: URL: https://github.com/apache/spark/pull/39561 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] otterc commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
otterc commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070168146 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +245,71 @@ public void

[GitHub] [spark] HyukjinKwon closed pull request #39560: [MINOR][CONNECT][TESTS] Fix typos in tests/connect/test_connect_basic.py

2023-01-13 Thread GitBox
HyukjinKwon closed pull request #39560: [MINOR][CONNECT][TESTS] Fix typos in tests/connect/test_connect_basic.py URL: https://github.com/apache/spark/pull/39560

[GitHub] [spark] HyukjinKwon commented on pull request #39560: [MINOR][CONNECT][TESTS] Fix typos in tests/connect/test_connect_basic.py

2023-01-13 Thread GitBox
HyukjinKwon commented on PR #39560: URL: https://github.com/apache/spark/pull/39560#issuecomment-1382577864 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39365: [SPARK-41859][SQL] CreateHiveTableAsSelectCommand should set the overwrite flag correctly

2023-01-13 Thread GitBox
HyukjinKwon commented on code in PR #39365: URL: https://github.com/apache/spark/pull/39365#discussion_r1070153734 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala: ## @@ -95,7 +94,6 @@ case class

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39559: [SPARK-42011][CONNECT][PYTHON] Implement DataFrameReader.csv

2023-01-13 Thread GitBox
HyukjinKwon commented on code in PR #39559: URL: https://github.com/apache/spark/pull/39559#discussion_r1070151153 ## python/pyspark/sql/connect/readwriter.py: ## @@ -514,6 +586,7 @@ def _test() -> None: del pyspark.sql.connect.readwriter.DataFrameReader.load.__doc__

[GitHub] [spark] techaddict commented on a diff in pull request #39560: [MINOR][CONNECT][TESTS] Fix typos in tests/connect/test_connect_basic.py

2023-01-13 Thread GitBox
techaddict commented on code in PR #39560: URL: https://github.com/apache/spark/pull/39560#discussion_r1069958563 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -2503,7 +2503,7 @@ def test_unsupported_io_functions(self): for f in ("jdbc",):

[GitHub] [spark] srielau commented on a diff in pull request #39537: [SPARK-41994] [DRAFT] Assign SQLSTATE's (1/?)

2023-01-13 Thread GitBox
srielau commented on code in PR #39537: URL: https://github.com/apache/spark/pull/39537#discussion_r1070133871 ## core/src/main/resources/error/error-classes.json: ## @@ -758,22 +782,26 @@ "INVALID_IDENTIFIER" : { "message" : [ "The identifier is invalid.

[GitHub] [spark] akpatnam25 commented on pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
akpatnam25 commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1382447740 addressed @rmcyang's comments, will monitor the build. I have been getting some linter errors from the python side which are unrelated to this PR.

[GitHub] [spark] mridulm commented on pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
mridulm commented on PR #38959: URL: https://github.com/apache/spark/pull/38959#issuecomment-1382436350 I will merge it once @rmcyang's comments are addressed.

[GitHub] [spark] rmcyang commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
rmcyang commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070099546 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java: ## @@ -192,8 +200,14 @@ private synchronized void

[GitHub] [spark] gengliangwang commented on a diff in pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
gengliangwang commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1070083087 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3239,7 +3239,7 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] akpatnam25 commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
akpatnam25 commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1070072578 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +243,50 @@ public void

[GitHub] [spark] rangadi commented on a diff in pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
rangadi commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1070054068 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3239,7 +3239,7 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] mridulm commented on a diff in pull request #38959: SPARK-41415: SASL Request Retries

2023-01-13 Thread GitBox
mridulm commented on code in PR #38959: URL: https://github.com/apache/spark/pull/38959#discussion_r1069990003 ## common/network-shuffle/src/test/java/org/apache/spark/network/shuffle/RetryingBlockTransferorSuite.java: ## @@ -230,6 +243,50 @@ public void

[GitHub] [spark] gengliangwang commented on a diff in pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
gengliangwang commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1069979484 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -3239,7 +3239,7 @@ private[sql] object QueryCompilationErrors

[GitHub] [spark] rangadi commented on a diff in pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
rangadi commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1069966877 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala: ## @@ -1248,6 +1248,23 @@ class ProtobufFunctionsSuite extends

[GitHub] [spark] techaddict opened a new pull request, #39560: [MINOR][CONNECT][TESTS] Fix typos in tests/connect/test_connect_basic.py

2023-01-13 Thread GitBox
techaddict opened a new pull request, #39560: URL: https://github.com/apache/spark/pull/39560 ### What changes were proposed in this pull request? Fixing typos in tests/connect/test_connect_basic.py ### Why are the changes needed? Typos ### Does this PR introduce _any_

[GitHub] [spark] techaddict opened a new pull request, #39559: [SPARK-42011][CONNECT][PYTHON] Implement DataFrameReader.csv

2023-01-13 Thread GitBox
techaddict opened a new pull request, #39559: URL: https://github.com/apache/spark/pull/39559 ### What changes were proposed in this pull request? This PR implements `DataFrameReader.csv` alias in Spark Connect. ### Why are the changes needed? For API feature parity. ###

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069934932 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -323,7 +384,7 @@

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069933779 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -259,20 +294,46 @@

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069925882 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -239,6 +239,41 @@

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069923212 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -239,6 +239,41 @@

[GitHub] [spark] rangadi commented on a diff in pull request #39039: [SPARK-40776][SQL][PROTOBUF][DOCS] Spark-Protobuf docs

2023-01-13 Thread GitBox
rangadi commented on code in PR #39039: URL: https://github.com/apache/spark/pull/39039#discussion_r1069921047 ## docs/sql-data-sources-protobuf.md: ## @@ -0,0 +1,384 @@ +--- +layout: global +title: Protobuf Data Source Guide +license: | + Licensed to the Apache Software

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817][K8S] `spark.files` should preserve remote files

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069920657 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -239,6 +239,41 @@

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38376: [SPARK-40817] [Kubernetes] Do not discard remote user-specified files when launching Spark jobs on Kubernetes

2023-01-13 Thread GitBox
dongjoon-hyun commented on code in PR #38376: URL: https://github.com/apache/spark/pull/38376#discussion_r1069915675 ## resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala: ## @@ -239,6 +239,41 @@

[GitHub] [spark] gengliangwang commented on a diff in pull request #39536: [SPARK-42057][SQL][PROTOBUF] Fix how exception is handled in error reporting.

2023-01-13 Thread GitBox
gengliangwang commented on code in PR #39536: URL: https://github.com/apache/spark/pull/39536#discussion_r1069914183 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala: ## @@ -1248,6 +1248,23 @@ class ProtobufFunctionsSuite extends