[GitHub] [spark] MaxGekk closed pull request #38000: [SPARK-40540][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1199

2022-09-27 Thread GitBox
MaxGekk closed pull request #38000: [SPARK-40540][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1199 URL: https://github.com/apache/spark/pull/38000 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] MaxGekk commented on pull request #38000: [SPARK-40540][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1199

2022-09-27 Thread GitBox
MaxGekk commented on PR #38000: URL: https://github.com/apache/spark/pull/38000#issuecomment-1260417870 Only one test failed: `DAGSchedulerSuite.SPARK-40096: Send finalize events even if shuffle merger blocks indefinitely with registerMergeResults is false`

[GitHub] [spark] wbo4958 commented on pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-27 Thread GitBox
wbo4958 commented on PR #37855: URL: https://github.com/apache/spark/pull/37855#issuecomment-1260417639 > Good catch! seems we can also simply switch to `XORShiftRandom` which always [hash the

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] Add example for applyInPandasWithState

2022-09-27 Thread GitBox
HyukjinKwon commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981964990 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,130 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] Add example for applyInPandasWithState

2022-09-27 Thread GitBox
chaoqin-li1123 commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981960624 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,130 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] Add example for applyInPandasWithState

2022-09-27 Thread GitBox
chaoqin-li1123 commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981960722 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,130 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] amaliujia commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r981950320 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -185,11 +186,10 @@ object SparkConnectService { /** * Starts

[GitHub] [spark] cloud-fan commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-27 Thread GitBox
cloud-fan commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r981940234 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -185,11 +186,10 @@ object SparkConnectService { /** * Starts

[GitHub] [spark] zhengruifeng opened a new pull request, #38026: [SPARK-40592][PS] Implement `min_count` in `GroupBy.max`

2022-09-27 Thread GitBox
zhengruifeng opened a new pull request, #38026: URL: https://github.com/apache/spark/pull/38026 ### What changes were proposed in this pull request? Implement `min_count` in `GroupBy.max` ### Why are the changes needed? for API coverage ### Does this PR introduce
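
For context on the `min_count` contract being ported (hedged: a plain-pandas illustration of the semantics, not the pandas-on-Spark implementation from this PR): a group whose number of valid, non-NA values falls below `min_count` yields NaN instead of a maximum.

```python
import pandas as pd

# Plain-pandas illustration of the min_count semantics that GroupBy.max follows:
# a group with fewer than min_count non-NA values produces NaN.
df = pd.DataFrame({"k": ["a", "a", "b"], "v": [1.0, 2.0, 3.0]})
out = df.groupby("k")["v"].max(min_count=2)
# group "a" has two valid values -> 2.0; group "b" has only one -> NaN
```

Implementing the same behavior in pandas-on-Spark is what closes this API-coverage gap.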

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-27 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r981927808 ## connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] amaliujia commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r981925368 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,18 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] amaliujia commented on pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
amaliujia commented on PR #38004: URL: https://github.com/apache/spark/pull/38004#issuecomment-1260363294 Thanks for the link to the implementation!

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-27 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r981920306 ## connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] amaliujia commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-27 Thread GitBox
amaliujia commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r981922371 ## repl/pom.xml: ## @@ -58,6 +58,11 @@ spark-sql_${scala.binary.version} ${project.version} + + org.apache.spark +

[GitHub] [spark] amaliujia commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r981918310 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -189,7 +189,7 @@ object SparkConnectService { */ def

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37995: [SPARK-40556][PS][SQL] Unpersist the intermediate datasets cached in `AttachDistributedSequenceExec`

2022-09-27 Thread GitBox
zhengruifeng commented on code in PR #37995: URL: https://github.com/apache/spark/pull/37995#discussion_r981915981 ## python/pyspark/pandas/series.py: ## @@ -6442,6 +6445,8 @@ def argmin(self, axis: Axis = None, skipna: bool = True) -> int: raise ValueError("axis

[GitHub] [spark] LuciferYang commented on a diff in pull request #38025: [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE for ParquetWrite

2022-09-27 Thread GitBox
LuciferYang commented on code in PR #38025: URL: https://github.com/apache/spark/pull/38025#discussion_r981915685 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWrite.scala: ## @@ -94,7 +94,7 @@ case class ParquetWrite(

[GitHub] [spark] LuciferYang opened a new pull request, #38025: [MINOR][SQL] Skip warning if JOB_SUMMARY_LEVEL is set to NONE for ParquetWrite

2022-09-27 Thread GitBox
LuciferYang opened a new pull request, #38025: URL: https://github.com/apache/spark/pull/38025 ### What changes were proposed in this pull request? This PR makes a similar change to https://github.com/apache/spark/pull/24808 ### Why are the changes needed? The print
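
The change described above follows a simple pattern: only emit the "no job summary support" warning when job summaries were actually requested. A hedged, framework-free Python sketch of that gating (names like `job_summary_level` are illustrative, not Spark's actual code; `"NONE"` mirrors Parquet's `JobSummaryLevel.NONE`):

```python
import logging

logger = logging.getLogger("parquet.write")

def warn_if_summaries_requested(job_summary_level: str) -> bool:
    """Warn about unsupported job summaries only when the user asked for them.

    Returns True if a warning was emitted (an illustrative stand-in for the
    ParquetWrite change in this PR).
    """
    if job_summary_level.upper() == "NONE":
        return False  # user disabled summaries: nothing to warn about
    logger.warning(
        "Committer does not support writing job summaries; "
        "set the summary level to NONE to silence this warning"
    )
    return True
```

The fix is purely about noise reduction: behavior is unchanged except that the warning is skipped when it would be irrelevant.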

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37995: [SPARK-40556][PS][SQL] Unpersist the intermediate datasets cached in `AttachDistributedSequenceExec`

2022-09-27 Thread GitBox
zhengruifeng commented on code in PR #37995: URL: https://github.com/apache/spark/pull/37995#discussion_r981914861 ## python/pyspark/pandas/series.py: ## @@ -6442,6 +6445,8 @@ def argmin(self, axis: Axis = None, skipna: bool = True) -> int: raise ValueError("axis

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-27 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r981913805 ## connect/src/main/scala/org/apache/spark/sql/catalyst/connect/connect.scala: ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] cloud-fan commented on a diff in pull request #37994: [SPARK-40454][CONNECT] Initial DSL framework for protobuf testing

2022-09-27 Thread GitBox
cloud-fan commented on code in PR #37994: URL: https://github.com/apache/spark/pull/37994#discussion_r981913250 ## repl/pom.xml: ## @@ -58,6 +58,11 @@ spark-sql_${scala.binary.version} ${project.version} + + org.apache.spark +

[GitHub] [spark] yaooqinn commented on a diff in pull request #38024: [SPARK-40591][SQL] Fix data loss caused by ignoreCorruptFiles

2022-09-27 Thread GitBox
yaooqinn commented on code in PR #38024: URL: https://github.com/apache/spark/pull/38024#discussion_r981911877 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala: ## @@ -253,7 +259,7 @@ class FileScanRDD( null

[GitHub] [spark] cloud-fan commented on a diff in pull request #38006: [SPARK-40536][CONNECT] Make Spark Connect port configurable

2022-09-27 Thread GitBox
cloud-fan commented on code in PR #38006: URL: https://github.com/apache/spark/pull/38006#discussion_r981911829 ## connect/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectService.scala: ## @@ -189,7 +189,7 @@ object SparkConnectService { */ def

[GitHub] [spark] LuciferYang opened a new pull request, #37654: [SPARK-40216][SQL] Extract common `ParquetUtils.prepareWrite` method to deduplicate code in `ParquetFileFormat` and `ParquetWrite`

2022-09-27 Thread GitBox
LuciferYang opened a new pull request, #37654: URL: https://github.com/apache/spark/pull/37654 ### What changes were proposed in this pull request? This PR is a refactoring; the main change is extracting a `ParquetUtils.prepareWrite` method to eliminate duplicate code in

[GitHub] [spark] cloud-fan commented on pull request #37654: [SPARK-40216][SQL] Extract common `ParquetUtils.prepareWrite` method to deduplicate code in `ParquetFileFormat` and `ParquetWrite`

2022-09-27 Thread GitBox
cloud-fan commented on PR #37654: URL: https://github.com/apache/spark/pull/37654#issuecomment-1260343604 @sadikovi can you take a look?

[GitHub] [spark] LuciferYang closed pull request #37654: [SPARK-40216][SQL] Extract common `ParquetUtils.prepareWrite` method to deduplicate code in `ParquetFileFormat` and `ParquetWrite`

2022-09-27 Thread GitBox
LuciferYang closed pull request #37654: [SPARK-40216][SQL] Extract common `ParquetUtils.prepareWrite` method to deduplicate code in `ParquetFileFormat` and `ParquetWrite` URL: https://github.com/apache/spark/pull/37654

[GitHub] [spark] zhengruifeng commented on pull request #38014: [SPARK-40575][DOCS] Add badges for PySpark downloads

2022-09-27 Thread GitBox
zhengruifeng commented on PR #38014: URL: https://github.com/apache/spark/pull/38014#issuecomment-1260336014 If we want to keep only one badge, I think we can use badge `Pypi downloads` linking to `pypi`, like [numpy](https://github.com/numpy/numpy) /

[GitHub] [spark] yaooqinn commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-09-27 Thread GitBox
yaooqinn commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1260335417 +1, and sorry for not merging it after my approval

[GitHub] [spark] LuciferYang commented on a diff in pull request #38024: [SPARK-40591][SQL] Fix data loss caused by ignoreCorruptFiles

2022-09-27 Thread GitBox
LuciferYang commented on code in PR #38024: URL: https://github.com/apache/spark/pull/38024#discussion_r981889549 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala: ## @@ -253,7 +259,7 @@ class FileScanRDD( null

[GitHub] [spark] AngersZhuuuu opened a new pull request, #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-09-27 Thread GitBox
AngersZhuuuu opened a new pull request, #35594: URL: https://github.com/apache/spark/pull/35594 ### What changes were proposed in this pull request? Currently, Spark SQL CLI always uses a shutdown hook to stop SparkSQLEnv: `// Clean up after we exit`
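
The underlying issue is general: cleanup that runs in a shutdown hook does not, by itself, propagate a meaningful process exit status; the CLI still has to exit with the client-side code so the YARN AM observes the same result. A hedged Python analogue (with `atexit` playing the role of the JVM shutdown hook; all names are illustrative, not Spark's code):

```python
import atexit

cleanup_ran = []

def stop_env():
    # Plays the role of the JVM shutdown hook: runs at interpreter exit
    # regardless of how the program finished.
    cleanup_ran.append(True)

atexit.register(stop_env)

def cli_main() -> int:
    # Suppose the client side finished with a failure status.
    return 3

# The essence of the fix: do not rely on the hook alone. Capture the client's
# status and exit with it explicitly (e.g. sys.exit(cli_main())), so the
# supervising process sees the same exit code the client did.
exit_code = cli_main()
```

The hook still runs for cleanup; only the explicit exit code carries the failure to the caller.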

[GitHub] [spark] cloud-fan commented on pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-09-27 Thread GitBox
cloud-fan commented on PR #35594: URL: https://github.com/apache/spark/pull/35594#issuecomment-1260321186 @AngersZhuuuu can you rebase? We should merge this PR.

[GitHub] [spark] zhengruifeng commented on pull request #37855: [SPARK-40407][SQL] Fix the potential data skew caused by df.repartition

2022-09-27 Thread GitBox
zhengruifeng commented on PR #37855: URL: https://github.com/apache/spark/pull/37855#issuecomment-1260319666 Good catch! seems we can also simply switch to `XORShiftRandom` which always [hash the
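
For readers unfamiliar with the suggestion: `XORShiftRandom` is Spark's fast RNG that scrambles (hashes) the raw seed before use, so nearby seeds, such as consecutive partition indices, do not produce correlated streams and therefore do not skew `repartition`. A hedged, self-contained Python sketch of the idea (the scrambler below is a generic SplitMix64-style mix, not Spark's exact MurmurHash-based one):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def mix64(seed: int) -> int:
    """SplitMix64-style scramble: decorrelates nearby seeds before seeding."""
    z = (seed + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def xorshift64(seed: int):
    """Plain xorshift64 generator; output quality depends on the seed."""
    x = mix64(seed) or 1  # xorshift state must never be 0
    while True:
        x ^= (x << 13) & MASK64
        x ^= x >> 7
        x ^= (x << 17) & MASK64
        yield x

# Consecutive seeds (like partition ids 0, 1, 2, ...) yield unrelated first
# draws because each seed is hashed before the xorshift state is initialized.
first_draws = [next(xorshift64(pid)) for pid in range(3)]
```

Without the `mix64` step, small consecutive seeds would start the xorshift state in nearly identical positions, which is exactly the correlation that causes the skew discussed above.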

[GitHub] [spark] caican00 commented on pull request #37876: [SPARK-40175][CORE][SQL][MLLIB][DSTREAM][R] Optimize the performance of `keys.zip(values).toMap` code pattern

2022-09-27 Thread GitBox
caican00 commented on PR #37876: URL: https://github.com/apache/spark/pull/37876#issuecomment-1260294346 > I haven't thought of a better way yet Thanks. I'll share it with you if I can think of a better way.
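
For background on the pattern under discussion: in Scala, `keys.zip(values).toMap` first materializes an intermediate collection of pairs before the map is built, which is the overhead this PR removes (e.g. by zipping iterators into a `Map` builder). The Python analogue is already lazy, which makes the contrast easy to show (hedged: an illustration of the pattern, not Spark's code):

```python
def to_map(keys, values):
    # zip() in Python 3 is lazy: pairs stream straight into dict() with no
    # intermediate list of tuples, which is the effect the Scala optimization
    # targets with keys.iterator.zip(values.iterator).
    return dict(zip(keys, values))

mapping = to_map(["a", "b"], [1, 2])
```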

[GitHub] [spark] yaooqinn commented on pull request #38024: [SPARK-40591][SQL] Fix data loss caused by ignoreCorruptFiles

2022-09-27 Thread GitBox
yaooqinn commented on PR #38024: URL: https://github.com/apache/spark/pull/38024#issuecomment-1260291484 cc @cloud-fan @dongjoon-hyun @HyukjinKwon @wangyum thanks.

[GitHub] [spark] yaooqinn opened a new pull request, #38024: [SPARK-40591][SQL] Fix data loss caused by ignoreCorruptFiles

2022-09-27 Thread GitBox
yaooqinn opened a new pull request, #38024: URL: https://github.com/apache/spark/pull/38024 ### What changes were proposed in this pull request? Let's take a look at the case below: the left and the right are visiting the same table and its partitions, and both of them
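
The hazard being fixed is easy to state abstractly: if `ignoreCorruptFiles` treats every read error as corruption, a transient failure (for example, a file replaced mid-scan by a concurrent overwrite) is silently swallowed and rows are lost instead of the query failing. A hedged Python sketch of the failure mode (not Spark's `FileScanRDD` code; all names are illustrative):

```python
def scan(files, ignore_corrupt_files=False):
    """Read every (name, reader) pair; optionally skip files that fail."""
    rows = []
    for name, read in files:
        try:
            rows.extend(read())
        except Exception:
            if not ignore_corrupt_files:
                raise
            # BUG PATTERN: a transient FileNotFoundError is not corruption,
            # but it is swallowed here, silently dropping this file's rows.
    return rows

def missing():
    raise FileNotFoundError("part-0001 was replaced by a concurrent overwrite")

files = [("part-0000", lambda: [1, 2]), ("part-0001", missing)]
```

With the over-broad catch enabled, the scan "succeeds" with half the data; the fix is to distinguish genuine corruption from transient errors that should still fail the task.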

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] add example for applyInPandasWithState

2022-09-27 Thread GitBox
HyukjinKwon commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981863988 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,130 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38018: [SPARK-40580][PS][DOCS] Update the document for `DataFrame.to_orc`.

2022-09-27 Thread GitBox
HyukjinKwon commented on code in PR #38018: URL: https://github.com/apache/spark/pull/38018#discussion_r981856947 ## python/pyspark/pandas/frame.py: ## @@ -5317,6 +5317,12 @@ def to_orc( ... '%s/to_orc/foo.orc' % path, ... mode = 'overwrite',

[GitHub] [spark] zhengruifeng commented on pull request #38017: [SPARK-40579][PS] `GroupBy.first` should skip NULLs

2022-09-27 Thread GitBox
zhengruifeng commented on PR #38017: URL: https://github.com/apache/spark/pull/38017#issuecomment-1260274298 Thank you @HyukjinKwon @dongjoon-hyun @itholic

[GitHub] [spark] HyukjinKwon closed pull request #38017: [SPARK-40579][PS] `GroupBy.first` should skip NULLs

2022-09-27 Thread GitBox
HyukjinKwon closed pull request #38017: [SPARK-40579][PS] `GroupBy.first` should skip NULLs URL: https://github.com/apache/spark/pull/38017

[GitHub] [spark] Yikun closed pull request #35088: [SPARK-37758][PYTHON][BUILD] Enable PySpark test scheduled job on ARM runner

2022-09-27 Thread GitBox
Yikun closed pull request #35088: [SPARK-37758][PYTHON][BUILD] Enable PySpark test scheduled job on ARM runner URL: https://github.com/apache/spark/pull/35088

[GitHub] [spark] HyukjinKwon closed pull request #38016: [SPARK-40578][PS] Fix `IndexesTest.test_to_frame` when pandas 1.5.0

2022-09-27 Thread GitBox
HyukjinKwon closed pull request #38016: [SPARK-40578][PS] Fix `IndexesTest.test_to_frame` when pandas 1.5.0 URL: https://github.com/apache/spark/pull/38016

[GitHub] [spark] HyukjinKwon commented on pull request #38017: [SPARK-40579][PS] `GroupBy.first` should skip NULLs

2022-09-27 Thread GitBox
HyukjinKwon commented on PR #38017: URL: https://github.com/apache/spark/pull/38017#issuecomment-1260272833 Merged to master.

[GitHub] [spark] HyukjinKwon commented on pull request #38016: [SPARK-40578][PS] Fix `IndexesTest.test_to_frame` when pandas 1.5.0

2022-09-27 Thread GitBox
HyukjinKwon commented on PR #38016: URL: https://github.com/apache/spark/pull/38016#issuecomment-1260272532 Merged to master.

[GitHub] [spark] mridulm commented on pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-09-27 Thread GitBox
mridulm commented on PR #37638: URL: https://github.com/apache/spark/pull/37638#issuecomment-1260268934 +CC @otterc, @Ngone51

[GitHub] [spark] mridulm commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-09-27 Thread GitBox
mridulm commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r981843162 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -593,6 +607,9 @@ public void onData(String streamId,

[GitHub] [spark] github-actions[bot] commented on pull request #35319: [SPARK-36571][SQL] Add new SQLPathHadoopMapReduceCommitProtocol resolve conflict when write into partition table's different part

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35319: URL: https://github.com/apache/spark/pull/35319#issuecomment-1260238704 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #34856: [SPARK-37602][CORE] Add config property to set default Spark listeners

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #34856: URL: https://github.com/apache/spark/pull/34856#issuecomment-1260238760 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35088: [SPARK-37758][PYTHON][BUILD] Enable PySpark test scheduled job on ARM runner

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35088: URL: https://github.com/apache/spark/pull/35088#issuecomment-1260238718 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #34903: [SPARK-37650][PYTHON] Tell spark-env.sh the python interpreter

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #34903: URL: https://github.com/apache/spark/pull/34903#issuecomment-1260238745 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35337: [SPARK-37840][SQL] Dynamic Update of UDF

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35337: URL: https://github.com/apache/spark/pull/35337#issuecomment-1260238691 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35569: [SPARK-38250][CORE] Check existence before deleting stagingDir in HadoopMapReduceCommitProtocol

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35569: [SPARK-38250][CORE] Check existence before deleting stagingDir in HadoopMapReduceCommitProtocol URL: https://github.com/apache/spark/pull/35569

[GitHub] [spark] github-actions[bot] commented on pull request #35371: [WIP][SPARK-37946][SQL] Use error classes in the execution errors related to partitions

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35371: URL: https://github.com/apache/spark/pull/35371#issuecomment-1260238670 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35548: [SPARK-38234] [SQL] [SS] Added structured streaming monitoring APIs.

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35548: URL: https://github.com/apache/spark/pull/35548#issuecomment-1260238660 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35549: [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35549: URL: https://github.com/apache/spark/pull/35549#issuecomment-1260238647 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35734: [SPARK-32432][SQL] Add support for reading ORC/Parquet files of SymlinkTextInputFormat table And Fix Analyze for SymlinkTextInputFormat

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35734: [SPARK-32432][SQL] Add support for reading ORC/Parquet files of SymlinkTextInputFormat table And Fix Analyze for SymlinkTextInputFormat table URL: https://github.com/apache/spark/pull/35734

[GitHub] [spark] github-actions[bot] closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35594: [SPARK-38270][SQL] Spark SQL CLI's AM should keep same exit code with client side URL: https://github.com/apache/spark/pull/35594

[GitHub] [spark] github-actions[bot] closed pull request #35638: [SPARK-38296][SQL] Support error class AnalysisExceptions in FunctionRegistry

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35638: [SPARK-38296][SQL] Support error class AnalysisExceptions in FunctionRegistry URL: https://github.com/apache/spark/pull/35638

[GitHub] [spark] github-actions[bot] commented on pull request #35550: [SPARK-38238][SQL]Contains Join for Spark SQL

2022-09-27 Thread GitBox
github-actions[bot] commented on PR #35550: URL: https://github.com/apache/spark/pull/35550#issuecomment-1260238637 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35608: [SPARK-32838][SQL] Static partition overwrite could use staging dir insert

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35608: [SPARK-32838][SQL] Static partition overwrite could use staging dir insert URL: https://github.com/apache/spark/pull/35608

[GitHub] [spark] github-actions[bot] closed pull request #35748: [SPARK-38431][SQL]Support to delete matched rows from jdbc tables

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35748: [SPARK-38431][SQL]Support to delete matched rows from jdbc tables URL: https://github.com/apache/spark/pull/35748

[GitHub] [spark] github-actions[bot] closed pull request #35744: [SPARK-37383][SQL][WEBUI]Show the parsing time for each phase of a SQL on spark ui

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35744: [SPARK-37383][SQL][WEBUI]Show the parsing time for each phase of a SQL on spark ui URL: https://github.com/apache/spark/pull/35744

[GitHub] [spark] github-actions[bot] closed pull request #36889: [SPARK-21195][CORE] Dynamically register metrics from sources as they are reported

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #36889: [SPARK-21195][CORE] Dynamically register metrics from sources as they are reported URL: https://github.com/apache/spark/pull/36889

[GitHub] [spark] github-actions[bot] closed pull request #35867: [SPARK-38559][SQL][WEBUI]Display the number of empty partitions on spark ui

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35867: [SPARK-38559][SQL][WEBUI]Display the number of empty partitions on spark ui URL: https://github.com/apache/spark/pull/35867

[GitHub] [spark] github-actions[bot] closed pull request #35990: [SPARK-38639][SQL] Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly

2022-09-27 Thread GitBox
github-actions[bot] closed pull request #35990: [SPARK-38639][SQL] Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly URL: https://github.com/apache/spark/pull/35990

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] add example for applyInPandasWithState

2022-09-27 Thread GitBox
chaoqin-li1123 commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981833418 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,128 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r981811676 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,18 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] aokolnychyi commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
aokolnychyi commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r981809103 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,18 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] aokolnychyi commented on pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
aokolnychyi commented on PR #38004: URL: https://github.com/apache/spark/pull/38004#issuecomment-1260176172 @amaliujia, I have linked https://github.com/apache/spark/pull/38005 that adds test coverage and implementation. I've split this work to reduce the scope of each PR and simplify

[GitHub] [spark] dongjoon-hyun closed pull request #38011: [SPARK-40574][DOCS] Enhance DROP TABLE documentation

2022-09-27 Thread GitBox
dongjoon-hyun closed pull request #38011: [SPARK-40574][DOCS] Enhance DROP TABLE documentation URL: https://github.com/apache/spark/pull/38011 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on pull request #38021: [SPARK-40583][DOCS] Fixing artifactId name in `cloud-integration.md`

2022-09-27 Thread GitBox
dongjoon-hyun commented on PR #38021: URL: https://github.com/apache/spark/pull/38021#issuecomment-1260138452 Welcome to the Apache Spark community, @danitico . I added you to the Apache Spark contributor group and assign SPARK-40583 to you. -- This is an automated message from the

[GitHub] [spark] dongjoon-hyun closed pull request #38021: [SPARK-40583][DOCS] Fixing artifactId name in `cloud-integration.md`

2022-09-27 Thread GitBox
dongjoon-hyun closed pull request #38021: [SPARK-40583][DOCS] Fixing artifactId name in `cloud-integration.md` URL: https://github.com/apache/spark/pull/38021 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] amaliujia commented on a diff in pull request #38023: [SPARK-40587][CONNECT] Support SELECT * in an explicit way in connect proto

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38023: URL: https://github.com/apache/spark/pull/38023#discussion_r981760665 ## connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -155,4 +156,7 @@ message Expression { string expression = 1; } + // represent * (e.g.

[GitHub] [spark] amaliujia commented on pull request #38023: [SPARK-40587][CONNECT] Support SELECT * in an explicit way in connect proto

2022-09-27 Thread GitBox
amaliujia commented on PR #38023: URL: https://github.com/apache/spark/pull/38023#issuecomment-1260122735 @HyukjinKwon @cloud-fan @grundprinzip -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] amaliujia commented on a diff in pull request #38023: [SPARK-40587][CONNECT] Support SELECT * in an explicit way in connect proto

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38023: URL: https://github.com/apache/spark/pull/38023#discussion_r981761756 ## connect/src/main/protobuf/spark/connect/expressions.proto: ## @@ -155,4 +156,7 @@ message Expression { string expression = 1; } + // represent * (e.g.

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38015: [SPARK-40577][PS] Fix `CategoricalIndex.append` to match pandas 1.5.0

2022-09-27 Thread GitBox
xinrong-meng commented on code in PR #38015: URL: https://github.com/apache/spark/pull/38015#discussion_r981758988 ## python/pyspark/pandas/indexes/base.py: ## @@ -1907,6 +1908,9 @@ def append(self, other: "Index") -> "Index": ) index_fields =

[GitHub] [spark] amaliujia opened a new pull request, #38023: [SPARK-40587][CONNECT] Support SELECT * in an explicit way by connect proto

2022-09-27 Thread GitBox
amaliujia opened a new pull request, #38023: URL: https://github.com/apache/spark/pull/38023 ### What changes were proposed in this pull request? Support `SELECT *` in an explicit way by connect proto. ### Why are the changes needed? Current proto uses empty
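[Editor's note] The design point in #38023 is to make `SELECT *` an explicit marker in the Connect proto rather than an empty projection list. A minimal sketch of that resolution idea in plain Python (the class and function names here are illustrative, not the proto's actual messages):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Column:
    """A named column reference in a projection."""
    name: str

@dataclass
class Star:
    """Explicit marker for SELECT * (instead of an empty projection)."""
    pass

def resolve_projection(exprs: List[Union[Column, Star]],
                       schema: List[str]) -> List[str]:
    """Expand any Star against the input schema, keeping column order."""
    out: List[str] = []
    for e in exprs:
        if isinstance(e, Star):
            out.extend(schema)   # SELECT * expands to every input column
        else:
            out.append(e.name)
    return out
```

An explicit marker keeps "project all columns" distinguishable from "no projection specified", which an empty list cannot express.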

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38015: [SPARK-40577][PS] Fix `CategoricalIndex.append` to match pandas 1.5.0

2022-09-27 Thread GitBox
xinrong-meng commented on code in PR #38015: URL: https://github.com/apache/spark/pull/38015#discussion_r981758420 ## python/pyspark/pandas/indexes/base.py: ## @@ -1907,6 +1908,9 @@ def append(self, other: "Index") -> "Index": ) index_fields =

[GitHub] [spark] xinrong-meng commented on a diff in pull request #38018: [SPARK-40580][PS][DOCS] Update the document for `DataFrame.to_orc`.

2022-09-27 Thread GitBox
xinrong-meng commented on code in PR #38018: URL: https://github.com/apache/spark/pull/38018#discussion_r981752197 ## python/pyspark/pandas/frame.py: ## @@ -5317,6 +5317,12 @@ def to_orc( ... '%s/to_orc/foo.orc' % path, ... mode = 'overwrite',

[GitHub] [spark] xinrong-meng commented on pull request #38018: [SPARK-40580][PS][DOCS] Update the document for `DataFrame.to_orc`.

2022-09-27 Thread GitBox
xinrong-meng commented on PR #38018: URL: https://github.com/apache/spark/pull/38018#issuecomment-1260097710 pandas-on-Spark is more likely to be a developers' reference in the source code, whereas `pandas API on Spark` is the official, user-facing name. Hope that helps :) @bjornjorgensen

[GitHub] [spark] amaliujia commented on a diff in pull request #38004: [SPARK-40551][SQL] DataSource V2: Add APIs for delta-based row-level operations

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38004: URL: https://github.com/apache/spark/pull/38004#discussion_r981745489 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/LogicalWriteInfo.java: ## @@ -45,4 +45,18 @@ public interface LogicalWriteInfo { * the schema

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] add example for applyInPandasWithState

2022-09-27 Thread GitBox
HeartSaVioR commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981724244 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,128 @@ +# +# Licensed to the Apache Software Foundation

[GitHub] [spark] HeartSaVioR commented on pull request #38013: [SPARK-40509][SS][PYTHON] add example for applyInPandasWithState

2022-09-27 Thread GitBox
HeartSaVioR commented on PR #38013: URL: https://github.com/apache/spark/pull/38013#issuecomment-1260062541 @chaoqin-li1123 https://github.com/chaoqin-li1123/spark/actions/runs/3138156803/jobs/5097193712 Linter is still complaining. Could you take a look? You can install

[GitHub] [spark] attilapiros commented on a diff in pull request #37990: [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

2022-09-27 Thread GitBox
attilapiros commented on code in PR #37990: URL: https://github.com/apache/spark/pull/37990#discussion_r981696574 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala: ## @@ -193,22 +197,19 @@

[GitHub] [spark] Kimahriman commented on pull request #37770: [SPARK-40314][SQL][PYTHON] Add scala and python bindings for inline and inline_outer

2022-09-27 Thread GitBox
Kimahriman commented on PR #37770: URL: https://github.com/apache/spark/pull/37770#issuecomment-1260009009 > also, what about adding some tests in `python/pyspark/sql/tests/test_functions.py`? Thought I found all the places there were explode tests to add inline as well but missed
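[Editor's note] PR #37770 adds Scala and Python bindings for the existing `inline` and `inline_outer` generators, which explode an array-of-structs column into rows. Their semantics can be sketched in plain Python; this is an illustration of the behavior, not Spark's implementation:

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str]

def inline(arrays: List[Optional[List[Row]]],
           outer: bool = False) -> List[Tuple[Optional[int], Optional[str]]]:
    """Explode each array of (struct-like) rows into output rows.

    With outer=False, a NULL or empty array yields no rows; with
    outer=True it yields a single all-NULL row, mirroring the
    inline vs inline_outer distinction.
    """
    out: List[Tuple[Optional[int], Optional[str]]] = []
    for arr in arrays:
        if not arr:                  # NULL or empty array
            if outer:
                out.append((None, None))
            continue
        out.extend(arr)
    return out
```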

[GitHub] [spark] amaliujia commented on pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-27 Thread GitBox
amaliujia commented on PR #38003: URL: https://github.com/apache/spark/pull/38003#issuecomment-1259981240 > The more realistic use case was using a non-deterministic udf for accumulator things, with the push down resulting in different values, the rand was just the easiest way to test it.

[GitHub] [spark] MaxGekk commented on pull request #38000: [SPARK-40540][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1199

2022-09-27 Thread GitBox
MaxGekk commented on PR #38000: URL: https://github.com/apache/spark/pull/38000#issuecomment-1259967353 cc @itholic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] MaxGekk commented on pull request #38000: [SPARK-40540][SQL] Migrate compilation errors onto error classes: _LEGACY_ERROR_TEMP_1100-1199

2022-09-27 Thread GitBox
MaxGekk commented on PR #38000: URL: https://github.com/apache/spark/pull/38000#issuecomment-1259967214 @srielau @cloud-fan @anchovYu @gatorsmile Please, review this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] amaliujia commented on a diff in pull request #38007: [SPARK-40566][SQL] Add showIndex function

2022-09-27 Thread GitBox
amaliujia commented on code in PR #38007: URL: https://github.com/apache/spark/pull/38007#discussion_r981636250 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -2635,6 +2635,10 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with

[GitHub] [spark] Kimahriman commented on pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-27 Thread GitBox
Kimahriman commented on PR #38003: URL: https://github.com/apache/spark/pull/38003#issuecomment-1259938755 The more realistic use case was using a non-deterministic udf for accumulator things, with the push down resulting in different values, the rand was just the easiest way to test it.

[GitHub] [spark] amaliujia commented on pull request #38003: [SPARK-40565][SQL] Don't push non-deterministic filters to V2 file sources

2022-09-27 Thread GitBox
amaliujia commented on PR #38003: URL: https://github.com/apache/spark/pull/38003#issuecomment-1259929204 Just curious if this is ever discussed: The `rand()`, for example, can be evaluated before pushing down. This is more like a query re-writing that such non-deterministic
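[Editor's note] The hazard discussed in #38003 is that a non-deterministic predicate such as `rand() < 0.5`, if pushed into the data source, is re-evaluated on every read of the source, so two reads of the "same" filtered scan can return different row sets. A small plain-Python simulation of that effect (the function is a stand-in, not Spark's scan operator; different seeds model independent re-reads):

```python
import random

def read_with_pushed_filter(rows, seed):
    """Simulate one read of a source with a pushed rand() < 0.5 filter.

    Each read constructs fresh randomness, so re-reads of the source
    can select a different subset of rows.
    """
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < 0.5]
```

Filtering once above a materialized scan avoids this, which is the direction of the fix: keep non-deterministic filters out of V2 file-source pushdown.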

[GitHub] [spark] amaliujia commented on pull request #37993: [SPARK-40557][CONNECT] Update generated proto files for Spark Connect

2022-09-27 Thread GitBox
amaliujia commented on PR #37993: URL: https://github.com/apache/spark/pull/37993#issuecomment-1259922406 post + 1. Thanks for following up on this quickly! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] chaoqin-li1123 commented on a diff in pull request #38013: [SPARK-40509][SS][PYTHON] add example for applyInPandasWithState

2022-09-27 Thread GitBox
chaoqin-li1123 commented on code in PR #38013: URL: https://github.com/apache/spark/pull/38013#discussion_r981605902 ## examples/src/main/python/sql/streaming/structured_network_wordcount_session_window.py: ## @@ -0,0 +1,114 @@ +# +# Licensed to the Apache Software Foundation
