[GitHub] [spark] itholic opened a new pull request, #40910: [SPARK-43234][CONNECT][PYTHON] Migrate `ValueError` from Connect DataFrame into error class

2023-04-21 Thread via GitHub
itholic opened a new pull request, #40910: URL: https://github.com/apache/spark/pull/40910 ### What changes were proposed in this pull request? This PR proposes to migrate ValueError into PySparkValueError from Spark Connect DataFrame. ### Why are the changes needed?
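The migration this PR describes, replacing bare `ValueError`s with a parameterized error-class exception, can be sketched roughly as follows. This is a minimal illustration only: PySpark's real `PySparkValueError` and its error-class registry are richer, and the error-class key below is just an example.

```python
class PySparkValueError(ValueError):
    """Simplified stand-in for PySpark's error-class-based ValueError."""

    def __init__(self, error_class: str, message_parameters: dict):
        self.error_class = error_class
        self.message_parameters = message_parameters
        # Render a structured message from the error class and its parameters.
        params = ", ".join(f"{k}={v!r}" for k, v in message_parameters.items())
        super().__init__(f"[{error_class}] {params}")


def validate_subset(subset):
    # Before the migration, code like this would raise a plain
    # ValueError("subset should be a list or tuple of column names").
    # Afterwards, the error carries a stable class name and structured
    # parameters that tests and tooling can match on.
    if subset is not None and not isinstance(subset, (list, tuple)):
        raise PySparkValueError(
            error_class="NOT_LIST_OR_TUPLE",
            message_parameters={"arg_name": "subset",
                                "arg_type": type(subset).__name__},
        )
    return subset
```

The structured fields let callers assert on `error_class` rather than on brittle message strings.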

[GitHub] [spark] puneetguptanitj opened a new pull request, #40909: [SPARK-42411] [Kubernetes] Add support for istio with strict mtls

2023-04-21 Thread via GitHub
puneetguptanitj opened a new pull request, #40909: URL: https://github.com/apache/spark/pull/40909 ### What changes were proposed in this pull request? Following describes the changes made, all changes are behind respective configuration properties 1. Followed the same model

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40908: [SPARK-42750] Support Insert By Name statement

2023-04-21 Thread via GitHub
Hisoka-X commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1174282609 ## sql/core/src/test/scala/org/apache/spark/sql/SQLInsertTestSuite.scala: ## @@ -122,6 +125,16 @@ trait SQLInsertTestSuite extends QueryTest with SQLTestUtils {

[GitHub] [spark] Hisoka-X opened a new pull request, #40908: [SPARK-42750] Support Insert By Name statement

2023-04-21 Thread via GitHub
Hisoka-X opened a new pull request, #40908: URL: https://github.com/apache/spark/pull/40908 ### What changes were proposed in this pull request? In some use cases, users have incoming dataframes with fixed column names which might differ from the canonical order. Currently
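The column-matching behavior this PR targets can be illustrated with a small sketch. This is not the PR's implementation; it only shows the by-name alignment that an insert-by-name statement would perform for the user instead of requiring positional order:

```python
def align_by_name(row: dict, target_columns: list) -> tuple:
    """Reorder an incoming record's fields into the target table's column order.

    Mimics what insert-by-name does: columns are matched by name
    rather than by position.
    """
    missing = [c for c in target_columns if c not in row]
    if missing:
        raise ValueError(f"cannot insert by name, missing columns: {missing}")
    return tuple(row[c] for c in target_columns)


# Incoming data arrives in a non-canonical column order.
incoming = {"b": 2, "a": 1, "c": 3}
assert align_by_name(incoming, ["a", "b", "c"]) == (1, 2, 3)
```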

[GitHub] [spark] Hisoka-X commented on pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
Hisoka-X commented on PR #40865: URL: https://github.com/apache/spark/pull/40865#issuecomment-1518501790 kindly ping @cloud-fan. All CI passed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
cloud-fan commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1174277575 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method. Technically, a file

[GitHub] [spark] LuciferYang commented on a diff in pull request #40628: [SPARK-42999][Connect] Dataset#foreach, foreachPartition

2023-04-21 Thread via GitHub
LuciferYang commented on code in PR #40628: URL: https://github.com/apache/spark/pull/40628#discussion_r1174270420 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala: ## @@ -128,4 +130,72 @@ class

[GitHub] [spark] LuciferYang commented on pull request #40901: [SPARK-43195][BUILD][FOLLOWUP] Fix mima check for Scala 2.13

2023-04-21 Thread via GitHub
LuciferYang commented on PR #40901: URL: https://github.com/apache/spark/pull/40901#issuecomment-1518489585 thanks @HyukjinKwon

[GitHub] [spark] bogao007 commented on pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
bogao007 commented on PR #40834: URL: https://github.com/apache/spark/pull/40834#issuecomment-1518455976 > @bogao007 what's your JIRA id? I need to assign you in the JIRA ticket. I think this might be my JIRA id `62cbecffa94a6f9c0efe1622`, let me know if it doesn't work.

[GitHub] [spark] HyukjinKwon closed pull request #40725: [SPARK-43082][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect

2023-04-21 Thread via GitHub
HyukjinKwon closed pull request #40725: [SPARK-43082][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect URL: https://github.com/apache/spark/pull/40725

[GitHub] [spark] HyukjinKwon commented on pull request #40725: [SPARK-43082][CONNECT][PYTHON] Arrow-optimized Python UDFs in Spark Connect

2023-04-21 Thread via GitHub
HyukjinKwon commented on PR #40725: URL: https://github.com/apache/spark/pull/40725#issuecomment-1518443157 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #40901: [SPARK-43195][BUILD][FOLLOWUP] Fix mima check for Scala 2.13

2023-04-21 Thread via GitHub
HyukjinKwon closed pull request #40901: [SPARK-43195][BUILD][FOLLOWUP] Fix mima check for Scala 2.13 URL: https://github.com/apache/spark/pull/40901

[GitHub] [spark] HyukjinKwon commented on pull request #40901: [SPARK-43195][BUILD][FOLLOWUP] Fix mima check for Scala 2.13

2023-04-21 Thread via GitHub
HyukjinKwon commented on PR #40901: URL: https://github.com/apache/spark/pull/40901#issuecomment-1518441760 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40907: [PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-21 Thread via GitHub
HyukjinKwon commented on code in PR #40907: URL: https://github.com/apache/spark/pull/40907#discussion_r1174235481 ## python/pyspark/sql/dataframe.py: ## @@ -3008,6 +3008,25 @@ def __getattr__(self, name: str) -> Column: jc = self._jdf.apply(name) return

[GitHub] [spark] HyukjinKwon commented on pull request #40907: [PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-21 Thread via GitHub
HyukjinKwon commented on PR #40907: URL: https://github.com/apache/spark/pull/40907#issuecomment-1518441484 Mind filing a JIRA please?

[GitHub] [spark] HyukjinKwon commented on pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
HyukjinKwon commented on PR #40834: URL: https://github.com/apache/spark/pull/40834#issuecomment-1518440835 @bogao007 what's your JIRA id? I need to assign you in the JIRA ticket.

[GitHub] [spark] HyukjinKwon closed pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
HyukjinKwon closed pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect URL: https://github.com/apache/spark/pull/40834

[GitHub] [spark] HyukjinKwon commented on pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
HyukjinKwon commented on PR #40834: URL: https://github.com/apache/spark/pull/40834#issuecomment-1518440451 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #39312: [SPARK-41788][SQL] Move InsertIntoStatement to basicLogicalOperators

2023-04-21 Thread via GitHub
github-actions[bot] closed pull request #39312: [SPARK-41788][SQL] Move InsertIntoStatement to basicLogicalOperators URL: https://github.com/apache/spark/pull/39312

[GitHub] [spark] github-actions[bot] commented on pull request #39481: [MINOR][SQL] Update the import order of scala package in class `SpecificParquetRecordReaderBase`

2023-04-21 Thread via GitHub
github-actions[bot] commented on PR #39481: URL: https://github.com/apache/spark/pull/39481#issuecomment-1518439502 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] wangyum closed pull request #40838: [SPARK-43174][SQL] Fix SparkSQLCLIDriver completer

2023-04-21 Thread via GitHub
wangyum closed pull request #40838: [SPARK-43174][SQL] Fix SparkSQLCLIDriver completer URL: https://github.com/apache/spark/pull/40838

[GitHub] [spark] wangyum commented on pull request #40838: [SPARK-43174][SQL] Fix SparkSQLCLIDriver completer

2023-04-21 Thread via GitHub
wangyum commented on PR #40838: URL: https://github.com/apache/spark/pull/40838#issuecomment-1518438939 Merged to master.

[GitHub] [spark] alexanderwu-db opened a new pull request, #40907: [PYTHON] Implement `__dir__()` in `pyspark.sql.dataframe.DataFrame` to include columns

2023-04-21 Thread via GitHub
alexanderwu-db opened a new pull request, #40907: URL: https://github.com/apache/spark/pull/40907 ### What changes were proposed in this pull request? Override the parent `__dir__()` method on Python `DataFrame` class to include column names. Main benefit of this is that any
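The override this PR describes can be sketched with a toy class. This is a simplified model, not PySpark's actual `DataFrame` code; it only shows the mechanism of merging column names into the attribute listing:

```python
class DataFrame:
    """Toy model of the __dir__ override: attribute listing includes columns."""

    def __init__(self, columns):
        self.columns = list(columns)

    def __getattr__(self, name):
        # Attribute-style column access, as pyspark.sql.DataFrame supports.
        if name in self.columns:
            return f"Column<{name}>"
        raise AttributeError(name)

    def __dir__(self):
        # Merge the default attribute listing with the column names so that
        # tab completion in IPython/Jupyter can offer them.
        return sorted(set(super().__dir__()) | set(self.columns))


df = DataFrame(["age", "name"])
assert "age" in dir(df) and "name" in dir(df)
```

Since `dir()` feeds most completion machinery, columns become tab-completable wherever attribute access already works.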

[GitHub] [spark] WweiL commented on pull request #40906: [SPARK-43134] [CONNECT] [SS] JVM client StreamingQuery exception() API

2023-04-21 Thread via GitHub
WweiL commented on PR #40906: URL: https://github.com/apache/spark/pull/40906#issuecomment-1518400027 @rangadi @pengzhon-db

[GitHub] [spark] WweiL opened a new pull request, #40906: [SPARK-43134] [CONNECT] [SS] JVM client StreamingQuery exception() API

2023-04-21 Thread via GitHub
WweiL opened a new pull request, #40906: URL: https://github.com/apache/spark/pull/40906 ### What changes were proposed in this pull request? Add StreamingQuery exception() API for JVM client ### Why are the changes needed? Development of SS Connect ###

[GitHub] [spark] amaliujia commented on pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions

2023-04-21 Thread via GitHub
amaliujia commented on PR #40796: URL: https://github.com/apache/spark/pull/40796#issuecomment-1518390479 Overall looks reasonable to me. I only have questions about the proto validation on the server side.

[GitHub] [spark] amaliujia commented on a diff in pull request #40796: [SPARK-43223][Connect] Typed agg, reduce functions

2023-04-21 Thread via GitHub
amaliujia commented on code in PR #40796: URL: https://github.com/apache/spark/pull/40796#discussion_r1174198988 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -545,34 +540,94 @@ class SparkConnectPlanner(val

[GitHub] [spark] amaliujia commented on a diff in pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
amaliujia commented on code in PR #40834: URL: https://github.com/apache/spark/pull/40834#discussion_r1174195181 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -750,7 +751,8 @@ class SparkConnectPlanner(val

[GitHub] [spark] WweiL commented on pull request #40887: [SPARK-43144] Scala Client DataStreamReader table() API

2023-04-21 Thread via GitHub
WweiL commented on PR #40887: URL: https://github.com/apache/spark/pull/40887#issuecomment-1518384473 @HyukjinKwon Can you merge this when you get a chance? Thank you!

[GitHub] [spark] anishshri-db commented on pull request #40905: [SPARK-43233] [SS] Add logging for Kafka Batch Reading for topic partition, offset range and task ID

2023-04-21 Thread via GitHub
anishshri-db commented on PR #40905: URL: https://github.com/apache/spark/pull/40905#issuecomment-1518359441 @HeartSaVioR - please take a look and merge after builds pass, thx!

[GitHub] [spark] anishshri-db commented on pull request #40905: [SPARK-43233] [SS] Add logging for Kafka Batch Reading for topic partition, offset range and task ID

2023-04-21 Thread via GitHub
anishshri-db commented on PR #40905: URL: https://github.com/apache/spark/pull/40905#issuecomment-1518358413 @siying - you might need to enable GitHub Actions for the tests to run

[GitHub] [spark] siying opened a new pull request, #40905: [SPARK-43233] [SS] Add logging for Kafka Batch Reading for topic partition, offset range and task ID

2023-04-21 Thread via GitHub
siying opened a new pull request, #40905: URL: https://github.com/apache/spark/pull/40905 ### What changes were proposed in this pull request? We add logging when creating the batch reader, with task ID, topic, partition and offset range included. The log line looks like the following:

[GitHub] [spark] rangadi commented on a diff in pull request #40834: [SPARK-43046] [SS] [Connect] Implemented Python API dropDuplicatesWithinWatermark for Spark Connect

2023-04-21 Thread via GitHub
rangadi commented on code in PR #40834: URL: https://github.com/apache/spark/pull/40834#discussion_r1174175904 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -750,7 +751,8 @@ class SparkConnectPlanner(val

[GitHub] [spark] pengzhon-db opened a new pull request, #40904: [WIP][POC] foreachbatch spark connect

2023-04-21 Thread via GitHub
pengzhon-db opened a new pull request, #40904: URL: https://github.com/apache/spark/pull/40904 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] ueshin commented on a diff in pull request #40782: [SPARK-42669][CONNECT] Short circuit local relation RPCs

2023-04-21 Thread via GitHub
ueshin commented on code in PR #40782: URL: https://github.com/apache/spark/pull/40782#discussion_r1174158327 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkResult.scala: ## @@ -80,7 +80,10 @@ private[sql] class SparkResult[T](

[GitHub] [spark] sweisdb opened a new pull request, #40903: [WIP][SPARK-NNNNN] Updating AES-CBC support to not use OpenSSL's KDF

2023-04-21 Thread via GitHub
sweisdb opened a new pull request, #40903: URL: https://github.com/apache/spark/pull/40903 ### What changes were proposed in this pull request? The `aes_encrypt` support for CBC mode currently uses a key derivation function from OpenSSL's EVP_BytesToKey to generate an
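For context on the KDF being replaced: EVP_BytesToKey is OpenSSL's legacy, non-standard passphrase-to-key derivation. A standardized alternative such as PBKDF2 can be sketched as follows; this is an illustration only, not the code in this PR (whose ticket number is elided above):

```python
import hashlib
import os


def derive_key(passphrase: bytes, salt: bytes, length: int = 32) -> bytes:
    # PBKDF2-HMAC-SHA256 is a standardized, interoperable KDF, unlike the
    # legacy OpenSSL EVP_BytesToKey derivation mentioned above.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000,
                               dklen=length)


salt = os.urandom(16)  # a fresh random salt per encryption
key = derive_key(b"secret passphrase", salt)
assert len(key) == 32  # 256-bit key, e.g. for AES-256-CBC
```

Because PBKDF2 is specified in RFC 8018, keys derived this way are reproducible across libraries and languages, whereas EVP_BytesToKey output is OpenSSL-specific.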

[GitHub] [spark] rshkv commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1518291912 @cloud-fan, maybe let's consider multi-part attribute references as fine or at least separate from this? What do you think? I opened another PR just changing `DslAttr.attr` to not

[GitHub] [spark] xinrong-meng commented on pull request #40864: [WIP] Nested DataType compatibility in Arrow-optimized Python UDF and Pandas UDF

2023-04-21 Thread via GitHub
xinrong-meng commented on PR #40864: URL: https://github.com/apache/spark/pull/40864#issuecomment-1518288506 On second thought, we'd better not touch Pandas UDF, to preserve backward compatibility. Let me close the PR and start a new prototype.

[GitHub] [spark] xinrong-meng closed pull request #40864: [WIP] Nested DataType compatibility in Arrow-optimized Python UDF and Pandas UDF

2023-04-21 Thread via GitHub
xinrong-meng closed pull request #40864: [WIP] Nested DataType compatibility in Arrow-optimized Python UDF and Pandas UDF URL: https://github.com/apache/spark/pull/40864

[GitHub] [spark] rshkv commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1518283716 Damn, thank you for reverting, guys. Unsure why GA didn't test the last commit.

[GitHub] [spark] sadikovi commented on pull request #40699: [SPARK-43063][SQL] `df.show` handle null should print NULL instead of null

2023-04-21 Thread via GitHub
sadikovi commented on PR #40699: URL: https://github.com/apache/spark/pull/40699#issuecomment-1518270247 To be honest, I don't understand why spark-sql shell is expected to be consistent with spark-shell or pyspark shell. Can someone elaborate? I can see making spark-sql shell consistent

[GitHub] [spark] woj-i commented on pull request #40821: [SPARK-43152][spark-structured-streaming] Parametrisable output metadata path (_spark_metadata)

2023-04-21 Thread via GitHub
woj-i commented on PR #40821: URL: https://github.com/apache/spark/pull/40821#issuecomment-1518209816 Surprisingly, after committing naming improvements (no logic changes), the build failed. I think it's not related to my change. It happened at [Run / Build modules: streaming,

[GitHub] [spark] zhenlineo commented on a diff in pull request #40628: [SPARK-42999][Connect] Dataset#foreach, foreachPartition

2023-04-21 Thread via GitHub
zhenlineo commented on code in PR #40628: URL: https://github.com/apache/spark/pull/40628#discussion_r1174040639 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/UserDefinedFunctionE2ETestSuite.scala: ## @@ -128,4 +130,72 @@ class

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1174017427 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method.

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1174017427 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method.

[GitHub] [spark] ryan-johnson-databricks commented on pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on PR #40885: URL: https://github.com/apache/spark/pull/40885#issuecomment-1518150945 FYI the [tests that failed](https://github.com/ryan-johnson-databricks/spark/actions/runs/4765599580/jobs/8471553389) are broken upstream -- they also fail on the version

[GitHub] [spark] jchen5 commented on pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
jchen5 commented on PR #40865: URL: https://github.com/apache/spark/pull/40865#issuecomment-1518139762 I checked the case of `any_value(false)` in a debugger, and it works because resultWithZeroTups is NULL there; that explains why it works: there's an aggregation value around

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1174017427 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method.

[GitHub] [spark] sunchao commented on pull request #40847: [SPARK-43185][BUILD] Inline `hadoop-client` related properties in `pom.xml`

2023-04-21 Thread via GitHub
sunchao commented on PR #40847: URL: https://github.com/apache/spark/pull/40847#issuecomment-1518134382 > So if there is a way to build and test Hadoop 3.0/3.1 successfully before this PR, but it is lost after this PR, I think we should stop this work because Apache Spark has not previously

[GitHub] [spark] sunchao commented on pull request #40900: [SPARK-43196][YARN][FOLLOWUP] Remove unnecessary Hadoop version check

2023-04-21 Thread via GitHub
sunchao commented on PR #40900: URL: https://github.com/apache/spark/pull/40900#issuecomment-1518132719 Merged to master, thanks!

[GitHub] [spark] sunchao closed pull request #40900: [SPARK-43196][YARN][FOLLOWUP] Remove unnecessary Hadoop version check

2023-04-21 Thread via GitHub
sunchao closed pull request #40900: [SPARK-43196][YARN][FOLLOWUP] Remove unnecessary Hadoop version check URL: https://github.com/apache/spark/pull/40900

[GitHub] [spark] mridulm closed pull request #40843: [SPARK-43179][SHUFFLE] Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service

2023-04-21 Thread via GitHub
mridulm closed pull request #40843: [SPARK-43179][SHUFFLE] Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service URL: https://github.com/apache/spark/pull/40843

[GitHub] [spark] mridulm commented on pull request #40843: [SPARK-43179][SHUFFLE] Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service

2023-04-21 Thread via GitHub
mridulm commented on PR #40843: URL: https://github.com/apache/spark/pull/40843#issuecomment-1518125522 Thanks for fixing this @otterc! Thanks for the reviews @tgravescs, @zhouyejoe :-)

[GitHub] [spark] WweiL commented on pull request #40887: [SPARK-43144] Scala Client DataStreamReader table() API

2023-04-21 Thread via GitHub
WweiL commented on PR #40887: URL: https://github.com/apache/spark/pull/40887#issuecomment-1518097080 > You probably also need to generate the golden file for `ProtoToParsedPlanTestSuite`. There is instructions documented in that suite. Ah I see there is also a bin file. I did run

[GitHub] [spark] LuciferYang commented on pull request #40901: [SPARK-43195][BUILD][FOLLOWUP] Fix mima check for Scala 2.13

2023-04-21 Thread via GitHub
LuciferYang commented on PR #40901: URL: https://github.com/apache/spark/pull/40901#issuecomment-1518017291 cc @pan3793 @sunchao @HyukjinKwon

[GitHub] [spark] LuciferYang opened a new pull request, #40901: [SPARK-43195][FOLLOWUP] Fix mima check for Scala 2.13

2023-04-21 Thread via GitHub
LuciferYang opened a new pull request, #40901: URL: https://github.com/apache/spark/pull/40901 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] srowen commented on pull request #40893: [SPARK-43225][BUILD][SQL] Remove jackson-core-asl and jackson-mapper-asl from pre-built distribution

2023-04-21 Thread via GitHub
srowen commented on PR #40893: URL: https://github.com/apache/spark/pull/40893#issuecomment-1517997626 Is this possible now that Hadoop 2 support is gone? Just checking what the implications of this change are. Are the Hive.get changes needed, or can we batch those changes with

[GitHub] [spark] LuciferYang commented on pull request #40675: [SPARK-42657][CONNECT] Support to find and transfer client-side REPL classfiles to server as artifacts

2023-04-21 Thread via GitHub
LuciferYang commented on PR #40675: URL: https://github.com/apache/spark/pull/40675#issuecomment-1517995794 @vicennial I found that `ReplE2ESuite` always fails in the Java 17 GA daily test: - https://github.com/apache/spark/actions/runs/4726264540/jobs/8385681548 -

[GitHub] [spark] cloud-fan commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
cloud-fan commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1517960380 let's revert first. Seems GA wrongly reported green for this PR.

[GitHub] [spark] wangyum commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
wangyum commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1517960382 Reverted: https://github.com/apache/spark/commit/3523d83ac472b330bb86a442365c0a15f7e53f8c.

[GitHub] [spark] huaxingao closed pull request #40889: [SPARK-41660][SQL][3.3] Only propagate metadata columns if they are used

2023-04-21 Thread via GitHub
huaxingao closed pull request #40889: [SPARK-41660][SQL][3.3] Only propagate metadata columns if they are used URL: https://github.com/apache/spark/pull/40889

[GitHub] [spark] huaxingao commented on pull request #40889: [SPARK-41660][SQL][3.3] Only propagate metadata columns if they are used

2023-04-21 Thread via GitHub
huaxingao commented on PR #40889: URL: https://github.com/apache/spark/pull/40889#issuecomment-1517948030 Merged to branch-3.3. Thank you all for reviewing!

[GitHub] [spark] LuciferYang commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
LuciferYang commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1517927714 > https://github.com/apache/spark/actions/runs/4765094614/jobs/8470442826

[GitHub] [spark] LuciferYang commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
LuciferYang commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1517926432 https://github.com/apache/spark/actions/runs/4765094614/jobs/8470442826 (screenshot: https://user-images.githubusercontent.com/1475305/233662686-1bfb0633-bbd6-4c4a-a9b9-ecdd8e2f0ffc.png)

[GitHub] [spark] johanl-db commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
johanl-db commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173803115 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method. Technically, a file

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173799650 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method.

[GitHub] [spark] wangyum commented on pull request #40616: [SPARK-42991][SQL] Disable string type +/- interval in ANSI mode

2023-04-21 Thread via GitHub
wangyum commented on PR #40616: URL: https://github.com/apache/spark/pull/40616#issuecomment-1517869092 @gengliangwang Has updated the description.

[GitHub] [spark] pan3793 commented on pull request #40900: [SPARK-43196][YARN][FOLLOWUP] Remove unnecessary Hadoop version check

2023-04-21 Thread via GitHub
pan3793 commented on PR #40900: URL: https://github.com/apache/spark/pull/40900#issuecomment-1517850737 @sunchao @LuciferYang

[GitHub] [spark] pan3793 opened a new pull request, #40900: [SPARK-43196][YARN][FOLLOWUP] Remove unnecessary Hadoop version check

2023-04-21 Thread via GitHub
pan3793 opened a new pull request, #40900: URL: https://github.com/apache/spark/pull/40900 ### What changes were proposed in this pull request? It's not necessary to check Hadoop version 2.9+ or 3.0+ now. ### Why are the changes needed? Simplify code and docs.

[GitHub] [spark] tgravescs commented on pull request #40843: [SPARK-43179][SHUFFLE] Allowing apps to control whether their metadata gets saved in the db by the External Shuffle Service

2023-04-21 Thread via GitHub
tgravescs commented on PR #40843: URL: https://github.com/apache/spark/pull/40843#issuecomment-1517841735 lgtm

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
Hisoka-X commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173768209 ## sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-count-bug.sql.out: ## @@ -86,14 +86,14 @@ from l -- !query schema struct

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173755343 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -241,47 +256,74 @@ object FileFormat {

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173759666 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -241,47 +256,74 @@ object FileFormat {

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173758526 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -203,6 +203,21 @@ trait FileFormat { * method.

[GitHub] [spark] jchen5 commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
jchen5 commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173756531 ## sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-count-bug.sql.out: ## @@ -86,14 +86,14 @@ from l -- !query schema struct --

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40885: [SPARK-43226] Define extractors for file-constant metadata

2023-04-21 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40885: URL: https://github.com/apache/spark/pull/40885#discussion_r1173755343 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -241,47 +256,74 @@ object FileFormat {

[GitHub] [spark] ted-jenks commented on pull request #39907: [SPARK-42359][SQL] Support row skipping when reading CSV files

2023-04-21 Thread via GitHub
ted-jenks commented on PR #39907: URL: https://github.com/apache/spark/pull/39907#issuecomment-1517806003 @HyukjinKwon I have done more work on this, please let me know what you think!
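The PR above adds row skipping to Spark's CSV reader. The general idea can be sketched without Spark: discard the first N physical lines of the input before handing the rest to the CSV parser. This is a minimal Python illustration only; the actual Spark option name and semantics are defined in the PR, and `skip_rows` here is a hypothetical parameter for the sketch.

```python
import csv
import io

def read_csv_skipping(text, skip_rows=0):
    """Parse CSV after discarding the first `skip_rows` physical lines.

    Illustration of the row-skipping idea only; `skip_rows` is a
    hypothetical parameter name, not the Spark option from the PR.
    """
    lines = io.StringIO(text)
    for _ in range(skip_rows):
        next(lines, None)  # discard preamble lines before the header
    return list(csv.DictReader(lines))

data = "junk line\nanother junk\nname,age\nalice,30\nbob,25\n"
rows = read_csv_skipping(data, skip_rows=2)
print(rows)  # [{'name': 'alice', 'age': '30'}, {'name': 'bob', 'age': '25'}]
```

Skipping happens on raw lines before parsing, so preamble lines need not be valid CSV at all.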

[GitHub] [spark] juliuszsompolski commented on pull request #40899: [MINOR][CONNECT] Fix missing stats for SQL Command

2023-04-21 Thread via GitHub
juliuszsompolski commented on PR #40899: URL: https://github.com/apache/spark/pull/40899#issuecomment-1517798535 The original PR was merged to 3.4, so this bugfix should also go to branch-3.4.

[GitHub] [spark] advancedxy commented on pull request #37417: [SPARK-33782][K8S][CORE]Place spark.files, spark.jars and spark.files under the current working directory on the driver in K8S cluster mod

2023-04-21 Thread via GitHub
advancedxy commented on PR #37417: URL: https://github.com/apache/spark/pull/37417#issuecomment-1517797912 @pralabhkumar thanks for your work. I noticed a similar issue when running Spark applications on K8s; this is a helpful feature. However, this PR might have some inefficiency to download

[GitHub] [spark] peter-toth commented on pull request #40266: [SPARK-42660][SQL] Infer filters for Join produced by IN and EXISTS clause (RewritePredicateSubquery rule)

2023-04-21 Thread via GitHub
peter-toth commented on PR #40266: URL: https://github.com/apache/spark/pull/40266#issuecomment-1517796434 @mskapilks, do you have any update on this? I can take over this PR and investigate the idea further if you don't have time for it.

[GitHub] [spark] cloud-fan closed pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
cloud-fan closed pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters URL: https://github.com/apache/spark/pull/40794

[GitHub] [spark] cloud-fan commented on pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
cloud-fan commented on PR #40794: URL: https://github.com/apache/spark/pull/40794#issuecomment-1517793940 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
cloud-fan commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173730596 ## sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-count-bug.sql.out: ## @@ -86,14 +86,14 @@ from l -- !query schema struct

[GitHub] [spark] grundprinzip opened a new pull request, #40899: [MINOR][CONNECT] Fix missing stats for SQL Command

2023-04-21 Thread via GitHub
grundprinzip opened a new pull request, #40899: URL: https://github.com/apache/spark/pull/40899 ### What changes were proposed in this pull request? This patch fixes a minor issue in the code where for SQL Commands the plan metrics are not sent to the client. In addition, it renames

[GitHub] [spark] grundprinzip commented on a diff in pull request #40160: [SPARK-41725][CONNECT] Eager Execution of DF.sql()

2023-04-21 Thread via GitHub
grundprinzip commented on code in PR #40160: URL: https://github.com/apache/spark/pull/40160#discussion_r1173683435 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1450,10 +1458,79 @@ class

[GitHub] [spark] wangyum commented on pull request #40897: [SPARK-43228][SQL] Join keys also match PartitioningCollection in CoalesceBucketsInJoin

2023-04-21 Thread via GitHub
wangyum commented on PR #40897: URL: https://github.com/apache/spark/pull/40897#issuecomment-1517707460 cc @cloud-fan

[GitHub] [spark] rshkv commented on a diff in pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on code in PR #40794: URL: https://github.com/apache/spark/pull/40794#discussion_r1173669012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala: ## @@ -271,7 +271,7 @@ package object dsl { override def expr: Expression =

[GitHub] [spark] rshkv commented on a diff in pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on code in PR #40794: URL: https://github.com/apache/spark/pull/40794#discussion_r1173675183 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala: ## @@ -271,14 +271,17 @@ package object dsl { override def expr: Expression =

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
Hisoka-X commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173671566 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -599,10 +600,32 @@ object RewriteCorrelatedScalarSubquery extends

[GitHub] [spark] jchen5 commented on pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
jchen5 commented on PR #40865: URL: https://github.com/apache/spark/pull/40865#issuecomment-1517695970 > Depends on what the correct results of `select *, (select any_value(false) as result from t1 where t0.a = t1.c) from t0` are? Yes, this should return null on empty data. I will
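The "COUNT bug" discussed in this thread is about what a correlated scalar subquery should produce for an outer row with no matching inner rows: `COUNT(*)` must return 0, while a non-COUNT aggregate such as `any_value` must return NULL (which is why a naive rewrite to an outer join, where both become NULL, is wrong). A minimal pure-Python sketch of the two semantics, not Spark code; the table names `t0`/`t1` and columns `a`/`c` mirror the query quoted above:

```python
# Pure-Python sketch (no Spark) of correlated scalar subquery semantics.
# For an outer row with no matching inner rows, COUNT(*) yields 0,
# while a non-COUNT aggregate such as any_value yields NULL (None).

t0 = [{"a": 1}, {"a": 2}]  # outer table
t1 = [{"c": 1}, {"c": 1}]  # inner table: no row matches a=2

def scalar_subquery(outer_row, agg):
    matches = [r for r in t1 if outer_row["a"] == r["c"]]
    if agg == "count":
        return len(matches)  # 0 on empty input, never NULL
    if agg == "any_value":
        # any_value(false) returns false when rows exist, NULL otherwise
        return False if matches else None
    raise ValueError(agg)

print([(r["a"], scalar_subquery(r, "count")) for r in t0])
# [(1, 2), (2, 0)]   -- 0, not NULL, for the unmatched row a=2
print([(r["a"], scalar_subquery(r, "any_value")) for r in t0])
# [(1, False), (2, None)]
```

The divergence at the unmatched row (0 vs. None) is exactly the case the rewrite rule in `RewriteCorrelatedScalarSubquery` has to handle specially.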

[GitHub] [spark] rshkv commented on a diff in pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on code in PR #40794: URL: https://github.com/apache/spark/pull/40794#discussion_r1173669012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala: ## @@ -271,7 +271,7 @@ package object dsl { override def expr: Expression =

[GitHub] [spark] jchen5 commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
jchen5 commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173668725 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -599,10 +600,32 @@ object RewriteCorrelatedScalarSubquery extends

[GitHub] [spark] rshkv commented on a diff in pull request #40794: [SPARK-43142] Fix DSL expressions on attributes with special characters

2023-04-21 Thread via GitHub
rshkv commented on code in PR #40794: URL: https://github.com/apache/spark/pull/40794#discussion_r1173564024 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala: ## @@ -271,7 +271,7 @@ package object dsl { override def expr: Expression =
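SPARK-43142, reviewed above, fixes the Catalyst DSL for attribute names containing special characters (e.g. dots), which Spark handles by quoting identifiers in backticks and doubling any embedded backticks. A rough Python sketch of that quoting rule, modeled on Catalyst's `quoteIfNeeded` helper (simplified: the real implementation has additional edge cases, such as all-digit names):

```python
import re

def quote_if_needed(name: str) -> str:
    # Simplified sketch of Catalyst-style identifier quoting:
    # plain identifiers pass through unchanged; anything else is
    # wrapped in backticks, with embedded backticks doubled.
    if re.fullmatch(r"[a-zA-Z0-9_]+", name):
        return name
    return "`" + name.replace("`", "``") + "`"

print(quote_if_needed("id"))   # id
print(quote_if_needed("a.b"))  # `a.b`
print(quote_if_needed("a`b"))  # `a``b`
```

Without such quoting, a DSL reference to `a.b` parses as field `b` of column `a` rather than a single column named `a.b`.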

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40865: [SPARK-43156][SQL] Fix `COUNT(*) is null` bug in correlated scalar subquery

2023-04-21 Thread via GitHub
Hisoka-X commented on code in PR #40865: URL: https://github.com/apache/spark/pull/40865#discussion_r1173667469 ## sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-count-bug.sql.out: ## @@ -106,14 +106,14 @@ from l -- !query schema struct

[GitHub] [spark] bjornjorgensen commented on pull request #40878: [SPARK-42780][BUILD] Upgrade `Tink` to 1.9.0

2023-04-21 Thread via GitHub
bjornjorgensen commented on PR #40878: URL: https://github.com/apache/spark/pull/40878#issuecomment-1517673103 @LuciferYang Thank you

[GitHub] [spark] NarekDW commented on pull request #39719: [SPARK-42169] [SQL] Implement code generation for to_csv function (StructsToCsv)

2023-04-21 Thread via GitHub
NarekDW commented on PR #39719: URL: https://github.com/apache/spark/pull/39719#issuecomment-1517635600 @jaceklaskowski thank you for the review. @MaxGekk just a reminder.

[GitHub] [spark] NarekDW commented on a diff in pull request #39719: [SPARK-42169] [SQL] Implement code generation for to_csv function (StructsToCsv)

2023-04-21 Thread via GitHub
NarekDW commented on code in PR #39719: URL: https://github.com/apache/spark/pull/39719#discussion_r1173620399 ## sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala: ## @@ -574,4 +575,11 @@ class CsvFunctionsSuite extends QueryTest with SharedSparkSession {

[GitHub] [spark] LuciferYang commented on a diff in pull request #40892: [SPARK-43128][CONNECT] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scal

2023-04-21 Thread via GitHub
LuciferYang commented on code in PR #40892: URL: https://github.com/apache/spark/pull/40892#discussion_r1173602327 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/streaming/progress.scala: ## @@ -17,6 +17,297 @@ package org.apache.spark.sql.streaming

[GitHub] [spark] LuciferYang commented on a diff in pull request #40892: [SPARK-43128][CONNECT] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scal

2023-04-21 Thread via GitHub
LuciferYang commented on code in PR #40892: URL: https://github.com/apache/spark/pull/40892#discussion_r1173598454 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2182,7 +2182,7 @@ class SparkConnectPlanner(val

[GitHub] [spark] peter-toth commented on a diff in pull request #40856: [SPARK-43199][SQL] Make InlineCTE idempotent

2023-04-21 Thread via GitHub
peter-toth commented on code in PR #40856: URL: https://github.com/apache/spark/pull/40856#discussion_r1173587974 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InlineCTE.scala: ## @@ -68,50 +69,91 @@ case class InlineCTE(alwaysInline: Boolean = false)
