[GitHub] [spark] HeartSaVioR commented on pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scala Api

2023-05-17 Thread via GitHub
HeartSaVioR commented on PR #40892: URL: https://github.com/apache/spark/pull/40892#issuecomment-1550888639 I'll just help merge this one as it has been here for multiple weeks and we don't want to require this PR to be rebased anymore. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] HeartSaVioR closed pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scala Api

2023-05-17 Thread via GitHub
HeartSaVioR closed pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scala Api URL: https://github.com/apache/spark/pull/40892

[GitHub] [spark] HeartSaVioR commented on pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scala Api

2023-05-17 Thread via GitHub
HeartSaVioR commented on PR #40892: URL: https://github.com/apache/spark/pull/40892#issuecomment-1550890958 Thanks @LuciferYang, I merged this to master.

[GitHub] [spark] olaky commented on pull request #39608: [SPARK-43450][SQL][TESTS][FOLLOWUP] Additional tests for _metadata filters

2023-05-17 Thread via GitHub
olaky commented on PR #39608: URL: https://github.com/apache/spark/pull/39608#issuecomment-1550895739 @dongjoon-hyun I created the ticket and made the requested changes

[GitHub] [spark] LuciferYang commented on pull request #40892: [SPARK-43128][CONNECT][SS] Make `recentProgress` and `lastProgress` return `StreamingQueryProgress` consistent with the native Scala Api

2023-05-17 Thread via GitHub
LuciferYang commented on PR #40892: URL: https://github.com/apache/spark/pull/40892#issuecomment-1550894768 Thanks @HeartSaVioR @HyukjinKwon @rangadi ~ I have already tested my new permissions in another PR :)

[GitHub] [spark] LuciferYang commented on pull request #40654: [SPARK-43022][CONNECT] Support protobuf functions for Scala client

2023-05-17 Thread via GitHub
LuciferYang commented on PR #40654: URL: https://github.com/apache/spark/pull/40654#issuecomment-1550897944 hmm... `SparkConnectPlanner` has had another conflict... I will fix it and merge this PR as soon as possible

[GitHub] [spark] wangyum opened a new pull request, #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-17 Thread via GitHub
wangyum opened a new pull request, #41195: URL: https://github.com/apache/spark/pull/41195 ### What changes were proposed in this pull request? This PR adds `log4j-1.2-api` and `log4j-slf4j2-impl` to the classpath when `hadoop-provided` is active. ### Why are the changes needed?

[GitHub] [spark] MaxGekk commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
MaxGekk commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196089093 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] panbingkun commented on pull request #41184: [SPARK-43535][BUILD] Adjust the ImportOrderChecker rule to resolve long-standing import order issues

2023-05-17 Thread via GitHub
panbingkun commented on PR #41184: URL: https://github.com/apache/spark/pull/41184#issuecomment-1550941249 By debugging the scalastyle plugin, I found that the logic of the ImportOrderChecker rule is as follows; for example, in the xxx file, the following import:
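The quoted explanation is cut off above. As generic background only — not the scalastyle plugin's actual logic, and the group names below are an assumption based on Spark's commonly cited convention (java/javax, scala, third-party, org.apache.spark) — an import-group ordering check of the kind such a rule enforces can be sketched as:

```python
# Illustrative sketch of an import-order check: imports are assigned to
# ordered groups, and any import whose group precedes that of an earlier
# import is flagged. Group boundaries here are assumptions, not the
# plugin's real configuration.
def group_of(imp: str) -> int:
    if imp.startswith(("java.", "javax.")):
        return 0          # JDK imports first
    if imp.startswith("scala."):
        return 1          # then the Scala standard library
    if imp.startswith("org.apache.spark"):
        return 3          # Spark's own packages last
    return 2              # everything else is third-party

def violations(imports):
    out, highest = [], 0
    for imp in imports:
        g = group_of(imp)
        if g < highest:   # group order went backwards
            out.append(imp)
        highest = max(highest, g)
    return out
```

Running `violations` over a file's import list returns the imports that appear after a later-ordered group has already started.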

[GitHub] [spark] panbingkun commented on pull request #41184: [SPARK-43535][BUILD] Adjust the ImportOrderChecker rule to resolve long-standing import order issues

2023-05-17 Thread via GitHub
panbingkun commented on PR #41184: URL: https://github.com/apache/spark/pull/41184#issuecomment-1550945373 friendly ping @HyukjinKwon @dongjoon-hyun @srowen @LuciferYang

[GitHub] [spark] panbingkun commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

2023-05-17 Thread via GitHub
panbingkun commented on code in PR #41169: URL: https://github.com/apache/spark/pull/41169#discussion_r1196113962 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2134,30 +2134,145 @@ case class OctetLength(child:
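The diff under review is truncated above. As a rough illustration of what a "max distance" argument to `levenshtein()` buys — this is a generic sketch, not Spark's actual implementation, and the return-`-1`-when-exceeded convention is an assumption — a threshold-bounded edit distance can stop early once no cell in a DP row can stay under the bound:

```python
def levenshtein_bounded(s: str, t: str, max_dist: int) -> int:
    """Edit distance, or -1 if it exceeds max_dist (illustrative sketch)."""
    # The distance is at least the length gap, so bail out immediately.
    if abs(len(s) - len(t)) > max_dist:
        return -1
    prev = list(range(len(t) + 1))  # row 0 of the classic DP table
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        # Row minima never decrease, so once a whole row exceeds the
        # bound, the final distance must too.
        if min(cur) > max_dist:
            return -1
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else -1
```

The early exit is what makes a bounded variant cheaper than computing the full distance and comparing afterwards.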

[GitHub] [spark] turboFei commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-17 Thread via GitHub
turboFei commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1550791604 thanks for the comments, I will check it

[GitHub] [spark] advancedxy commented on pull request #41181: [SPARK-43504][K8S] Mounts the hadoop config map on the executor pod

2023-05-17 Thread via GitHub
advancedxy commented on PR #41181: URL: https://github.com/apache/spark/pull/41181#issuecomment-1550797386 > Thank you for making a PR, @turboFei . > > However, this PR might cause an outage because the number of configMap is controlled by quota. > > ``` > $ kubectl describe

[GitHub] [spark] HyukjinKwon closed pull request #41026: [SPARK-43132] [SS] [CONNECT] Python Client DataStreamWriter foreach() API

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41026: [SPARK-43132] [SS] [CONNECT] Python Client DataStreamWriter foreach() API URL: https://github.com/apache/spark/pull/41026

[GitHub] [spark] MaxGekk commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-17 Thread via GitHub
MaxGekk commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1550789731 > This change adds support for optional IV and AAD fields to aes_encrypt and aes_decrypt @sweisdb Looking at the constructors of the `AesEncrypt` and `AesDecrypt` expressions,

[GitHub] [spark] MaxGekk commented on pull request #41155: [SPARK-43487][SQL] Fix Nested CTE error message

2023-05-17 Thread via GitHub
MaxGekk commented on PR #41155: URL: https://github.com/apache/spark/pull/41155#issuecomment-1550794232 @johanl-db Are you working on the PR? Could you please address the comments above?

[GitHub] [spark] MaxGekk commented on a diff in pull request #41020: [SPARK-43345][SPARK-43346][SQL] Rename the error classes _LEGACY_ERROR_TEMP_[0041|1206]

2023-05-17 Thread via GitHub
MaxGekk commented on code in PR #41020: URL: https://github.com/apache/spark/pull/41020#discussion_r1195983690 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2115,7 +2115,7 @@ private[sql] object QueryCompilationErrors extends

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
SandishKumarHN commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1196002786 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -148,8 +212,38 @@ object functions { messageName: String,

[GitHub] [spark] gatorsmile commented on a diff in pull request #40658: [WIP][SPARK-43024][PS] Upgrade pandas to 2.0.0

2023-05-17 Thread via GitHub
gatorsmile commented on code in PR #40658: URL: https://github.com/apache/spark/pull/40658#discussion_r1196006534 ## python/pyspark/pandas/frame.py: ## @@ -8844,91 +8836,6 @@ def combine_first(self, other: "DataFrame") -> "DataFrame": ) return

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array

2023-05-17 Thread via GitHub
Hisoka-X commented on code in PR #41187: URL: https://github.com/apache/spark/pull/41187#discussion_r1196009514 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala: ## @@ -371,7 +371,8 @@ object CreateStruct { // We should

[GitHub] [spark] Hisoka-X commented on a diff in pull request #41187: [SPARK-43522][SQL] Fix creating struct column name with index of array

2023-05-17 Thread via GitHub
Hisoka-X commented on code in PR #41187: URL: https://github.com/apache/spark/pull/41187#discussion_r1196009862 ## sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala: ## @@ -280,6 +280,20 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with

[GitHub] [spark] LuciferYang commented on pull request #40925: [SPARK-43246][BUILD] Ignore `privateClasses` and `privateMembers` from connect mima check as default

2023-05-17 Thread via GitHub
LuciferYang commented on PR #40925: URL: https://github.com/apache/spark/pull/40925#issuecomment-1550820753 ``` Error: Exception in thread "main" java.lang.IllegalArgumentException: Unsupported class file major version 61 at org.objectweb.asm.ClassReader.<init>(ClassReader.java:195)
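For context on the failure quoted above: ASM rejects class files newer than the versions it knows about, and for Java 6 and later the class-file major version equals the Java release number plus 44 — so major version 61 is Java 17 bytecode. A tiny helper (illustrative only, not part of any of the PRs here) makes the mapping concrete:

```python
def java_release(major: int) -> int:
    """Map a class-file major version to its Java release (Java 6+ only).

    Per the JVM class-file format, major version 50 = Java 6, 52 = Java 8,
    55 = Java 11, 61 = Java 17 -- i.e. release = major - 44.
    """
    if major < 50:
        raise ValueError("mapping below Java 6 follows a different scheme")
    return major - 44
```

So an "Unsupported class file major version 61" error means the ASM in use predates Java 17 support.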

[GitHub] [spark] HyukjinKwon commented on pull request #41026: [SPARK-43132] [SS] [CONNECT] Python Client DataStreamWriter foreach() API

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41026: URL: https://github.com/apache/spark/pull/41026#issuecomment-1550823334 Merged to master.

[GitHub] [spark] LuciferYang commented on pull request #41194: [SPARK-43532][BUILD][TESTS] Upgrade `jdbc` related test dependencies

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41194: URL: https://github.com/apache/spark/pull/41194#issuecomment-1550822915 late LGTM

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
SandishKumarHN commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1196003247 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -148,8 +212,38 @@ object functions { messageName: String,

[GitHub] [spark] MaxGekk commented on a diff in pull request #41169: [SPARK-43493][SQL] Add a max distance argument to the levenshtein() function

2023-05-17 Thread via GitHub
MaxGekk commented on code in PR #41169: URL: https://github.com/apache/spark/pull/41169#discussion_r1196017647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2134,30 +2134,145 @@ case class OctetLength(child: Expression)

[GitHub] [spark] HyukjinKwon commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1550847776 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions URL: https://github.com/apache/spark/pull/41013

[GitHub] [spark] dongjoon-hyun commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1551615074 cc @pralabhkumar and @holdenk from #37417

[GitHub] [spark] dtenedor commented on a diff in pull request #40996: [SPARK-43313][SQL] Adding missing column DEFAULT values for MERGE INSERT actions

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #40996: URL: https://github.com/apache/spark/pull/40996#discussion_r1196822211 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] zhenlineo commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
zhenlineo commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1196867820 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -216,6 +216,7 @@ message WriteStreamOperationStart { message

[GitHub] [spark] ericm-db opened a new pull request, #41205: [WIP] [SC-130782] Define a new error class and apply for the case where streaming query fails due to concurrent run of streaming query with

2023-05-17 Thread via GitHub
ericm-db opened a new pull request, #41205: URL: https://github.com/apache/spark/pull/41205 ### What changes were proposed in this pull request? We are migrating to a new error framework in order to surface errors in a friendlier way to customers. This PR defines a new error

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general constant expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1197029651 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on a diff in pull request #41062: [SPARK-43313][SQL][FOLLOWUP] Improvement for DSv2 API SupportsCustomSchemaWrite

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41062: URL: https://github.com/apache/spark/pull/41062#discussion_r1196820360 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -27,12 +28,12 @@ * @since 3.4.1 */ @Evolving -public

[GitHub] [spark] dtenedor commented on a diff in pull request #41062: [SPARK-43313][SQL][FOLLOWUP] Improvement for DSv2 API SupportsCustomSchemaWrite

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41062: URL: https://github.com/apache/spark/pull/41062#discussion_r1196819921 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/write/SupportsCustomSchemaWrite.java: ## @@ -27,12 +28,12 @@ * @since 3.4.1 */ @Evolving -public

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1196819312 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/foreachWriterPacket.scala: ## @@ -0,0 +1,65 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] srowen commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
srowen commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551953229 OK yeah it was fine, false alarm. Oops.

[GitHub] [spark] dongjoon-hyun commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551729535 No worry, @srowen ~ I'll monitor together.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196850103 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41147: [WIP] Standardize nested non-atomic input type support in Pandas UDF

2023-05-17 Thread via GitHub
xinrong-meng commented on code in PR #41147: URL: https://github.com/apache/spark/pull/41147#discussion_r1196924307 ## python/pyspark/sql/pandas/serializers.py: ## @@ -317,66 +320,6 @@ def arrow_to_pandas(self, arrow_column): s =

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196805611 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551745541 @RyanBerti thanks for the update!

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL][FOLLOWUP] Show a directional message in ListQuery nullability assertion

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196870894 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] MaxGekk commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
MaxGekk commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196979589 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] RyanBerti commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551953337 @dtenedor I just pushed a commit that tries to generalize the foldable check, as I'm seeing duplicate code in the datasketches functions as well as others (see

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196991836 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3189,37 @@ class AstBuilder extends

[GitHub] [spark] dtenedor commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1196808900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -265,6 +288,26 @@ case class HllUnionAgg(

[GitHub] [spark] holdenk commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
holdenk commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1551827663 +1 looks reasonable modulo the existing suggestions (clean up the logging + tighten the test). Thanks for making this PR :)

[GitHub] [spark] dtenedor commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
dtenedor commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196972501 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] jchen5 opened a new pull request, #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
jchen5 opened a new pull request, #41202: URL: https://github.com/apache/spark/pull/41202 ### What changes were proposed in this pull request? In case the assert for the call to ListQuery.nullable is hit, mention in the assert error message the conf flag that can be used to disable the

[GitHub] [spark] jchen5 commented on a diff in pull request #41094: [SPARK-43413][SQL] Fix IN subquery ListQuery nullability

2023-05-17 Thread via GitHub
jchen5 commented on code in PR #41094: URL: https://github.com/apache/spark/pull/41094#discussion_r1196750077 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4199,6 +4199,16 @@ object SQLConf { .booleanConf

[GitHub] [spark] RyanBerti commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551726374 @bersprockets here are the changes to handle non-foldable input args, based on our conversation in https://github.com/apache/spark/pull/40615. cc @dtenedor @mkaravel

[GitHub] [spark] RyanBerti commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
RyanBerti commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1196857575 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/datasketchesAggregates.scala: ## @@ -265,6 +288,26 @@ case class HllUnionAgg(

[GitHub] [spark] jchen5 commented on pull request #41202: [SPARK-43413][SQL][FOLLOWUP] Show a directional message in ListQuery nullability assertion

2023-05-17 Thread via GitHub
jchen5 commented on PR #41202: URL: https://github.com/apache/spark/pull/41202#issuecomment-1551806149 Thanks for the comments, updated.

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] gengliangwang commented on a diff in pull request #41191: [SPARK-43529][SQL] Support general expressions as OPTIONS values in the parser

2023-05-17 Thread via GitHub
gengliangwang commented on code in PR #41191: URL: https://github.com/apache/spark/pull/41191#discussion_r1196885982 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala: ## @@ -3187,7 +3187,24 @@ class AstBuilder extends

[GitHub] [spark] LuciferYang commented on pull request #41198: [SPARK-43537][INFRA][BUILD] Upgrading the ASM dependencies used in the `tools` module to 9.4

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41198: URL: https://github.com/apache/spark/pull/41198#issuecomment-1551749676 If there are any issues, please revert and I will resubmit one :)

[GitHub] [spark] MaxGekk opened a new pull request, #41204: [WIP][SQL] Fix resolving of `Filter` output

2023-05-17 Thread via GitHub
MaxGekk opened a new pull request, #41204: URL: https://github.com/apache/spark/pull/41204 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] sweisdb commented on pull request #40970: [SPARK-43290][SQL] Adds IV and AAD support to aes_encrypt/aes_decrypt

2023-05-17 Thread via GitHub
sweisdb commented on PR #40970: URL: https://github.com/apache/spark/pull/40970#issuecomment-1551795385 @MaxGekk I am planning to do the user-facing SQL expression changes in a followup to make each change simpler. I want to land this first.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41202: [SPARK-43413][SQL] Mention flag in assert error message for ListQuery nullable

2023-05-17 Thread via GitHub
dongjoon-hyun commented on code in PR #41202: URL: https://github.com/apache/spark/pull/41202#discussion_r1196848013 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala: ## @@ -372,7 +372,8 @@ case class ListQuery( // ListQuery can't be

[GitHub] [spark] dtenedor commented on pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
dtenedor commented on PR #41203: URL: https://github.com/apache/spark/pull/41203#issuecomment-1551964588 The new trait looks good. In the future we can think about reusing it.

[GitHub] [spark] dongjoon-hyun closed pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1

2023-05-17 Thread via GitHub
dongjoon-hyun closed pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1 URL: https://github.com/apache/spark/pull/41122

[GitHub] [spark] dongjoon-hyun commented on pull request #41122: [SPARK-43436][BUILD] Upgrade rocksdbjni to 8.1.1.1

2023-05-17 Thread via GitHub
dongjoon-hyun commented on PR #41122: URL: https://github.com/apache/spark/pull/41122#issuecomment-1552036061 Merged to master for Apache Spark 3.5.0.

[GitHub] [spark] otterc commented on a diff in pull request #41071: [SPARK-43391][CORE] Idle connection should be kept when closeIdleConnection is disabled

2023-05-17 Thread via GitHub
otterc commented on code in PR #41071: URL: https://github.com/apache/spark/pull/41071#discussion_r1197131313 ## common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java: ## @@ -163,14 +163,11 @@ public void

[GitHub] [spark] HyukjinKwon opened a new pull request, #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon opened a new pull request, #41206: URL: https://github.com/apache/spark/pull/41206 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/41013 that sets `SPARK_CONNECT_MODE_ENABLED` when running PySpark shell

[GitHub] [spark] HyukjinKwon commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552233690 https://github.com/apache/spark/pull/41206

[GitHub] [spark] rangadi commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1197260139 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/functions.scala: ## @@ -148,8 +212,38 @@ object functions { messageName: String,

[GitHub] [spark] grundprinzip commented on pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
grundprinzip commented on PR #41206: URL: https://github.com/apache/spark/pull/41206#issuecomment-1552363238 Thanks!

[GitHub] [spark] robreeves commented on pull request #40812: [SPARK-43157][SQL] Clone InMemoryRelation cached plan to prevent cloned plan from referencing same objects

2023-05-17 Thread via GitHub
robreeves commented on PR #40812: URL: https://github.com/apache/spark/pull/40812#issuecomment-1552124399 > @cloud-fan Cloning the cachedPlan is also problematic because it contains state (accumulators in private fields) when it includes a `CollectMetricsExec` operator.

[GitHub] [spark] warrenzhu25 commented on pull request #41083: [SPARK-43399][CORE] Add config to control threshold of unregister map output when fetch failed

2023-05-17 Thread via GitHub
warrenzhu25 commented on PR #41083: URL: https://github.com/apache/spark/pull/41083#issuecomment-1552141121 > These looks like things which can be handled by appropriate configuration tuning ? The PR itself requires a bit more work if that is not a feasible direction (efficient cleanup,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon commented on code in PR #41206: URL: https://github.com/apache/spark/pull/41206#discussion_r1197200251 ## python/pyspark/shell.py: ## @@ -100,10 +100,9 @@ % (platform.python_version(), platform.python_build()[0], platform.python_build()[1]) ) if is_remote():

[GitHub] [spark] Kimahriman commented on pull request #41195: [SPARK-43534][BUILD] Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-17 Thread via GitHub
Kimahriman commented on PR #41195: URL: https://github.com/apache/spark/pull/41195#issuecomment-1552245794 Maybe similar reason I made https://github.com/apache/spark/pull/37694 a while ago? Basically Spark logging setup assumes log4j2, but with hadoop provided you get 1.x from Hadoop. So

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41206: [SPARK-43509][PYTHON][CONNECT][FOLLOW-UP] Set SPARK_CONNECT_MODE_ENABLED when running pyspark shell with remote is local

2023-05-17 Thread via GitHub
HyukjinKwon commented on code in PR #41206: URL: https://github.com/apache/spark/pull/41206#discussion_r1197216790 ## python/pyspark/shell.py: ## @@ -100,10 +100,11 @@ % (platform.python_version(), platform.python_build()[0], platform.python_build()[1]) ) if

[GitHub] [spark] wzhfy commented on pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
wzhfy commented on PR #41162: URL: https://github.com/apache/spark/pull/41162#issuecomment-1552291841 I also think that the different results between 0 in ('00') and 0 = '00' are confusing, and it seems Hive has already fixed this problem. Could you also take a look? @cloud-fan @MaxGekk
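[Editorial context, not part of the original comment: the inconsistency discussed in SPARK-43491 comes from `EqualTo` and `In` using different type-coercion rules. A minimal sketch of the reported behavior is below; it assumes a local `SparkSession` named `spark`, and the exact results depend on the Spark version and ANSI settings.]

```scala
// Hedged illustration of the SPARK-43491 inconsistency.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// EqualTo coerces the string literal to the numeric side:
// '00' is cast to integer 0, so the comparison is 0 = 0.
spark.sql("SELECT 0 = '00'").show()    // reported to return true

// In widens all operands to a common type, which here is string,
// so the comparison becomes '0' = '00'.
spark.sql("SELECT 0 IN ('00')").show() // reported to return false
```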

[GitHub] [spark] itholic opened a new pull request, #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
itholic opened a new pull request, #41208: URL: https://github.com/apache/spark/pull/41208 ### What changes were proposed in this pull request? This PR proposes to fix [Supported pandas

[GitHub] [spark] gerashegalov commented on a diff in pull request #41203: [SPARK-16484][SQL] Update hll function type checks to also check for non-foldable inputs

2023-05-17 Thread via GitHub
gerashegalov commented on code in PR #41203: URL: https://github.com/apache/spark/pull/41203#discussion_r1197205234 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ExpectsInputTypes.scala: ## @@ -74,3 +74,44 @@ object ExpectsInputTypes extends

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197261326 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] LuciferYang commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
LuciferYang commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552334558 cc @attilapiros @viirya @sunchao @pan3793 FYI

[GitHub] [spark] panbingkun commented on pull request #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
panbingkun commented on PR #41209: URL: https://github.com/apache/spark/pull/41209#issuecomment-1552341297 (attached image: https://github.com/apache/spark/assets/15246973/6da74b5d-4e71-440e-bb47-d17ba7f7de1e)

[GitHub] [spark] pan3793 commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pan3793 commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1196873001 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case

[GitHub] [spark] rangadi commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-05-17 Thread via GitHub
rangadi commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1197276153 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2386,10 +2393,26 @@ class SparkConnectPlanner(val

[GitHub] [spark] wzhfy commented on a diff in pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
wzhfy commented on code in PR #41162: URL: https://github.com/apache/spark/pull/41162#discussion_r1197253055 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -509,16 +509,25 @@ case class In(value: Expression, list:

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-17 Thread via GitHub
xinrong-meng commented on code in PR #41147: URL: https://github.com/apache/spark/pull/41147#discussion_r1196924307 ## python/pyspark/sql/pandas/serializers.py: ## @@ -317,66 +320,6 @@ def arrow_to_pandas(self, arrow_column): s =

[GitHub] [spark] liukuijian8040 commented on a diff in pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
liukuijian8040 commented on code in PR #41162: URL: https://github.com/apache/spark/pull/41162#discussion_r1197258991 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -509,16 +509,25 @@ case class In(value: Expression, list:

[GitHub] [spark] liukuijian8040 commented on a diff in pull request #41162: [SPARK-43491][SQL] In expression should act as same as EqualTo when elements in IN expression have same DataType.

2023-05-17 Thread via GitHub
liukuijian8040 commented on code in PR #41162: URL: https://github.com/apache/spark/pull/41162#discussion_r1197258991 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -509,16 +509,25 @@ case class In(value: Expression, list:

[GitHub] [spark] ueshin commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
ueshin commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552174756 Hi, `./bin/pyspark --remote local` shows the following error after this commit. ```py % ./bin/pyspark --remote local ... Traceback (most recent call last): File

[GitHub] [spark] HyukjinKwon commented on pull request #41013: [SPARK-43509][CONNECT] Support Creating multiple Spark Connect sessions

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41013: URL: https://github.com/apache/spark/pull/41013#issuecomment-1552230867 creating a followup now

[GitHub] [spark] itholic opened a new pull request, #41207: [SPARK-42826][FOLLOWUP][PS][DOCS] Update migration notes for pandas API on Spark.

2023-05-17 Thread via GitHub
itholic opened a new pull request, #41207: URL: https://github.com/apache/spark/pull/41207 ### What changes were proposed in this pull request? This is a follow-up for https://github.com/apache/spark/pull/40459 to fix the incorrect information and to elaborate on the detailed changes.

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41147: [SPARK-43543][PYTHON] Fix nested MapType behavior in Pandas UDF

2023-05-17 Thread via GitHub
xinrong-meng commented on code in PR #41147: URL: https://github.com/apache/spark/pull/41147#discussion_r1196924307 ## python/pyspark/sql/pandas/serializers.py: ## @@ -317,66 +320,6 @@ def arrow_to_pandas(self, arrow_column): s =

[GitHub] [spark] LuciferYang commented on pull request #40654: [SPARK-43022][CONNECT] Support protobuf functions for Scala client

2023-05-17 Thread via GitHub
LuciferYang commented on PR #40654: URL: https://github.com/apache/spark/pull/40654#issuecomment-1552308089 Merged to master. Thanks @hvanhovell @HyukjinKwon @rangadi

[GitHub] [spark] advancedxy commented on a diff in pull request #41192: [SPARK-43530][PROTOBUF] Read descriptor file only once

2023-05-17 Thread via GitHub
advancedxy commented on code in PR #41192: URL: https://github.com/apache/spark/pull/41192#discussion_r1197287073 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/CatalystDataToProtobuf.scala: ## @@ -26,14 +26,14 @@ import

[GitHub] [spark] panbingkun opened a new pull request, #41209: [SPARK-43548][SS] Remove workaround for HADOOP-16255

2023-05-17 Thread via GitHub
panbingkun opened a new pull request, #41209: URL: https://github.com/apache/spark/pull/41209 ### What changes were proposed in this pull request? The PR aims to remove the workaround for HADOOP-16255. ### Why are the changes needed? - Because HADOOP-16255 has been fixed after hadoop

[GitHub] [spark] pralabhkumar commented on pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
pralabhkumar commented on PR #41201: URL: https://github.com/apache/spark/pull/41201#issuecomment-1552336382 LGTM.

[GitHub] [spark] HyukjinKwon closed pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
HyukjinKwon closed pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs URL: https://github.com/apache/spark/pull/41208

[GitHub] [spark] HyukjinKwon commented on pull request #41208: [3.4][SPARK-43547][PS][DOCS] Update "Supported Pandas API" page to point out the proper pandas docs

2023-05-17 Thread via GitHub
HyukjinKwon commented on PR #41208: URL: https://github.com/apache/spark/pull/41208#issuecomment-1552340648 Merged to branch-3.4.

[GitHub] [spark] warrenzhu25 commented on a diff in pull request #41071: [SPARK-43391][CORE] Idle connection should be kept when closeIdleConnection is disabled

2023-05-17 Thread via GitHub
warrenzhu25 commented on code in PR #41071: URL: https://github.com/apache/spark/pull/41071#discussion_r1197093878 ## common/network-common/src/main/java/org/apache/spark/network/server/TransportChannelHandler.java: ## @@ -163,14 +163,11 @@ public void

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197147035 ## core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala: ## @@ -1618,6 +1618,24 @@ class SparkSubmitSuite conf.get(k) should be (v) } } +

[GitHub] [spark] turboFei commented on a diff in pull request #41201: [SPARK-43540][K8S][CORE] Add working directory into classpath on the driver in K8S cluster mode

2023-05-17 Thread via GitHub
turboFei commented on code in PR #41201: URL: https://github.com/apache/spark/pull/41201#discussion_r1197146855 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -425,7 +428,7 @@ private[spark] class SparkSubmit extends Logging { case
