[GitHub] [spark] henrymai closed pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub
henrymai closed pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation URL: https://github.com/apache/spark/pull/41413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-06-01 Thread via GitHub
LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1572277808 > * 8: https://github.com/LuciferYang/spark/actions/runs/5141696541 > * 11: https://github.com/LuciferYang/spark/actions/runs/5141698030 > * 17:

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213338142 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +157,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] dongjoon-hyun closed pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns

2023-06-01 Thread via GitHub
dongjoon-hyun closed pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns URL: https://github.com/apache/spark/pull/41422 -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41065: [SPARK-43384][SQL] Make `df.show` print a nice string for `MapType`.

2023-06-01 Thread via GitHub
dongjoon-hyun commented on code in PR #41065: URL: https://github.com/apache/spark/pull/41065#discussion_r1213442563 ## python/pyspark/ml/feature.py: ## @@ -5313,7 +5313,7 @@ class VectorAssembler( +---+---++-+ | a| b| c| features|

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213318755 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -83,8 +85,10 @@ case class ShuffledHashJoinExec( iter,

[GitHub] [spark] MaxGekk commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub
MaxGekk commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213538908 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] amaliujia opened a new pull request, #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils directly

2023-06-01 Thread via GitHub
amaliujia opened a new pull request, #41427: URL: https://github.com/apache/spark/pull/41427 ### What changes were proposed in this pull request? Spark Connect client should depend on common-utils directly. ### Why are the changes needed? Spark Connect client is

[GitHub] [spark] bersprockets commented on pull request #41411: [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors

2023-06-01 Thread via GitHub
bersprockets commented on PR #41411: URL: https://github.com/apache/spark/pull/41411#issuecomment-1572257793 Belated lgtm from me too. Thanks for this change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] huaxingao commented on pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
huaxingao commented on PR #41398: URL: https://github.com/apache/spark/pull/41398#issuecomment-1572310899 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun closed pull request #41406: [SPARK-43899][BUILD] Upgrade protobuf-java to 3.23.2

2023-06-01 Thread via GitHub
dongjoon-hyun closed pull request #41406: [SPARK-43899][BUILD] Upgrade protobuf-java to 3.23.2 URL: https://github.com/apache/spark/pull/41406 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] amaliujia commented on pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub
amaliujia commented on PR #41426: URL: https://github.com/apache/spark/pull/41426#issuecomment-1572587179 @hvanhovell @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] amaliujia opened a new pull request, #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub
amaliujia opened a new pull request, #41426: URL: https://github.com/apache/spark/pull/41426 ### What changes were proposed in this pull request? We need a sql/api module to host public API like DataType, Row, etc. This module can be shared between Catalyst and Spark Connect

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213566557 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r121365 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -57,6 +57,8 @@ case class ShuffledHashJoinExec( override def

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213334184 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -507,8 +507,6 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213356047 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -622,28 +632,23 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] anishshri-db closed pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub
anishshri-db closed pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider URL: https://github.com/apache/spark/pull/41410 -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] anishshri-db commented on pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub
anishshri-db commented on PR #41410: URL: https://github.com/apache/spark/pull/41410#issuecomment-1572473979 Checked the results here and basically it seems like with high overwrite rate, the perf actually becomes worse ```

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213559035 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213340830 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -622,28 +632,23 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213341195 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +157,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] dongjoon-hyun commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-06-01 Thread via GitHub
dongjoon-hyun commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1572333245 Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] dongjoon-hyun commented on pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns

2023-06-01 Thread via GitHub
dongjoon-hyun commented on PR #41422: URL: https://github.com/apache/spark/pull/41422#issuecomment-1572331320 Hi, @MaxGekk . Sorry but Apache Spark 3.2 is EOL according to our versioning policy. - https://spark.apache.org/versioning-policy.html > No more ... releases should be

[GitHub] [spark] peter-toth commented on a diff in pull request #40744: [SPARK-24497][SQL] Support recursive SQL

2023-06-01 Thread via GitHub
peter-toth commented on code in PR #40744: URL: https://github.com/apache/spark/pull/40744#discussion_r1213420095 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,121 @@ case class UnionExec(children: Seq[SparkPlan])

[GitHub] [spark] amaliujia commented on pull request #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils directly

2023-06-01 Thread via GitHub
amaliujia commented on PR #41427: URL: https://github.com/apache/spark/pull/41427#issuecomment-1572592407 @hvanhovell @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] hvanhovell commented on a diff in pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41426: URL: https://github.com/apache/spark/pull/41426#discussion_r1213540263 ## sql/api/pom.xml: ## @@ -0,0 +1,45 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; +

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213320729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +154,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] BeishaoCao-db commented on pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub
BeishaoCao-db commented on PR #41396: URL: https://github.com/apache/spark/pull/41396#issuecomment-1572359214 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] xinrong-meng commented on pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub
xinrong-meng commented on PR #41321: URL: https://github.com/apache/spark/pull/41321#issuecomment-1572492288 @ueshin @HyukjinKwon @zhengruifeng may I ask for your review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213347419 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -219,14 +238,15 @@ case class ShuffledHashJoinExec( * the value

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213484264 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia opened a new pull request, #41425: [SPARK-43919] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub
amaliujia opened a new pull request, #41425: URL: https://github.com/apache/spark/pull/41425 ### What changes were proposed in this pull request? Extract JSON functionality out of Row. Row is public API that is needed by Spark Connect client. We are planning to move Row to a

[GitHub] [spark] amaliujia commented on pull request #41425: [SPARK-43919] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub
amaliujia commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1572556520 @hvanhovell @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213511276 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the

[GitHub] [spark] siying commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
siying commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213545399 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala: ## @@ -80,6 +80,8 @@ object SchemaConverters { case DOUBLE =>

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213558400 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] asl3 commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub
asl3 commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213395143 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -899,6 +899,29 @@ class QueryExecutionErrorsSuite ) } } + +

[GitHub] [spark] ueshin commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub
ueshin commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213529660 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by Python

[GitHub] [spark] amaliujia commented on a diff in pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub
amaliujia commented on code in PR #41426: URL: https://github.com/apache/spark/pull/41426#discussion_r1213542513 ## sql/api/pom.xml: ## @@ -0,0 +1,45 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; +

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213562310 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213345816 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -219,14 +238,15 @@ case class ShuffledHashJoinExec( * the value

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub
huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213344488 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -98,16 +105,21 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] rangadi commented on a diff in pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-01 Thread via GitHub
rangadi commented on code in PR #41318: URL: https://github.com/apache/spark/pull/41318#discussion_r1213488092 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2597,6 +2600,50 @@ class SparkConnectPlanner(val

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub
hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213565180 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41316: [SPARK-43798][SQL][PYTHON] Support Python user-defined table functions

2023-06-01 Thread via GitHub
allisonwang-db commented on code in PR #41316: URL: https://github.com/apache/spark/pull/41316#discussion_r1213749586 ## python/pyspark/worker.py: ## @@ -456,6 +456,54 @@ def assign_cols_by_name(runner_conf): ) +def read_udtf(pickleSer, infile, eval_type): +

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub
xinrong-meng commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213766572 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by

[GitHub] [spark] HyukjinKwon commented on pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub
HyukjinKwon commented on PR #41396: URL: https://github.com/apache/spark/pull/41396#issuecomment-1572944131 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub
HyukjinKwon commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1213789728 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2355,4 +2355,11 @@ package object config { .version("3.3.0") .intConf

[GitHub] [spark] HyukjinKwon closed pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub
HyukjinKwon closed pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame` URL: https://github.com/apache/spark/pull/41396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] brkyvz commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub
brkyvz commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213796477 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala: ## @@ -117,178 +119,260 @@ private[sql] class AvroDeserializer( val

[GitHub] [spark] sadikovi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
sadikovi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213813283 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] cloud-fan commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub
cloud-fan commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213838714 ## sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala: ## @@ -526,95 +513,4 @@ trait Row extends Serializable { private def getAnyValAs[T <: AnyVal](i:

[GitHub] [spark] LuciferYang commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub
LuciferYang commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213859516 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/ToJsonUtil.scala: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for toSQLId

2023-06-01 Thread via GitHub
panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573049529 https://github.com/apache/spark/assets/15246973/008df8cc-42d0-49a0-bf3e-697feaaffb0c https://github.com/apache/spark/assets/15246973/f3d04092-38ef-4357-964f-0bd4d082317e --

[GitHub] [spark] LuciferYang commented on pull request #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils explicitly

2023-06-01 Thread via GitHub
LuciferYang commented on PR #41427: URL: https://github.com/apache/spark/pull/41427#issuecomment-1573052351 I think `common-utils` should be shaded into the `connect-client-jvm` module uber jar, so that users can still just use `connect-client-jvm` module -- This is an automated

[GitHub] [spark] degant commented on pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub
degant commented on PR #41428: URL: https://github.com/apache/spark/pull/41428#issuecomment-1572919477 Thanks for the reply @dongjoon-hyun. Unfortunately we can't move to Spark 3.4 yet, and need to continue using Spark 3.3. The vulnerability has a score of 9.9 which is why I was requesting

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub
zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213739673 ## core/src/main/resources/error/error-classes.json: ## @@ -64,6 +64,16 @@ } } }, + "AVRO_INCORRECT_TYPE" : { +"message" : [ + "Cannot

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub
zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213739528 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4209,6 +4209,18 @@ object SQLConf { .booleanConf

[GitHub] [spark] learningchess2003 opened a new pull request, #41429: [SPARK-43922] Add named parameter support in parser for function calls

2023-06-01 Thread via GitHub
learningchess2003 opened a new pull request, #41429: URL: https://github.com/apache/spark/pull/41429 ### What changes were proposed in this pull request? We plan on adding two new tokens called ```namedArgumentExpression``` and ```functionArgument``` which would enable this feature. When

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213811989 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213812202 ## connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeSuite.scala: ## @@ -446,6 +446,98 @@ abstract class AvroLogicalTypeSuite extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
dongjoon-hyun commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1572979757 Also, cc @gengliangwang , too -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub
panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1572979133 cc @HyukjinKwon @dongjoon-hyun @LuciferYang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] yaooqinn commented on pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-06-01 Thread via GitHub
yaooqinn commented on PR #40953: URL: https://github.com/apache/spark/pull/40953#issuecomment-1573028003 thanks, merged to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] yaooqinn closed pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-06-01 Thread via GitHub
yaooqinn closed pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array URL: https://github.com/apache/spark/pull/40953 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] LuciferYang commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub
LuciferYang commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213861276 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/ToJsonUtil.scala: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] rangadi commented on a diff in pull request #41377: [DRAFT] Generate Protobuf descriptor files at build time.

2023-06-01 Thread via GitHub
rangadi commented on code in PR #41377: URL: https://github.com/apache/spark/pull/41377#discussion_r1213687793 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala: ## @@ -40,11 +40,11 @@ class ProtobufFunctionsSuite extends QueryTest

[GitHub] [spark] rangadi commented on pull request #41377: [DRAFT] Generate Protobuf descriptor files at build time.

2023-06-01 Thread via GitHub
rangadi commented on PR #41377: URL: https://github.com/apache/spark/pull/41377#issuecomment-1572792964 This is now ready for review. cc: @LuciferYang, @gengliangwang, @SandishKumarHN -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] jwang0306 commented on pull request #41302: [SPARK-43782][CORE] Support log level configuration with static Spark conf

2023-06-01 Thread via GitHub
jwang0306 commented on PR #41302: URL: https://github.com/apache/spark/pull/41302#issuecomment-1572973821 > And, welcome to the Apache Spark community, @jwang0306 . I added you to the Apache Spark contributor group and assigned

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for toSQLId

2023-06-01 Thread via GitHub
panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573042769 Let me fix bug for 'toSQLId' -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark-docker] Yikun closed pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub
Yikun closed pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry URL: https://github.com/apache/spark-docker/pull/45 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark-docker] Yikun commented on pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub
Yikun commented on PR #45: URL: https://github.com/apache/spark-docker/pull/45#issuecomment-1573048404 Merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HeartSaVioR commented on pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub
HeartSaVioR commented on PR #41410: URL: https://github.com/apache/spark/pull/41410#issuecomment-1572825624 Thanks for the check and update! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] sadikovi commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
sadikovi commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1572908647 @dongjoon-hyun Could you also review this PR? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] sadikovi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
sadikovi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213766622 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-01 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1213789191 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1289,6 +1291,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-01 Thread via GitHub
HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1213789019 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub
panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573004161 @MaxGekk https://github.com/apache/spark/assets/15246973/eccb88ee-27be-428a-9c71-63e0edafb910 There seems to be a problem with this one -- This is an automated message from

[GitHub] [spark] Yikf commented on a diff in pull request #41065: [SPARK-43384][SQL] Make `df.show` print a nice string for `MapType`.

2023-06-01 Thread via GitHub
Yikf commented on code in PR #41065: URL: https://github.com/apache/spark/pull/41065#discussion_r1213851366 ## python/pyspark/ml/feature.py: ## @@ -5313,7 +5313,7 @@ class VectorAssembler( +---+---++-+ | a| b| c| features|

[GitHub] [spark] henrymai commented on pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub
henrymai commented on PR #41413: URL: https://github.com/apache/spark/pull/41413#issuecomment-1572764768 The "Build" workflow mostly succeeded: https://github.com/henrymai/spark/actions/runs/5145437129/jobs/9263095496 There are failures in "sql - slow tests" and "sql - other tests"

[GitHub] [spark] henrymai commented on pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub
henrymai commented on PR #41413: URL: https://github.com/apache/spark/pull/41413#issuecomment-1572765713 @gatorsmile This is ready for review now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub
HeartSaVioR commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213736724 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub
dongjoon-hyun commented on PR #41428: URL: https://github.com/apache/spark/pull/41428#issuecomment-1572921243 Let me ping the original members once more here. (cc @Ngone51 , @mridulm , @HyukjinKwon ) -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] WweiL commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-06-01 Thread via GitHub
WweiL commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1213787122 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala: ## @@ -163,13 +165,106 @@ class StreamingQuerySuite extends

[GitHub] [spark] github-actions[bot] commented on pull request #40077: [SPIP][POC] Driver scaling: parallel schedulers

2023-06-01 Thread via GitHub
github-actions[bot] commented on PR #40077: URL: https://github.com/apache/spark/pull/40077#issuecomment-1572952907 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #40079: [SPARK-42486][BUILD] Upgrade `ZooKeeper` to 3.6.4

2023-06-01 Thread via GitHub
github-actions[bot] commented on PR #40079: URL: https://github.com/apache/spark/pull/40079#issuecomment-1572952888 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39967: [SPARK-42395][K8S]The code logic of the configmap max size validation lacks extra content

2023-06-01 Thread via GitHub
github-actions[bot] closed pull request #39967: [SPARK-42395][K8S]The code logic of the configmap max size validation lacks extra content URL: https://github.com/apache/spark/pull/39967 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] degant commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub
degant commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1213809660 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2355,4 +2355,11 @@ package object config { .version("3.3.0") .intConf

[GitHub] [spark] panbingkun opened a new pull request, #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub
panbingkun opened a new pull request, #41430: URL: https://github.com/apache/spark/pull/41430 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark-docker] Yikun commented on pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub
Yikun commented on PR #45: URL: https://github.com/apache/spark-docker/pull/45#issuecomment-1573047801 @HyukjinKwon @pan3793 Thanks, I will merge this soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] gengliangwang commented on a diff in pull request #41395: [SPARK-43884] Param markers in DDL

2023-06-01 Thread via GitHub
gengliangwang commented on code in PR #41395: URL: https://github.com/apache/spark/pull/41395#discussion_r1213874614 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ## @@ -488,6 +488,8 @@ class SparkSqlAstBuilder extends AstBuilder { } else

[GitHub] [spark] degant opened a new pull request, #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub
degant opened a new pull request, #41428: URL: https://github.com/apache/spark/pull/41428 Backporting fix for SPARK-41958 to 3.3 branch from #39474 Below description from original PR. -- ### What changes were proposed in this pull request? This

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub
xinrong-meng commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213770380 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-01 Thread via GitHub
cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1213790789 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -318,7 +318,7 @@ query insertInto : INSERT OVERWRITE TABLE?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub
dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213805784 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub
panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1572976972 https://github.com/apache/spark/assets/15246973/7f51f45d-94fc-481e-a818-bacd03135d5a -- This is an automated message from the Apache Git Service. To respond to the message,
