date:20230601

[GitHub] [spark] henrymai closed pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub

henrymai closed pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation URL: https://github.com/apache/spark/pull/41413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] LuciferYang commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-06-01 Thread via GitHub

LuciferYang commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1572277808 > * 8: https://github.com/LuciferYang/spark/actions/runs/5141696541 > * 11: https://github.com/LuciferYang/spark/actions/runs/5141698030 > * 17:

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213338142 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +157,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] dongjoon-hyun closed pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns

2023-06-01 Thread via GitHub

dongjoon-hyun closed pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns URL: https://github.com/apache/spark/pull/41422

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41065: [SPARK-43384][SQL] Make `df.show` print a nice string for `MapType`.

2023-06-01 Thread via GitHub

dongjoon-hyun commented on code in PR #41065: URL: https://github.com/apache/spark/pull/41065#discussion_r1213442563 ## python/pyspark/ml/feature.py: ## @@ -5313,7 +5313,7 @@ class VectorAssembler( +---+---++-+ | a| b| c| features|

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213318755 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -83,8 +85,10 @@ case class ShuffledHashJoinExec( iter,

[GitHub] [spark] MaxGekk commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub

MaxGekk commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213538908 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] amaliujia opened a new pull request, #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils directly

2023-06-01 Thread via GitHub

amaliujia opened a new pull request, #41427: URL: https://github.com/apache/spark/pull/41427 ### What changes were proposed in this pull request? Spark Connect client should depend on common-utils directly. ### Why are the changes needed? Spark Connect client is

[GitHub] [spark] bersprockets commented on pull request #41411: [SPARK-43910][SQL] Strip `__auto_generated_subquery_name` from ids in errors

2023-06-01 Thread via GitHub

bersprockets commented on PR #41411: URL: https://github.com/apache/spark/pull/41411#issuecomment-1572257793 Belated lgtm from me too. Thanks for this change.

[GitHub] [spark] huaxingao commented on pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

huaxingao commented on PR #41398: URL: https://github.com/apache/spark/pull/41398#issuecomment-1572310899 LGTM

[GitHub] [spark] dongjoon-hyun closed pull request #41406: [SPARK-43899][BUILD] Upgrade protobuf-java to 3.23.2

2023-06-01 Thread via GitHub

dongjoon-hyun closed pull request #41406: [SPARK-43899][BUILD] Upgrade protobuf-java to 3.23.2 URL: https://github.com/apache/spark/pull/41406

[GitHub] [spark] amaliujia commented on pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub

amaliujia commented on PR #41426: URL: https://github.com/apache/spark/pull/41426#issuecomment-1572587179 @hvanhovell @cloud-fan

[GitHub] [spark] amaliujia opened a new pull request, #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub

amaliujia opened a new pull request, #41426: URL: https://github.com/apache/spark/pull/41426 ### What changes were proposed in this pull request? We need a sql/api module to host public API like DataType, Row, etc. This module can be shared between Catalyst and Spark Connect

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213566557 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r121365 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -57,6 +57,8 @@ case class ShuffledHashJoinExec( override def

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213334184 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -507,8 +507,6 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213356047 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -622,28 +632,23 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] anishshri-db closed pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub

anishshri-db closed pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider URL: https://github.com/apache/spark/pull/41410

[GitHub] [spark] anishshri-db commented on pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub

anishshri-db commented on PR #41410: URL: https://github.com/apache/spark/pull/41410#issuecomment-1572473979 Checked the results here and basically it seems like with high overwrite rate, the perf actually becomes worse ```

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213559035 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213340830 ## sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala: ## @@ -622,28 +632,23 @@ class JoinHintSuite extends PlanTest with SharedSparkSession with

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213341195 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +157,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] dongjoon-hyun commented on pull request #41402: [SPARK-43898][CORE] Automatically register `immutable.ArraySeq$ofRef` to `KryoSerializer` for Scala 2.13

2023-06-01 Thread via GitHub

dongjoon-hyun commented on PR #41402: URL: https://github.com/apache/spark/pull/41402#issuecomment-1572333245 Thank you!

[GitHub] [spark] dongjoon-hyun commented on pull request #41422: [SPARK-43541][SQL][3.2] Propagate all `Project` tags in resolving of expressions and missing columns

2023-06-01 Thread via GitHub

dongjoon-hyun commented on PR #41422: URL: https://github.com/apache/spark/pull/41422#issuecomment-1572331320 Hi, @MaxGekk . Sorry but Apache Spark 3.2 is EOL according to our versioning policy. - https://spark.apache.org/versioning-policy.html > No more ... releases should be

[GitHub] [spark] peter-toth commented on a diff in pull request #40744: [SPARK-24497][SQL] Support recursive SQL

2023-06-01 Thread via GitHub

peter-toth commented on code in PR #40744: URL: https://github.com/apache/spark/pull/40744#discussion_r1213420095 ## sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala: ## @@ -714,6 +717,121 @@ case class UnionExec(children: Seq[SparkPlan])

[GitHub] [spark] amaliujia commented on pull request #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils directly

2023-06-01 Thread via GitHub

amaliujia commented on PR #41427: URL: https://github.com/apache/spark/pull/41427#issuecomment-1572592407 @hvanhovell @cloud-fan

[GitHub] [spark] hvanhovell commented on a diff in pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41426: URL: https://github.com/apache/spark/pull/41426#discussion_r1213540263 ## sql/api/pom.xml: ## @@ -0,0 +1,45 @@ + + + +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" +

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213320729 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -145,31 +154,38 @@ case class ShuffledHashJoinExec( } /** -

[GitHub] [spark] BeishaoCao-db commented on pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub

BeishaoCao-db commented on PR #41396: URL: https://github.com/apache/spark/pull/41396#issuecomment-1572359214 Merged to master.

[GitHub] [spark] xinrong-meng commented on pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub

xinrong-meng commented on PR #41321: URL: https://github.com/apache/spark/pull/41321#issuecomment-1572492288 @ueshin @HyukjinKwon @zhengruifeng may I ask for your review?

[GitHub] [spark] szehon-ho commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

szehon-ho commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213347419 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -219,14 +238,15 @@ case class ShuffledHashJoinExec( * the value

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213484264 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] amaliujia opened a new pull request, #41425: [SPARK-43919] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub

amaliujia opened a new pull request, #41425: URL: https://github.com/apache/spark/pull/41425 ### What changes were proposed in this pull request? Extract JSON functionality out of Row. Row is public API that is needed by Spark Connect client. We are planning to move Row to a

[GitHub] [spark] amaliujia commented on pull request #41425: [SPARK-43919] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub

amaliujia commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1572556520 @hvanhovell @cloud-fan

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213511276 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/SparkConnectPlanExecution.scala: ## @@ -0,0 +1,237 @@ +/* + * Licensed to the

[GitHub] [spark] siying commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

siying commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213545399 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala: ## @@ -80,6 +80,8 @@ object SchemaConverters { case DOUBLE =>

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213558400 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] asl3 commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub

asl3 commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213395143 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -899,6 +899,29 @@ class QueryExecutionErrorsSuite ) } } + +

[GitHub] [spark] ueshin commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub

ueshin commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213529660 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by Python

[GitHub] [spark] amaliujia commented on a diff in pull request #41426: [SPARK-43920][SQL][CONNECT] Create sql/api module

2023-06-01 Thread via GitHub

amaliujia commented on code in PR #41426: URL: https://github.com/apache/spark/pull/41426#discussion_r1213542513 ## sql/api/pom.xml: ## @@ -0,0 +1,45 @@ + + + +<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" +

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213562310 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213345816 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -219,14 +238,15 @@ case class ShuffledHashJoinExec( * the value

[GitHub] [spark] huaxingao commented on a diff in pull request #41398: [SPARK-36612][SQL] Support left outer join build left or right outer join build right in shuffled hash join

2023-06-01 Thread via GitHub

huaxingao commented on code in PR #41398: URL: https://github.com/apache/spark/pull/41398#discussion_r1213344488 ## sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala: ## @@ -98,16 +105,21 @@ case class ShuffledHashJoinExec(

[GitHub] [spark] rangadi commented on a diff in pull request #41318: [SPARK-43803] [SS] [CONNECT] Improve awaitTermination() to handle client disconnects

2023-06-01 Thread via GitHub

rangadi commented on code in PR #41318: URL: https://github.com/apache/spark/pull/41318#discussion_r1213488092 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -2597,6 +2600,50 @@ class SparkConnectPlanner(val

[GitHub] [spark] hvanhovell commented on a diff in pull request #41315: [SPARK-43755][CONNECT] Move execution out of SparkExecutePlanStreamHandler and to a different thread

2023-06-01 Thread via GitHub

hvanhovell commented on code in PR #41315: URL: https://github.com/apache/spark/pull/41315#discussion_r1213565180 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/ExecutionHolder.scala: ## @@ -0,0 +1,196 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] allisonwang-db commented on a diff in pull request #41316: [SPARK-43798][SQL][PYTHON] Support Python user-defined table functions

2023-06-01 Thread via GitHub

allisonwang-db commented on code in PR #41316: URL: https://github.com/apache/spark/pull/41316#discussion_r1213749586 ## python/pyspark/worker.py: ## @@ -456,6 +456,54 @@ def assign_cols_by_name(runner_conf): ) +def read_udtf(pickleSer, infile, eval_type): +

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub

xinrong-meng commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213766572 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by

[GitHub] [spark] HyukjinKwon commented on pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub

HyukjinKwon commented on PR #41396: URL: https://github.com/apache/spark/pull/41396#issuecomment-1572944131 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub

HyukjinKwon commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1213789728 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2355,4 +2355,11 @@ package object config { .version("3.3.0") .intConf

[GitHub] [spark] HyukjinKwon closed pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame`

2023-06-01 Thread via GitHub

HyukjinKwon closed pull request #41396: [SPARK-43892][PYTHON] Add autocomplete support for `df[|]` in `pyspark.sql.dataframe.DataFrame` URL: https://github.com/apache/spark/pull/41396

[GitHub] [spark] brkyvz commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub

brkyvz commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213796477 ## connector/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala: ## @@ -117,178 +119,260 @@ private[sql] class AvroDeserializer( val

[GitHub] [spark] sadikovi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

sadikovi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213813283 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] cloud-fan commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub

cloud-fan commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213838714 ## sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala: ## @@ -526,95 +513,4 @@ trait Row extends Serializable { private def getAnyValAs[T <: AnyVal](i:

[GitHub] [spark] LuciferYang commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub

LuciferYang commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213859516 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/ToJsonUtil.scala: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for toSQLId

2023-06-01 Thread via GitHub

panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573049529 https://github.com/apache/spark/assets/15246973/008df8cc-42d0-49a0-bf3e-697feaaffb0c https://github.com/apache/spark/assets/15246973/f3d04092-38ef-4357-964f-0bd4d082317e

[GitHub] [spark] LuciferYang commented on pull request #41427: [SPARK-43888][FOLLOW-UP] Spark Connect client should depend on common-utils explicitly

2023-06-01 Thread via GitHub

LuciferYang commented on PR #41427: URL: https://github.com/apache/spark/pull/41427#issuecomment-1573052351 I think `common-utils` should be shaded into the `connect-client-jvm` module uber jar, so that users can still just use `connect-client-jvm` module

[GitHub] [spark] degant commented on pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub

degant commented on PR #41428: URL: https://github.com/apache/spark/pull/41428#issuecomment-1572919477 Thanks for the reply @dongjoon-hyun. Unfortunately we can't move to Spark 3.4 yet, and need to continue using Spark 3.3. The vulnerability has a score of 9.9 which is why I was requesting

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub

zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213739673 ## core/src/main/resources/error/error-classes.json: ## @@ -64,6 +64,16 @@ } } }, + "AVRO_INCORRECT_TYPE" : { +"message" : [ + "Cannot

[GitHub] [spark] zeruibao commented on a diff in pull request #41052: [SPARK-43380][SQL] Fix Avro data type conversion issues to avoid producing incorrect results

2023-06-01 Thread via GitHub

zeruibao commented on code in PR #41052: URL: https://github.com/apache/spark/pull/41052#discussion_r1213739528 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4209,6 +4209,18 @@ object SQLConf { .booleanConf

[GitHub] [spark] learningchess2003 opened a new pull request, #41429: [SPARK-43922] Add named parameter support in parser for function calls

2023-06-01 Thread via GitHub

learningchess2003 opened a new pull request, #41429: URL: https://github.com/apache/spark/pull/41429 ### What changes were proposed in this pull request? We plan on adding two new tokens called ```namedArgumentExpression``` and ```functionArgument``` which would enable this feature. When

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213811989 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213812202 ## connector/avro/src/test/scala/org/apache/spark/sql/avro/AvroLogicalTypeSuite.scala: ## @@ -446,6 +446,98 @@ abstract class AvroLogicalTypeSuite extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

dongjoon-hyun commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1572979757 Also, cc @gengliangwang , too

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub

panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1572979133 cc @HyukjinKwon @dongjoon-hyun @LuciferYang

[GitHub] [spark] yaooqinn commented on pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-06-01 Thread via GitHub

yaooqinn commented on PR #40953: URL: https://github.com/apache/spark/pull/40953#issuecomment-1573028003 thanks, merged to master

[GitHub] [spark] yaooqinn closed pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array

2023-06-01 Thread via GitHub

yaooqinn closed pull request #40953: [SPARK-43267][JDBC] Handle postgres unknown user-defined column as string in array URL: https://github.com/apache/spark/pull/40953

[GitHub] [spark] LuciferYang commented on a diff in pull request #41425: [SPARK-43919][SQL] Extract JSON functionality out of Row

2023-06-01 Thread via GitHub

LuciferYang commented on code in PR #41425: URL: https://github.com/apache/spark/pull/41425#discussion_r1213861276 ## sql/catalyst/src/main/scala/org/apache/spark/sql/util/ToJsonUtil.scala: ## @@ -0,0 +1,130 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] rangadi commented on a diff in pull request #41377: [DRAFT] Generate Protobuf descriptor files at build time.

2023-06-01 Thread via GitHub

rangadi commented on code in PR #41377: URL: https://github.com/apache/spark/pull/41377#discussion_r1213687793 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala: ## @@ -40,11 +40,11 @@ class ProtobufFunctionsSuite extends QueryTest

[GitHub] [spark] rangadi commented on pull request #41377: [DRAFT] Generate Protobuf descriptor files at build time.

2023-06-01 Thread via GitHub

rangadi commented on PR #41377: URL: https://github.com/apache/spark/pull/41377#issuecomment-1572792964 This is now ready for review. cc: @LuciferYang, @gengliangwang, @SandishKumarHN -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [spark] jwang0306 commented on pull request #41302: [SPARK-43782][CORE] Support log level configuration with static Spark conf

2023-06-01 Thread via GitHub

jwang0306 commented on PR #41302: URL: https://github.com/apache/spark/pull/41302#issuecomment-1572973821 > And, welcome to the Apache Spark community, @jwang0306 . I added you to the Apache Spark contributor group and assigned

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for toSQLId

2023-06-01 Thread via GitHub

panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573042769 Let me fix bug for 'toSQLId' -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark-docker] Yikun closed pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub

Yikun closed pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry URL: https://github.com/apache/spark-docker/pull/45 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark-docker] Yikun commented on pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub

Yikun commented on PR #45: URL: https://github.com/apache/spark-docker/pull/45#issuecomment-1573048404 Merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HeartSaVioR commented on pull request #41410: [SPARK-43902][SS] Use keyMayExist to check if key is absent and avoid gets while tracking metrics using RocksDB state store provider

2023-06-01 Thread via GitHub

HeartSaVioR commented on PR #41410: URL: https://github.com/apache/spark/pull/41410#issuecomment-1572825624 Thanks for the check and update! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] sadikovi commented on pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

sadikovi commented on PR #41409: URL: https://github.com/apache/spark/pull/41409#issuecomment-1572908647 @dongjoon-hyun Could you also review this PR? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] sadikovi commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

sadikovi commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213766622 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-01 Thread via GitHub

cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1213789191 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -1289,6 +1291,12 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #41415: [SPARK-43906][PYTHON][CONNECT] Implement the file support in SparkSession.addArtifacts

2023-06-01 Thread via GitHub

HyukjinKwon commented on code in PR #41415: URL: https://github.com/apache/spark/pull/41415#discussion_r1213789019 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/artifact/SparkConnectArtifactManager.scala: ## @@ -154,6 +154,8 @@ class

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub

panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1573004161 @MaxGekk https://github.com/apache/spark/assets/15246973/eccb88ee-27be-428a-9c71-63e0edafb910 There seems to be a problem with this one -- This is an automated message from

[GitHub] [spark] Yikf commented on a diff in pull request #41065: [SPARK-43384][SQL] Make `df.show` print a nice string for `MapType`.

2023-06-01 Thread via GitHub

Yikf commented on code in PR #41065: URL: https://github.com/apache/spark/pull/41065#discussion_r1213851366 ## python/pyspark/ml/feature.py: ## @@ -5313,7 +5313,7 @@ class VectorAssembler( +---+---++-+ | a| b| c| features|

[GitHub] [spark] henrymai commented on pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub

henrymai commented on PR #41413: URL: https://github.com/apache/spark/pull/41413#issuecomment-1572764768 The "Build" workflow mostly succeeded: https://github.com/henrymai/spark/actions/runs/5145437129/jobs/9263095496 There are failures in "sql - slow tests" and "sql - other tests"

[GitHub] [spark] henrymai commented on pull request #41413: [SPARK-43905][CORE] Consolidate BlockId parsing and creation

2023-06-01 Thread via GitHub

henrymai commented on PR #41413: URL: https://github.com/apache/spark/pull/41413#issuecomment-1572765713 @gatorsmile This is ready for review now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #41387: [SPARK-42299] Assign name to _LEGACY_ERROR_TEMP_2206

2023-06-01 Thread via GitHub

HeartSaVioR commented on code in PR #41387: URL: https://github.com/apache/spark/pull/41387#discussion_r1213736724 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala: ## @@ -2011,7 +2011,7 @@ private[sql] object QueryExecutionErrors extends

[GitHub] [spark] dongjoon-hyun commented on pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub

dongjoon-hyun commented on PR #41428: URL: https://github.com/apache/spark/pull/41428#issuecomment-1572921243 Let me ping the original members once more here. (cc @Ngone51 , @mridulm , @HyukjinKwon ) -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] WweiL commented on a diff in pull request #41129: [SPARK-43133] Scala Client DataStreamWriter Foreach support

2023-06-01 Thread via GitHub

WweiL commented on code in PR #41129: URL: https://github.com/apache/spark/pull/41129#discussion_r1213787122 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala: ## @@ -163,13 +165,106 @@ class StreamingQuerySuite extends

[GitHub] [spark] github-actions[bot] commented on pull request #40077: [SPIP][POC] Driver scaling: parallel schedulers

2023-06-01 Thread via GitHub

github-actions[bot] commented on PR #40077: URL: https://github.com/apache/spark/pull/40077#issuecomment-1572952907 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #40079: [SPARK-42486][BUILD] Upgrade `ZooKeeper` to 3.6.4

2023-06-01 Thread via GitHub

github-actions[bot] commented on PR #40079: URL: https://github.com/apache/spark/pull/40079#issuecomment-1572952888 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #39967: [SPARK-42395][K8S]The code logic of the configmap max size validation lacks extra content

2023-06-01 Thread via GitHub

github-actions[bot] closed pull request #39967: [SPARK-42395][K8S]The code logic of the configmap max size validation lacks extra content URL: https://github.com/apache/spark/pull/39967 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] degant commented on a diff in pull request #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub

degant commented on code in PR #41428: URL: https://github.com/apache/spark/pull/41428#discussion_r1213809660 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2355,4 +2355,11 @@ package object config { .version("3.3.0") .intConf

[GitHub] [spark] panbingkun opened a new pull request, #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub

panbingkun opened a new pull request, #41430: URL: https://github.com/apache/spark/pull/41430 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark-docker] Yikun commented on pull request #45: [SPARK-43368] Use `libnss_wrapper` to fake passwd entry

2023-06-01 Thread via GitHub

Yikun commented on PR #45: URL: https://github.com/apache/spark-docker/pull/45#issuecomment-1573047801 @HyukjinKwon @pan3793 Thanks, I will merge this soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] gengliangwang commented on a diff in pull request #41395: [SPARK-43884] Param markers in DDL

2023-06-01 Thread via GitHub

gengliangwang commented on code in PR #41395: URL: https://github.com/apache/spark/pull/41395#discussion_r1213874614 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ## @@ -488,6 +488,8 @@ class SparkSqlAstBuilder extends AstBuilder { } else

[GitHub] [spark] degant opened a new pull request, #41428: [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

2023-06-01 Thread via GitHub

degant opened a new pull request, #41428: URL: https://github.com/apache/spark/pull/41428 Backporting fix for SPARK-41958 to 3.3 branch from #39474 Below description from original PR. -- ### What changes were proposed in this pull request? This

[GitHub] [spark] xinrong-meng commented on a diff in pull request #41321: [SPARK-43893][PYTHON][CONNECT] Non-atomic data type support in Arrow-optimized Python UDF

2023-06-01 Thread via GitHub

xinrong-meng commented on code in PR #41321: URL: https://github.com/apache/spark/pull/41321#discussion_r1213770380 ## python/pyspark/sql/pandas/serializers.py: ## @@ -298,26 +299,39 @@ class ArrowStreamPandasUDFSerializer(ArrowStreamPandasSerializer): Serializer used by

[GitHub] [spark] cloud-fan commented on a diff in pull request #40908: [SPARK-42750][SQL] Support Insert By Name statement

2023-06-01 Thread via GitHub

cloud-fan commented on code in PR #40908: URL: https://github.com/apache/spark/pull/40908#discussion_r1213790789 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -318,7 +318,7 @@ query insertInto : INSERT OVERWRITE TABLE?

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #41409: [SPARK-43901][SQL] Avro to Support custom decimal type backed by Long

2023-06-01 Thread via GitHub

dongjoon-hyun commented on code in PR #41409: URL: https://github.com/apache/spark/pull/41409#discussion_r1213805784 ## connector/avro/src/main/java/org/apache/spark/sql/avro/CustomDecimal.scala: ## @@ -0,0 +1,76 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] panbingkun commented on pull request #41430: [MINOR][TESTS] fix bug for natural-join

2023-06-01 Thread via GitHub

panbingkun commented on PR #41430: URL: https://github.com/apache/spark/pull/41430#issuecomment-1572976972 https://github.com/apache/spark/assets/15246973/7f51f45d-94fc-481e-a818-bacd03135d5a -- This is an automated message from the Apache Git Service. To respond to the message,
