[GitHub] [spark] dongjoon-hyun opened a new pull request, #40434: [SPARK-42801][CONNECT][TESTS] Ignore flaky test in Java 8

2023-03-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #40434: URL: https://github.com/apache/spark/pull/40434 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] dongjoon-hyun commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469380025 Thank you, @LuciferYang . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] Hisoka-X commented on a diff in pull request #40326: [SPARK-42708][DOCS] Improve doc about protobuf java file can't be indexed.

2023-03-14 Thread via GitHub
Hisoka-X commented on code in PR #40326: URL: https://github.com/apache/spark/pull/40326#discussion_r1136560365 ## connector/protobuf/README.md: ## @@ -34,3 +34,17 @@ export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe The user-defined `protoc` binary files can be produced in

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136544713 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136542981 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] cloud-fan commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136540926 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] itholic commented on pull request #40433: [SPARK-42706][SQL][DOCS][3.4] Document the Spark SQL error classes in user-facing documentation

2023-03-14 Thread via GitHub
itholic commented on PR #40433: URL: https://github.com/apache/spark/pull/40433#issuecomment-1469347974 cc @MaxGekk @cloud-fan

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136540170 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLHandler.scala: ## @@ -0,0 +1,231 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] itholic commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-14 Thread via GitHub
itholic commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1469347578 3.4: https://github.com/apache/spark/pull/40433

[GitHub] [spark] itholic opened a new pull request, #40433: [SPARK-42706][SQL][DOCS][3.4] Document the Spark SQL error classes in user-facing documentation

2023-03-14 Thread via GitHub
itholic opened a new pull request, #40433: URL: https://github.com/apache/spark/pull/40433 ### What changes were proposed in this pull request? Cherry-pick for https://github.com/apache/spark/pull/40336. This PR proposes to document Spark SQL error classes to [Spark SQL

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136539321 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala: ## @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136535275 ## python/pyspark/sql/connect/ml/base.py: ## @@ -0,0 +1,327 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] harupy commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
harupy commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136535013 ## python/pyspark/sql/connect/ml/base.py: ## @@ -0,0 +1,327 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] [spark] anishshri-db commented on pull request #40427: [SPARK-42792][SS] Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators

2023-03-14 Thread via GitHub
anishshri-db commented on PR #40427: URL: https://github.com/apache/spark/pull/40427#issuecomment-1469335530 @HeartSaVioR - tests look good. Pls merge when you get a chance. Thx

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-14 Thread via GitHub
dongjoon-hyun commented on code in PR #40396: URL: https://github.com/apache/spark/pull/40396#discussion_r1136512197 ## docs/sql-migration-guide.md: ## @@ -22,6 +22,10 @@ license: | * Table of contents {:toc} +## Upgrading from Spark SQL 3.4 to 3.5 + +- Since Spark 3.5, the

[GitHub] [spark] zhengruifeng commented on pull request #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-14 Thread via GitHub
zhengruifeng commented on PR #40432: URL: https://github.com/apache/spark/pull/40432#issuecomment-1469295848 cc @WeichenXu123 @HyukjinKwon

[GitHub] [spark] otterc commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-14 Thread via GitHub
otterc commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-1469293356 @Stove-hust Haven't had a chance to look at it yet. I'll take a look at it this week.

[GitHub] [spark] zhengruifeng opened a new pull request, #40432: [SPARK-42800][CONNECT][PYTHON][ML] Implement ml function `{array_to_vector, vector_to_array}`

2023-03-14 Thread via GitHub
zhengruifeng opened a new pull request, #40432: URL: https://github.com/apache/spark/pull/40432 ### What changes were proposed in this pull request? Implement ml function `{array_to_vector, vector_to_array}` ### Why are the changes needed? function parity ### Does

[GitHub] [spark] dongjoon-hyun commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469279124 Thank you, @srowen .

[GitHub] [spark] dongjoon-hyun commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469270112 Thank you, @yaooqinn !

[GitHub] [spark] dongjoon-hyun commented on pull request #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40431: URL: https://github.com/apache/spark/pull/40431#issuecomment-1469265999 cc @bjornjorgensen, @yaooqinn , @srowen

[GitHub] [spark] StevenChenDatabricks commented on a diff in pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
StevenChenDatabricks commented on code in PR #40385: URL: https://github.com/apache/spark/pull/40385#discussion_r1136498864 ## sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala: ## @@ -771,6 +775,130 @@ class ExplainSuiteAE extends ExplainSuiteHelper with

[GitHub] [spark] itholic commented on a diff in pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-14 Thread via GitHub
itholic commented on code in PR #40282: URL: https://github.com/apache/spark/pull/40282#discussion_r1136497645 ## python/docs/source/development/errors.rst: ## @@ -0,0 +1,92 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license

[GitHub] [spark] HyukjinKwon closed pull request #40388: [SPARK-42765][CONNECT][PYTHON] Enable importing `pandas_udf` from `pyspark.sql.connect.functions`

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40388: [SPARK-42765][CONNECT][PYTHON] Enable importing `pandas_udf` from `pyspark.sql.connect.functions` URL: https://github.com/apache/spark/pull/40388

[GitHub] [spark] HyukjinKwon commented on pull request #40388: [SPARK-42765][CONNECT][PYTHON] Enable importing `pandas_udf` from `pyspark.sql.connect.functions`

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40388: URL: https://github.com/apache/spark/pull/40388#issuecomment-1469264035 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #40428: [SPARK-42797][CONNECT][DOCS] Grammatical improvements for Spark Connect content

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40428: [SPARK-42797][CONNECT][DOCS] Grammatical improvements for Spark Connect content URL: https://github.com/apache/spark/pull/40428

[GitHub] [spark] dongjoon-hyun opened a new pull request, #40431: [SPARK-42799][BUILD] Update SBT build `xercesImpl` version to match with `pom.xml`

2023-03-14 Thread via GitHub
dongjoon-hyun opened a new pull request, #40431: URL: https://github.com/apache/spark/pull/40431 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] HyukjinKwon commented on pull request #40428: [SPARK-42797][CONNECT][DOCS] Grammatical improvements for Spark Connect content

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40428: URL: https://github.com/apache/spark/pull/40428#issuecomment-1469263601 Merged to master and branch-3.4.

[GitHub] [spark] itholic commented on a diff in pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-14 Thread via GitHub
itholic commented on code in PR #40372: URL: https://github.com/apache/spark/pull/40372#discussion_r1136490675 ## python/pyspark/errors/exceptions/captured.py: ## @@ -65,8 +65,15 @@ def __str__(self) -> str: assert SparkContext._jvm is not None jvm =

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136486450 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,55 @@ class PlanGenerationTestSuite

[GitHub] [spark] srowen commented on pull request #18990: [SPARK-21782][Core] Repartition creates skews when numPartitions is a power of 2

2023-03-14 Thread via GitHub
srowen commented on PR #18990: URL: https://github.com/apache/spark/pull/18990#issuecomment-1469244151 I was using 3.3.2

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136482549 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,55 @@ class PlanGenerationTestSuite

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136481658 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/PlanGenerationTestSuite.scala: ## @@ -2065,6 +2065,55 @@ class PlanGenerationTestSuite

[GitHub] [spark] LuciferYang opened a new pull request, #40430: [SPARK-42798][BUILD] Upgrade protobuf-java to 3.22.2

2023-03-14 Thread via GitHub
LuciferYang opened a new pull request, #40430: URL: https://github.com/apache/spark/pull/40430 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
cloud-fan commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1469225186 So we randomly pick one `ReusedExchange` to print its corresponding `Exchange`?

[GitHub] [spark] Stove-hust commented on pull request #40412: [SPARK-42784] should still create subDir when the number of subDir in merge dir is less than conf

2023-03-14 Thread via GitHub
Stove-hust commented on PR #40412: URL: https://github.com/apache/spark/pull/40412#issuecomment-1469223693 @mridulm Hello, can you recruit someone to help review this PR? I would appreciate your help.

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-14 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1136474892 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlignUpdateAssignmentsSuite.scala: ## @@ -0,0 +1,786 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-14 Thread via GitHub
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1136474666 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager:

[GitHub] [spark] Stove-hust commented on pull request #40393: [SPARK-40082] Schedule mergeFinalize when push merge shuffleMapStage retry but no running tasks

2023-03-14 Thread via GitHub
Stove-hust commented on PR #40393: URL: https://github.com/apache/spark/pull/40393#issuecomment-146938 @otterc Hello, is there anything else I should add?

[GitHub] [spark] HeartSaVioR closed pull request #40425: [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread via GitHub
HeartSaVioR closed pull request #40425: [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming URL: https://github.com/apache/spark/pull/40425

[GitHub] [spark] HeartSaVioR commented on pull request #40425: [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread via GitHub
HeartSaVioR commented on PR #40425: URL: https://github.com/apache/spark/pull/40425#issuecomment-1469209415 Thanks! Merging to master.

[GitHub] [spark] LuciferYang commented on pull request #40422: [MINOR] Use getParameterCount function instead of getParameterTypes.length

2023-03-14 Thread via GitHub
LuciferYang commented on PR #40422: URL: https://github.com/apache/spark/pull/40422#issuecomment-1469206611 I think we should file a jira to record this

[GitHub] [spark] beliefer commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136467297 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -84,11 +84,11 @@ object InferWindowGroupLimit extends

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136462174 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -106,6 +107,19 @@ object

[GitHub] [spark] StevenChenDatabricks commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
StevenChenDatabricks commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1469194144 @cloud-fan It wouldn't because `collectOperatorsWithID` in ExplainUtils is responsible for collecting the list of nodes to print out. It uses a BitSet `collectedOperators`

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136459542 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala: ## @@ -97,6 +101,87 @@ object

[GitHub] [spark] LuciferYang commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
LuciferYang commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136459267 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -106,6 +107,19 @@ object

[GitHub] [spark] cloud-fan commented on pull request #40336: [SPARK-42706][SQL][DOCS] Document the Spark SQL error classes in user-facing documentation.

2023-03-14 Thread via GitHub
cloud-fan commented on PR #40336: URL: https://github.com/apache/spark/pull/40336#issuecomment-1469189625 It's probably ok to mention non-existing errors in the doc in 3.4

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML

2023-03-14 Thread via GitHub
WeichenXu123 commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1136451429 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -82,13 +83,50 @@ message Relation { // Catalog API (experimental /

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40324: [SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40324: URL: https://github.com/apache/spark/pull/40324#discussion_r1136451520 ## docs/spark-connect-overview.md: ## @@ -0,0 +1,259 @@ +--- +layout: global +title: Spark Connect Overview +license: | + Licensed to the Apache Software

[GitHub] [spark] ulysses-you commented on pull request #40417: [SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased

2023-03-14 Thread via GitHub
ulysses-you commented on PR #40417: URL: https://github.com/apache/spark/pull/40417#issuecomment-1469168690 If people build a custom shuffle exchange which supports both row and columnar output. Logically, we have `ShuffleExchangeLike` for developers.

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136447424 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala: ## @@ -106,6 +107,19 @@ object

[GitHub] [spark] HyukjinKwon closed pull request #40426: [SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40426: [SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch URL: https://github.com/apache/spark/pull/40426

[GitHub] [spark] HyukjinKwon commented on pull request #40426: [SPARK-42796][SQL] Support accessing TimestampNTZ columns in CachedBatch

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40426: URL: https://github.com/apache/spark/pull/40426#issuecomment-1469160585 Merged to master and branch-3.4.

[GitHub] [spark] cloud-fan commented on pull request #40417: [SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased

2023-03-14 Thread via GitHub
cloud-fan commented on PR #40417: URL: https://github.com/apache/spark/pull/40417#issuecomment-1469155631 > actually, this change only affects developers. Can we be a bit more specific? I think it's a problem for table cache because it supports both row and columnar output. But

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136441006 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +210,36 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136440921 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +210,36 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136440774 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +210,36 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136440410 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -130,57 +130,33 @@ private[hive] case class HiveGenericUDF( name: String,

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136439396 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala: ## @@ -97,6 +101,87 @@ object

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-14 Thread via GitHub
dongjoon-hyun commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136438686 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -84,11 +84,11 @@ object InferWindowGroupLimit extends

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136437829 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala: ## @@ -97,6 +101,87 @@ object

[GitHub] [spark] beliefer commented on a diff in pull request #40355: [SPARK-42604][CONNECT] Implement functions.typedlit

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40355: URL: https://github.com/apache/spark/pull/40355#discussion_r1136437153 ## connector/connect/common/src/main/scala/org/apache/spark/sql/connect/common/LiteralValueProtoConverter.scala: ## @@ -97,6 +101,87 @@ object

[GitHub] [spark] beliefer commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136433968 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala: ## @@ -84,11 +84,11 @@ object InferWindowGroupLimit extends

[GitHub] [spark] ulysses-you commented on pull request #40417: [SPARK-42778][SQL][3.4] QueryStageExec should respect supportsRowBased

2023-03-14 Thread via GitHub
ulysses-you commented on PR #40417: URL: https://github.com/apache/spark/pull/40417#issuecomment-1469117434 @dongjoon-hyun The reason is that we only support table cache query stage at master branch, so that test case is not valid. For branch-3.4, actually, this change only affects

[GitHub] [spark] chenhao-db opened a new pull request, #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

2023-03-14 Thread via GitHub
chenhao-db opened a new pull request, #40429: URL: https://github.com/apache/spark/pull/40429 ### What changes were proposed in this pull request? This PR fixed the counter-intuitive behaviors of the `ApproximatePercentile` expression mentioned in

[GitHub] [spark] beliefer commented on a diff in pull request #40410: [SPARK-42783][SQL] Infer window group limit should run as late as possible

2023-03-14 Thread via GitHub
beliefer commented on code in PR #40410: URL: https://github.com/apache/spark/pull/40410#discussion_r1136432237 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala: ## @@ -84,6 +84,11 @@ class SparkOptimizer( PushPredicateThroughNonJoin,

[GitHub] [spark] panbingkun commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
panbingkun commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136429094 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +210,39 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] cloud-fan commented on a diff in pull request #40394: [SPARK-42771][SQL] Refactor HiveGenericUDF

2023-03-14 Thread via GitHub
cloud-fan commented on code in PR #40394: URL: https://github.com/apache/spark/pull/40394#discussion_r1136424404 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala: ## @@ -235,6 +210,39 @@ private[hive] case class HiveGenericUDF( } } +class

[GitHub] [spark] cloud-fan commented on pull request #40396: [SPARK-42772][SQL] Change the default value of JDBC options about push down to true

2023-03-14 Thread via GitHub
cloud-fan commented on PR #40396: URL: https://github.com/apache/spark/pull/40396#issuecomment-1469076028 Yea, we should mention that they can set these options to false if query fails with JDBC errors.

[GitHub] [spark] cloud-fan commented on pull request #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-14 Thread via GitHub
cloud-fan commented on PR #40385: URL: https://github.com/apache/spark/pull/40385#issuecomment-1469073120 > we already only print each operator exactly once in the node details section What if more than one `ReusedExchange` referencing the same `Exchange`? Will we print the

[GitHub] [spark] allanf-db opened a new pull request, #40428: Grammatical improvements

2023-03-14 Thread via GitHub
allanf-db opened a new pull request, #40428: URL: https://github.com/apache/spark/pull/40428 ### What changes were proposed in this pull request? Grammatical improvements to the Spark Connect content as a follow-up on https://github.com/apache/spark/pull/40324/ ###

[GitHub] [spark] xinrong-meng commented on a diff in pull request #40388: [SPARK-42765][CONNECT][PYTHON] Enable importing `pandas_udf` from `pyspark.sql.connect.functions`

2023-03-14 Thread via GitHub
xinrong-meng commented on code in PR #40388: URL: https://github.com/apache/spark/pull/40388#discussion_r1136405407 ## python/pyspark/sql/connect/functions.py: ## @@ -54,6 +54,11 @@ from pyspark.sql import functions as pysparkfuncs from pyspark.sql.types import

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40282: URL: https://github.com/apache/spark/pull/40282#discussion_r1136402821 ## python/docs/source/development/errors.rst: ## @@ -0,0 +1,92 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor license

[GitHub] [spark] HyukjinKwon commented on pull request #40423: [SPARK-41775][PYTHON][FOLLOW-UP] Torch distributor multiple gpus per task

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40423: URL: https://github.com/apache/spark/pull/40423#issuecomment-1469047887 FYI: https://github.com/rithwik-db/spark/runs/12002967878

[GitHub] [spark] HyukjinKwon commented on pull request #40425: [SPARK-42794][SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40425: URL: https://github.com/apache/spark/pull/40425#issuecomment-1469048078 cc @HeartSaVioR

[GitHub] [spark] HyukjinKwon commented on pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40424: URL: https://github.com/apache/spark/pull/40424#issuecomment-1469024165

[GitHub] [spark] dongjoon-hyun commented on pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40424: URL: https://github.com/apache/spark/pull/40424#issuecomment-1469023182 BTW, `branch-3.4` GitHub CI failure is still under investigation independently.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1136370343 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +118,50 @@ def pandas_microsecond(s) -> ps.Series[np.int64]: # type: ignore[no-untyped-def def

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40420: URL: https://github.com/apache/spark/pull/40420#discussion_r1136370080 ## python/pyspark/pandas/datetimes.py: ## @@ -116,26 +118,50 @@ def pandas_microsecond(s) -> ps.Series[np.int64]: # type: ignore[no-untyped-def def

[GitHub] [spark] HyukjinKwon commented on pull request #40420: [SPARK-42617][PS] Support `isocalendar` from the pandas 2.0.0

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40420: URL: https://github.com/apache/spark/pull/40420#issuecomment-1469020494 cc @itholic

[GitHub] [spark] dongjoon-hyun commented on pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40424: URL: https://github.com/apache/spark/pull/40424#issuecomment-1469019919 Thank you, @HyukjinKwon !

[GitHub] [spark] HyukjinKwon closed pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags` URL: https://github.com/apache/spark/pull/40424

[GitHub] [spark] HyukjinKwon commented on pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40424: URL: https://github.com/apache/spark/pull/40424#issuecomment-1469019282 Merged to master and branch-3.4.

[GitHub] [spark] wangyum commented on pull request #40419: [SPARK-42789][SQL] Rewrite multiple GetJsonObjects to a JsonTuple if their json expressions are the same

2023-03-14 Thread via GitHub
wangyum commented on PR #40419: URL: https://github.com/apache/spark/pull/40419#issuecomment-1469019263 cc @cloud-fan

[GitHub] [spark] dongjoon-hyun commented on pull request #40424: [SPARK-42793][CONNECT] `connect` module requires `build_profile_flags`

2023-03-14 Thread via GitHub
dongjoon-hyun commented on PR #40424: URL: https://github.com/apache/spark/pull/40424#issuecomment-1469018169 cc @hvanhovell, @HyukjinKwon

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40326: [SPARK-42708][DOCS] Improve doc about protobuf java file can't be indexed.

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40326: URL: https://github.com/apache/spark/pull/40326#discussion_r1136365666 ## connector/protobuf/README.md: ## @@ -34,3 +34,17 @@ export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe The user-defined `protoc` binary files can be produced

[GitHub] [spark] HyukjinKwon commented on pull request #40338: [MINOR][PYTHON] Change TypeVar to private symbols

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40338: URL: https://github.com/apache/spark/pull/40338#issuecomment-1469012020 Apache Spark has a custom implementation that leverages the build (and resources) from forked repository. Mind checking if the workload is enabled in your fork

[GitHub] [spark] HyukjinKwon commented on pull request #40377: [SPARK-42757][CONNECT] Implement textFile for DataFrameReader

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40377: URL: https://github.com/apache/spark/pull/40377#issuecomment-1469010474 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #40377: [SPARK-42757][CONNECT] Implement textFile for DataFrameReader

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40377: [SPARK-42757][CONNECT] Implement textFile for DataFrameReader URL: https://github.com/apache/spark/pull/40377

[GitHub] [spark] HyukjinKwon closed pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations URL: https://github.com/apache/spark/pull/40416

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

2023-03-14 Thread via GitHub
HyukjinKwon commented on code in PR #40416: URL: https://github.com/apache/spark/pull/40416#discussion_r1136357256 ## docs/configuration.md: ## @@ -3138,6 +3138,69 @@ like shuffle, just replace "rpc" with "shuffle" in the property names except The default value for number of

[GitHub] [spark] HyukjinKwon commented on pull request #40416: [SPARK-42731][CONNECT][DOCS] Document Spark Connect configurations

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40416: URL: https://github.com/apache/spark/pull/40416#issuecomment-1469009905 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon closed pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-14 Thread via GitHub
HyukjinKwon closed pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common` URL: https://github.com/apache/spark/pull/40097

[GitHub] [spark] HyukjinKwon commented on pull request #40097: [SPARK-42508][CONNECT][ML] Extract the common .ml classes to `mllib-common`

2023-03-14 Thread via GitHub
HyukjinKwon commented on PR #40097: URL: https://github.com/apache/spark/pull/40097#issuecomment-1469004142 Merged to master and branch-3.4.

[GitHub] [spark] anishshri-db commented on pull request #40427: [SPARK-42792][SS] Add support for WRITE_FLUSH_BYTES for RocksDB used in streaming stateful operators

2023-03-14 Thread via GitHub
anishshri-db commented on PR #40427: URL: https://github.com/apache/spark/pull/40427#issuecomment-1468968921 @HeartSaVioR - please take a look. Thx
