[GitHub] [spark] cloud-fan commented on pull request #39596: [SPARK-42084][SQL] Avoid leaking the qualified-access-only restriction

2023-01-17 Thread GitBox
cloud-fan commented on PR #39596: URL: https://github.com/apache/spark/pull/39596#issuecomment-1386402281 It's more tricky to do it at the plan level, as you need to define which operators are allowed after natural/using join to access the hidden columns, and you will hit the same "when to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39628: [SPARK-40264][ML] followup pydoc edits

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39628: URL: https://github.com/apache/spark/pull/39628#discussion_r1073041810 ## python/pyspark/ml/functions.py: ## @@ -637,6 +641,62 @@ def predict_columnar(x1: np.ndarray, x2: np.ndarray) -> Mapping[str, np.ndarray] # |[12.0, 13.0,

[GitHub] [spark] AmplabJenkins commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
AmplabJenkins commented on PR #39585: URL: https://github.com/apache/spark/pull/39585#issuecomment-1386394165 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] zhengruifeng commented on pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)`, `count(col(*))`, `count(expr(*))`

2023-01-17 Thread GitBox
zhengruifeng commented on PR #39622: URL: https://github.com/apache/spark/pull/39622#issuecomment-1386390009 I checked that https://github.com/apache/spark/pull/39636 can resolve the `count(*)` issue in Spark Connect.

[GitHub] [spark] zhengruifeng opened a new pull request, #39636: [WIP][SQL] Move conversion `COUNT(*) -> COUNT(1)` to Analyzer

2023-01-17 Thread GitBox
zhengruifeng opened a new pull request, #39636: URL: https://github.com/apache/spark/pull/39636 ### What changes were proposed in this pull request? Move conversion `COUNT(*) -> COUNT(1)` to Analyzer ### Why are the changes needed? ### Does this PR introduce
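
The `COUNT(*) -> COUNT(1)` conversion named above can be pictured as a small tree rewrite. The following is only a toy sketch in Python, not Spark's Analyzer code: the `Star`, `Literal`, and `Count` classes and the `rewrite_count_star` function are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Star:
    """Hypothetical stand-in for the `*` argument."""


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a constant such as 1."""
    value: int


@dataclass(frozen=True)
class Count:
    """Hypothetical stand-in for a COUNT aggregate call."""
    args: Tuple[Union[Star, Literal], ...]


def rewrite_count_star(expr: Count) -> Count:
    # Only the exact shape COUNT(*) is rewritten to COUNT(1);
    # every other COUNT call is returned unchanged.
    if expr.args == (Star(),):
        return Count((Literal(1),))
    return expr


rewritten = rewrite_count_star(Count((Star(),)))
untouched = rewrite_count_star(Count((Literal(5),)))
```

Doing the rewrite in one place means later stages only ever see the `COUNT(1)` form; where exactly Spark performs it is the subject of the PR and is not shown here.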

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39628: [SPARK-40264][ML] followup pydoc edits

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39628: URL: https://github.com/apache/spark/pull/39628#discussion_r1073023166 ## python/pyspark/ml/functions.py: ## @@ -486,6 +486,7 @@ def predict(inputs: np.ndarray) -> np.ndarray: data = np.arange(0, 1000,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39628: [SPARK-40264][ML] followup pydoc edits

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39628: URL: https://github.com/apache/spark/pull/39628#discussion_r1073022795 ## python/pyspark/ml/functions.py: ## @@ -486,6 +486,7 @@ def predict(inputs: np.ndarray) -> np.ndarray: data = np.arange(0, 1000,

[GitHub] [spark] itholic commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
itholic commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1073016908 ## python/pyspark/errors/exceptions.py: ## @@ -69,8 +75,248 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]: See Also

[GitHub] [spark] itholic commented on a diff in pull request #39625: [SPARK-42066][SQL] The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-17 Thread GitBox
itholic commented on code in PR #39625: URL: https://github.com/apache/spark/pull/39625#discussion_r1073016243 ## core/src/main/resources/error/error-classes.json: ## @@ -1729,7 +1718,8 @@ }, "WITH_SUGGESTION" : { "message" : [ - "Consider to

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1073014550 ## python/pyspark/errors/exceptions.py: ## @@ -69,8 +75,248 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]: See Also

[GitHub] [spark] itholic commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
itholic commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1073012053 ## python/pyspark/errors/exceptions.py: ## @@ -69,8 +75,248 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]: See Also

[GitHub] [spark] cloud-fan closed pull request #39630: [SPARK-42061][SQL] mark expression InvokeLike and ExternalMapToCatalyst stateful

2023-01-17 Thread GitBox
cloud-fan closed pull request #39630: [SPARK-42061][SQL] mark expression InvokeLike and ExternalMapToCatalyst stateful URL: https://github.com/apache/spark/pull/39630

[GitHub] [spark] cloud-fan commented on pull request #39630: [SPARK-42061][SQL] mark expression InvokeLike and ExternalMapToCatalyst stateful

2023-01-17 Thread GitBox
cloud-fan commented on PR #39630: URL: https://github.com/apache/spark/pull/39630#issuecomment-1386352584 The failed python test is unrelated, I'm merging it to master, thanks!

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1073010348 ## python/pyspark/errors/exceptions.py: ## @@ -69,8 +75,248 @@ def getMessageParameters(self) -> Optional[Dict[str, str]]: See Also

[GitHub] [spark] ulysses-you closed pull request #39624: [SPARK-42101][SQL] Wrap InMemoryTableScanExec with QueryStage

2023-01-17 Thread GitBox
ulysses-you closed pull request #39624: [SPARK-42101][SQL] Wrap InMemoryTableScanExec with QueryStage URL: https://github.com/apache/spark/pull/39624

[GitHub] [spark] HeartSaVioR closed pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-17 Thread GitBox
HeartSaVioR closed pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc URL: https://github.com/apache/spark/pull/39538

[GitHub] [spark] HeartSaVioR commented on pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-17 Thread GitBox
HeartSaVioR commented on PR #39538: URL: https://github.com/apache/spark/pull/39538#issuecomment-1386335492 Thanks! Merging to master. (I don't think this needs CI verification as we saw green in prev commit and we just added empty lines.)

[GitHub] [spark] HeartSaVioR commented on pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-17 Thread GitBox
HeartSaVioR commented on PR #39538: URL: https://github.com/apache/spark/pull/39538#issuecomment-1386334953 https://user-images.githubusercontent.com/1317309/213058760-d6e969ea-4a16-4808-8352-19c4eb64be95.png I just verified manually that the page shows the content correctly.

[GitHub] [spark] itholic commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
itholic commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1072996633 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -20,10 +20,34 @@ Errors == +Classes +--- + +.. currentmodule:: pyspark.errors + +.. autosummary::

[GitHub] [spark] zhenlineo closed pull request #39635: [SPARK-41822][Connect]Run Scala client tests

2023-01-17 Thread GitBox
zhenlineo closed pull request #39635: [SPARK-41822][Connect]Run Scala client tests URL: https://github.com/apache/spark/pull/39635

[GitHub] [spark] zhenlineo opened a new pull request, #39635: [SPARK-41822][Connect]Run Scala client tests

2023-01-17 Thread GitBox
zhenlineo opened a new pull request, #39635: URL: https://github.com/apache/spark/pull/39635 ### What changes were proposed in this pull request? Run the scala client tests as part of the build. ### Why are the changes needed? To ensure the correctness of the

[GitHub] [spark] github-actions[bot] commented on pull request #37915: [SPARK-40465][SQL] Refactor Decimal so as we can use other underlying implementation

2023-01-17 Thread GitBox
github-actions[bot] commented on PR #37915: URL: https://github.com/apache/spark/pull/37915#issuecomment-1386275505 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #37819: [SPARK-40377][SQL] Allow customize maxBroadcastTableBytes

2023-01-17 Thread GitBox
github-actions[bot] commented on PR #37819: URL: https://github.com/apache/spark/pull/37819#issuecomment-1386275532 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37641: [SPARK-40201][SQL][TESTS] Improve v1 write test coverage

2023-01-17 Thread GitBox
github-actions[bot] closed pull request #37641: [SPARK-40201][SQL][TESTS] Improve v1 write test coverage URL: https://github.com/apache/spark/pull/37641

[GitHub] [spark] akpatnam25 commented on pull request #39634: SPARK-41415/SPARK-42090 Backport to 3.3

2023-01-17 Thread GitBox
akpatnam25 commented on PR #39634: URL: https://github.com/apache/spark/pull/39634#issuecomment-1386256889 @mridulm @otterc @tedyu @dongjoon-hyun backport into 3.3

[GitHub] [spark] akpatnam25 opened a new pull request, #39634: SPARK-41415/SPARK-42090 Backport to 3.3

2023-01-17 Thread GitBox
akpatnam25 opened a new pull request, #39634: URL: https://github.com/apache/spark/pull/39634 ### What changes were proposed in this pull request? Add the ability to retry SASL requests. Will also add a metric soon to track SASL retries. ### Why are the changes needed?
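
As a rough illustration of what retrying a request involves, here is a minimal Python sketch. It is not Spark's network code: `call_with_retries`, `flaky_request`, the `max_retries` and `wait_seconds` parameters, and the choice of `TimeoutError` as the retriable failure are all assumptions made for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_retries: int = 3,
    wait_seconds: float = 0.0,
) -> T:
    """Run `request`, retrying on TimeoutError up to `max_retries` times."""
    attempt = 0
    while True:
        try:
            return request()
        except TimeoutError:
            # Give up once the retry budget is spent; re-raise the last error.
            if attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(wait_seconds)


# Hypothetical request that times out twice, then succeeds.
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


result = call_with_retries(flaky_request, max_retries=3)
```

The `attempt` counter in the loop is the kind of value a retry metric, as mentioned in the description, could report.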

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
HyukjinKwon commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1072954033 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -20,10 +20,34 @@ Errors == +Classes +--- + +.. currentmodule:: pyspark.errors + +..

[GitHub] [spark] sunchao opened a new pull request, #39633: [WIP][SPARK-42038][SQL] SPJ: Support partially clustered distribution

2023-01-17 Thread GitBox
sunchao opened a new pull request, #39633: URL: https://github.com/apache/spark/pull/39633 ### What changes were proposed in this pull request? Currently with [storage-partitioned

[GitHub] [spark] jerrypeng commented on pull request #39538: [SPARK-41596][SS][DOCS] Document the new feature "Async Progress Tracking" to Structured Streaming guide doc

2023-01-17 Thread GitBox
jerrypeng commented on PR #39538: URL: https://github.com/apache/spark/pull/39538#issuecomment-1386255861 @HeartSaVioR PTAL! Thank you!

[GitHub] [spark] akpatnam25 commented on pull request #39632: SPARK-41415/SPARK-42090 Backport to 3.2

2023-01-17 Thread GitBox
akpatnam25 commented on PR #39632: URL: https://github.com/apache/spark/pull/39632#issuecomment-1386252421 @dongjoon-hyun oops tagged the wrong person :)

[GitHub] [spark] akpatnam25 commented on pull request #39632: SPARK-41415/SPARK-42090 Backport to 3.2

2023-01-17 Thread GitBox
akpatnam25 commented on PR #39632: URL: https://github.com/apache/spark/pull/39632#issuecomment-1386249238 + CC @otterc

[GitHub] [spark] akpatnam25 commented on pull request #39632: SPARK-41415/SPARK-42090 Backport to 3.2

2023-01-17 Thread GitBox
akpatnam25 commented on PR #39632: URL: https://github.com/apache/spark/pull/39632#issuecomment-1386249063 @mridulm @Dooyoung-Hwang @tedyu backport into 3.2

[GitHub] [spark] akpatnam25 opened a new pull request, #39632: Spark 41415 spark 42090 backport

2023-01-17 Thread GitBox
akpatnam25 opened a new pull request, #39632: URL: https://github.com/apache/spark/pull/39632 ### What changes were proposed in this pull request? Add the ability to retry SASL requests. Will also add a metric soon to track SASL retries. ### Why are the changes needed?

[GitHub] [spark] AmplabJenkins commented on pull request #39595: [SPARK-38230][SQL] InsertIntoHadoopFsRelationCommand unnecessarily fetches details of partitions in most cases

2023-01-17 Thread GitBox
AmplabJenkins commented on PR #39595: URL: https://github.com/apache/spark/pull/39595#issuecomment-1386239988 Can one of the admins verify this patch?

[GitHub] [spark] rithwik-db commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
rithwik-db commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072943054 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] akpatnam25 closed pull request #39631: Spark 41415

2023-01-17 Thread GitBox
akpatnam25 closed pull request #39631: Spark 41415 URL: https://github.com/apache/spark/pull/39631

[GitHub] [spark] akpatnam25 opened a new pull request, #39631: Spark 41415

2023-01-17 Thread GitBox
akpatnam25 opened a new pull request, #39631: URL: https://github.com/apache/spark/pull/39631 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] gengliangwang commented on pull request #39596: [SPARK-42084][SQL] Avoid leaking the qualified-access-only restriction

2023-01-17 Thread GitBox
gengliangwang commented on PR #39596: URL: https://github.com/apache/spark/pull/39596#issuecomment-1386230467 https://user-images.githubusercontent.com/1097932/213033693-8baa37b2-349b-4018-9234-84d8579d60c0.png

[GitHub] [spark] lzlfred opened a new pull request, #39630: [SPARK-42061] mark expression InvokeLike and ExternalMapToCatalyst stateful

2023-01-17 Thread GitBox
lzlfred opened a new pull request, #39630: URL: https://github.com/apache/spark/pull/39630 ## What changes were proposed in this pull request? Those two expressions involve Array/Buffer that are not thread-safe. Need to mark those stateful so existing Spark infra can copy those
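
The idea of marking an object as stateful so that it gets copied rather than shared can be sketched as follows. This is a toy Python model, not Spark's expression classes: `BufferingExpr`, its `stateful` flag, and `fresh_copy_if_stateful` are hypothetical names used only for illustration.

```python
import copy


class BufferingExpr:
    """Hypothetical expression that reuses an internal mutable buffer."""

    # Flag telling the surrounding machinery that instances hold mutable state.
    stateful = True

    def __init__(self) -> None:
        self.buffer = []

    def eval(self, value):
        # Reusing one buffer is cheap but unsafe if two threads share the instance.
        self.buffer.clear()
        self.buffer.append(value)
        return list(self.buffer)


def fresh_copy_if_stateful(expr):
    # Stateful objects get a private deep copy; stateless ones can be shared as-is.
    return copy.deepcopy(expr) if getattr(expr, "stateful", False) else expr


shared = BufferingExpr()
copy_a = fresh_copy_if_stateful(shared)
copy_b = fresh_copy_if_stateful(shared)
result_a = copy_a.eval(1)
```

Because each consumer works on its own copy, writes to one buffer cannot be observed through another copy.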

[GitHub] [spark] rithwik-db opened a new pull request, #39629: [SPARK-42103][PYSPARK][ML] Added Instrumentation for PyTorch Distributor

2023-01-17 Thread GitBox
rithwik-db opened a new pull request, #39629: URL: https://github.com/apache/spark/pull/39629 If you want to look at the actual relevant changes for this PR, check out the latest commit. ### What changes were proposed in this pull request? Added instrumentation

[GitHub] [spark] vinodkc commented on pull request #39577: [SPARK-42070][SQL] Change the default value of argument of Mask udf from -1 to NULL

2023-01-17 Thread GitBox
vinodkc commented on PR #39577: URL: https://github.com/apache/spark/pull/39577#issuecomment-1386082418 @srowen , I updated both [SPARK-42070](https://issues.apache.org/jira/browse/SPARK-42070) and [SPARK-40687](https://issues.apache.org/jira/browse/SPARK-40687) with the proposed change

[GitHub] [spark] AmplabJenkins commented on pull request #39605: when spark job had ran in k8s is finished ,it register to shutdown ho…

2023-01-17 Thread GitBox
AmplabJenkins commented on PR #39605: URL: https://github.com/apache/spark/pull/39605#issuecomment-1386075621 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #39608: [SPARK-41896][SQL][TESTS] Additional tests for _metadata filters

2023-01-17 Thread GitBox
AmplabJenkins commented on PR #39608: URL: https://github.com/apache/spark/pull/39608#issuecomment-1386075572 Can one of the admins verify this patch?

[GitHub] [spark] leewyang commented on pull request #39628: [SPARK-40264][ML] followup pydoc edits

2023-01-17 Thread GitBox
leewyang commented on PR #39628: URL: https://github.com/apache/spark/pull/39628#issuecomment-1386053573 @WeichenXu123 Here's the followup PR w/ additional examples in the pydoc.

[GitHub] [spark] srielau commented on a diff in pull request #39498: [SPARK-41976][SQL] Improve error message for `INDEX_NOT_FOUND`

2023-01-17 Thread GitBox
srielau commented on code in PR #39498: URL: https://github.com/apache/spark/pull/39498#discussion_r1072729600 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/PostgresDialect.scala: ## @@ -224,7 +224,11 @@ private object PostgresDialect extends JdbcDialect with

[GitHub] [spark] dongjoon-hyun closed pull request #39540: [SPARK-42039][SQL] SPJ: Remove Option in KeyGroupedPartitioning#partitionValuesOpt

2023-01-17 Thread GitBox
dongjoon-hyun closed pull request #39540: [SPARK-42039][SQL] SPJ: Remove Option in KeyGroupedPartitioning#partitionValuesOpt URL: https://github.com/apache/spark/pull/39540

[GitHub] [spark] sunchao commented on pull request #39540: [SPARK-42039][SQL] SPJ: Remove Option in KeyGroupedPartitioning#partitionValuesOpt

2023-01-17 Thread GitBox
sunchao commented on PR #39540: URL: https://github.com/apache/spark/pull/39540#issuecomment-1385938829 cc @cloud-fan @dongjoon-hyun @viirya this is a small refactoring

[GitHub] [spark] leewyang opened a new pull request, #39628: [SPARK-40264][ML] followup pydoc edits

2023-01-17 Thread GitBox
leewyang opened a new pull request, #39628: URL: https://github.com/apache/spark/pull/39628 ### What changes were proposed in this pull request? Followup edits to pydoc from #37734 per [request](https://github.com/apache/spark/pull/37734#issuecomment-1384092385) ###

[GitHub] [spark] dongjoon-hyun closed pull request #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-17 Thread GitBox
dongjoon-hyun closed pull request #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders URL: https://github.com/apache/spark/pull/39627

[GitHub] [spark] dongjoon-hyun commented on pull request #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-17 Thread GitBox
dongjoon-hyun commented on PR #39627: URL: https://github.com/apache/spark/pull/39627#issuecomment-1385891964 Merged to master back. Thank you!

[GitHub] [spark] jchen5 commented on pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION

2023-01-17 Thread GitBox
jchen5 commented on PR #39375: URL: https://github.com/apache/spark/pull/39375#issuecomment-1385887641 @allisonwang-db Thanks for the review, updated the PR with your comments!

[GitHub] [spark] dtenedor commented on a diff in pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-17 Thread GitBox
dtenedor commented on code in PR #39592: URL: https://github.com/apache/spark/pull/39592#discussion_r1072600184 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: ## @@ -275,8 +276,37 @@ object LogicalPlanIntegrity { * Some plan

[GitHub] [spark] erenavsarogullari commented on a diff in pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-01-17 Thread GitBox
erenavsarogullari commented on code in PR #39037: URL: https://github.com/apache/spark/pull/39037#discussion_r1067611746 ## sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala: ## @@ -2693,6 +2694,21 @@ class AdaptiveQueryExecSuite

[GitHub] [spark] LuciferYang commented on pull request #39613: [SPARK-42092][BUILD] Upgrade RoaringBitmap to 0.9.38

2023-01-17 Thread GitBox
LuciferYang commented on PR #39613: URL: https://github.com/apache/spark/pull/39613#issuecomment-1385824418 Thanks @srowen @dongjoon-hyun @yaooqinn

[GitHub] [spark] srowen commented on pull request #39577: [SPARK-42070][SQL] Change the default value of argument of Mask udf from -1 to NULL

2023-01-17 Thread GitBox
srowen commented on PR #39577: URL: https://github.com/apache/spark/pull/39577#issuecomment-1385814117 OK, this was in https://issues.apache.org/jira/browse/SPARK-40686 - worth mentioning the connection in the JIRA maybe. CC @gengliangwang

[GitHub] [spark] dtenedor commented on pull request #38888: [SPARK-41405][SQL] Centralize the column resolution logic

2023-01-17 Thread GitBox
dtenedor commented on PR #38888: URL: https://github.com/apache/spark/pull/38888#issuecomment-1385808748 Sorry for missing this earlier, late LGTM. Changes like this are moving in a good direction to move analysis logic closer to one pass. Ideally we could e.g. start making

[GitHub] [spark] dtenedor commented on pull request #39577: [SPARK-42070][SQL] Change the default value of argument of Mask udf from -1 to NULL

2023-01-17 Thread GitBox
dtenedor commented on PR #39577: URL: https://github.com/apache/spark/pull/39577#issuecomment-1385802412 > This would be a breaking change though, right? not sure that is feasible @srowen technically this is true. However, the previous PR to implement the `mask` function was merged

[GitHub] [spark] srowen closed pull request #39613: [SPARK-42092][BUILD] Upgrade RoaringBitmap to 0.9.38

2023-01-17 Thread GitBox
srowen closed pull request #39613: [SPARK-42092][BUILD] Upgrade RoaringBitmap to 0.9.38 URL: https://github.com/apache/spark/pull/39613

[GitHub] [spark] srowen commented on pull request #39613: [SPARK-42092][BUILD] Upgrade RoaringBitmap to 0.9.38

2023-01-17 Thread GitBox
srowen commented on PR #39613: URL: https://github.com/apache/spark/pull/39613#issuecomment-1385792389 Merged to master

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-17 Thread GitBox
dongjoon-hyun commented on code in PR #39627: URL: https://github.com/apache/spark/pull/39627#discussion_r1072552805 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala: ## @@ -458,4 +472,14 @@ class RowEncoderSuite extends

[GitHub] [spark] dongjoon-hyun commented on pull request #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-17 Thread GitBox
dongjoon-hyun commented on PR #39627: URL: https://github.com/apache/spark/pull/39627#issuecomment-1385774643 Thank you for making the PR again, @hvanhovell !

[GitHub] [spark] srielau commented on a diff in pull request #39625: [SPARK-42066][SQL] The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-17 Thread GitBox
srielau commented on code in PR #39625: URL: https://github.com/apache/spark/pull/39625#discussion_r1072499199 ## core/src/main/resources/error/error-classes.json: ## @@ -1729,7 +1718,8 @@ }, "WITH_SUGGESTION" : { "message" : [ - "Consider to

[GitHub] [spark] xkrogen commented on a diff in pull request #39592: [SPARK-42081][SQL] Improve the plan change validation

2023-01-17 Thread GitBox
xkrogen commented on code in PR #39592: URL: https://github.com/apache/spark/pull/39592#discussion_r1072424979 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -304,6 +304,14 @@ object SQLConf { .stringConf .createOptional + val

[GitHub] [spark] hvanhovell opened a new pull request, #39627: [SPARK-41993][SQL] Move RowEncoder to AgnosticEncoders

2023-01-17 Thread GitBox
hvanhovell opened a new pull request, #39627: URL: https://github.com/apache/spark/pull/39627 ### What changes were proposed in this pull request? This PR makes `RowEncoder` produce an `AgnosticEncoder`. The expression generation for these encoders is moved to `ScalaReflection` (this

[GitHub] [spark] jchen5 commented on a diff in pull request #39375: [SPARK-36124][SQL] Support subqueries with correlation through UNION

2023-01-17 Thread GitBox
jchen5 commented on code in PR #39375: URL: https://github.com/apache/spark/pull/39375#discussion_r1072327019 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1209,6 +1209,12 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] xkrogen commented on pull request #36506: [SPARK-25050][SQL] Avro: writing complex unions

2023-01-17 Thread GitBox
xkrogen commented on PR #36506: URL: https://github.com/apache/spark/pull/36506#issuecomment-1385550180 @gengliangwang any comments on the latest diff, after @steven-aerts answered your last question? Seems that this PR is in a very healthy state, I would love to see it merged.

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072309036 ## python/pyspark/ml/torch/distributor.py: ## @@ -325,8 +329,15 @@ def _create_torchrun_command( torchrun_args = ["--standalone", "--nnodes=1"]

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072300627 ## python/pyspark/ml/torch/distributor.py: ## @@ -325,8 +329,15 @@ def _create_torchrun_command( torchrun_args = ["--standalone", "--nnodes=1"]

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072294810 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072290872 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072289202 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072285042 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] peter-toth commented on pull request #29210: [SPARK-24497][SQL] Support recursive SQL query

2023-01-17 Thread GitBox
peter-toth commented on PR #29210: URL: https://github.com/apache/spark/pull/29210#issuecomment-1385500216 Sorry guys, this is unlikely to land in Spark 3.4, maybe in 3.5...

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072267585 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #39267: [WIP][SPARK-41592][PYTHON][ML] Pytorch file Distributed Training

2023-01-17 Thread GitBox
WeichenXu123 commented on code in PR #39267: URL: https://github.com/apache/spark/pull/39267#discussion_r1072266412 ## python/pyspark/ml/torch/distributor.py: ## @@ -428,6 +432,84 @@ def _run_local_training( return output +def _get_spark_task_program( +

[GitHub] [spark] lival opened a new pull request, #39626: An automatic caching solution for Spark

2023-01-17 Thread GitBox
lival opened a new pull request, #39626: URL: https://github.com/apache/spark/pull/39626 Hi, thanks for your attention. Caching is widely used by developers to improve performance. Reasonable use of cache APIs to improve performance presents a huge challenge to users. Many

[GitHub] [spark] hvanhovell commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
hvanhovell commented on PR #39585: URL: https://github.com/apache/spark/pull/39585#issuecomment-1385457773 Can we make the python specific stuff a binary blob instead of an actual message? That way you have more flexibility in the language specific bits, for scala for example we are

[GitHub] [spark] peter-toth commented on pull request #37525: [WIP][SPARK-40086][SPARK-42049][SQL] Improve AliasAwareOutputPartitioning and AliasAwareQueryOutputOrdering to take all aliases into accou

2023-01-17 Thread GitBox
peter-toth commented on PR #37525: URL: https://github.com/apache/spark/pull/37525#issuecomment-1385458033 @ulysses-you, @cloud-fan, I've rebased this PR on top of `master`, that now includes `multiTransform()`: - The 1st commit of this PR is a cherry-pick of the first commit from

[GitHub] [spark] peter-toth commented on pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
peter-toth commented on PR #38034: URL: https://github.com/apache/spark/pull/38034#issuecomment-1385415483 Thanks for the review @cloud-fan, @ulysses-you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] cloud-fan closed pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
cloud-fan closed pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives URL: https://github.com/apache/spark/pull/38034 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] cloud-fan commented on pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
cloud-fan commented on PR #38034: URL: https://github.com/apache/spark/pull/38034#issuecomment-1385387495 thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] grundprinzip commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
grundprinzip commented on PR #39585: URL: https://github.com/apache/spark/pull/39585#issuecomment-1385376135 > As for the message `PythonFunction`, it was a placeholder for all the information required to construct a PySpark SimplePythonFunction, as shown below. > > ```

[GitHub] [spark] xinrong-meng commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
xinrong-meng commented on PR #39585: URL: https://github.com/apache/spark/pull/39585#issuecomment-1385357961 Thanks @grundprinzip for the insightful comments. I will adjust them. As for the message `PythonFunction`, it was a placeholder for all the information required to construct a

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)`, `count(col(*))`, `count(expr(*))`

2023-01-17 Thread GitBox
zhengruifeng commented on code in PR #39622: URL: https://github.com/apache/spark/pull/39622#discussion_r1072137756 ## python/pyspark/sql/connect/functions.py: ## @@ -799,6 +799,13 @@ def corr(col1: "ColumnOrName", col2: "ColumnOrName") -> Column: def count(col:

[GitHub] [spark] itholic commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
itholic commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1072085481 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -27,3 +27,14 @@ Errors PySparkException.getErrorClass PySparkException.getMessageParameters +

[GitHub] [spark] itholic opened a new pull request, #39625: [WIP][SPARK-42066][SQL] The DATATYPE_MISMATCH error class contains inappropriate and duplicating subclasses

2023-01-17 Thread GitBox
itholic opened a new pull request, #39625: URL: https://github.com/apache/spark/pull/39625 ### What changes were proposed in this pull request? This PR proposes to remove `DATATYPE_MISMATCH.WRONG_NUM_ARGS` and `DATATYPE_MISMATCH.WRONG_NUM_ARGS_WITH_SUGGESTION` from sub-class of

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1072068522 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -2210,15 +2210,39 @@ def test_call_udf(self): ).toPandas(), ) +def

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1072064789 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -44,6 +44,8 @@ message Expression { UnresolvedExtractValue

[GitHub] [spark] grundprinzip commented on a diff in pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

2023-01-17 Thread GitBox
grundprinzip commented on code in PR #39585: URL: https://github.com/apache/spark/pull/39585#discussion_r1072064530 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -217,6 +219,19 @@ message Expression { bool is_user_defined_function =

[GitHub] [spark] itholic commented on a diff in pull request #39591: [SPARK-42078][PYTHON] Migrate errors thrown by JVM into `PySparkException`.

2023-01-17 Thread GitBox
itholic commented on code in PR #39591: URL: https://github.com/apache/spark/pull/39591#discussion_r1072058043 ## python/docs/source/reference/pyspark.errors.rst: ## @@ -27,3 +27,14 @@ Errors PySparkException.getErrorClass PySparkException.getMessageParameters +

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)`, `count(col(*))`, `count(expr(*))`

2023-01-17 Thread GitBox
zhengruifeng commented on code in PR #39622: URL: https://github.com/apache/spark/pull/39622#discussion_r1072052805 ## python/pyspark/sql/connect/functions.py: ## @@ -799,6 +799,13 @@ def corr(col1: "ColumnOrName", col2: "ColumnOrName") -> Column: def count(col:

[GitHub] [spark] ulysses-you closed pull request #39624: [SPARK-42101][SQL] Wrap InMemoryTableScanExec with QueryStage

2023-01-17 Thread GitBox
ulysses-you closed pull request #39624: [SPARK-42101][SQL] Wrap InMemoryTableScanExec with QueryStage URL: https://github.com/apache/spark/pull/39624 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] ulysses-you opened a new pull request, #39624: [SPARK-42101][SQL] Wrap InMemoryTableScanExec with QueryStage

2023-01-17 Thread GitBox
ulysses-you opened a new pull request, #39624: URL: https://github.com/apache/spark/pull/39624 ### What changes were proposed in this pull request? This PR aims to enhance the `InMemoryTableScanExec` and `AdaptiveSparkPlanExec`. The first access to the cached plan is

[GitHub] [spark] ulysses-you commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
ulysses-you commented on code in PR #38034: URL: https://github.com/apache/spark/pull/38034#discussion_r1072014626 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala: ## @@ -618,6 +618,212 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]

[GitHub] [spark] peter-toth commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
peter-toth commented on code in PR #38034: URL: https://github.com/apache/spark/pull/38034#discussion_r1072009769 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala: ## @@ -618,6 +618,134 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]

[GitHub] [spark] zhengruifeng commented on a diff in pull request #39622: [SPARK-42099][SPARK-41845][CONNECT][PYTHON] Fix `count(*)`, `count(col(*))`, `count(expr(*))`

2023-01-17 Thread GitBox
zhengruifeng commented on code in PR #39622: URL: https://github.com/apache/spark/pull/39622#discussion_r1072010050 ## python/pyspark/sql/connect/functions.py: ## @@ -799,6 +799,13 @@ def corr(col1: "ColumnOrName", col2: "ColumnOrName") -> Column: def count(col:

[GitHub] [spark] cloud-fan commented on a diff in pull request #38034: [SPARK-40599][SQL] Add multiTransform methods to TreeNode to generate alternatives

2023-01-17 Thread GitBox
cloud-fan commented on code in PR #38034: URL: https://github.com/apache/spark/pull/38034#discussion_r1071999550 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala: ## @@ -618,6 +618,212 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]]
