[GitHub] [spark] zhengruifeng commented on a diff in pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-21 Thread GitBox
zhengruifeng commented on code in PR #36246: URL: https://github.com/apache/spark/pull/36246#discussion_r855817585 ## python/pyspark/pandas/series.py: ## @@ -2209,15 +2219,43 @@ def _interpolate( ) * null_index_forward + last_non_null_forward fill_cond = ~F.i

[GitHub] [spark] panbingkun commented on a diff in pull request #36314: [SPARK-38736][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-04-21 Thread GitBox
panbingkun commented on code in PR #36314: URL: https://github.com/apache/spark/pull/36314#discussion_r855814432 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala: ## @@ -77,4 +77,23 @@ class QueryExecutionAnsiErrorsSuite extends QueryTes

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1106063878 > Why not? spark-tags-tests.jar is used in maven test phase. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [spark] MaxGekk commented on pull request #35269: [SPARK-28516][SQL] Data Type Formatting Functions: `to_char`

2022-04-21 Thread GitBox
MaxGekk commented on PR #35269: URL: https://github.com/apache/spark/pull/35269#issuecomment-1106047562 FYI, the feature is in the allow list for Spark 3.3, and in fact 3.3 is waiting for only this PR. @cloud-fan @dtenedor @amaliujia @beliefer How long could it take to be ready for merging

[GitHub] [spark] MaxGekk commented on pull request #36232: [SPARK-38741][SQL][TESTS] Test the error class: MAP_KEY_DOES_NOT_EXIST*

2022-04-21 Thread GitBox
MaxGekk commented on PR #36232: URL: https://github.com/apache/spark/pull/36232#issuecomment-1106041088 @panbingkun Could you resolve the conflicts here?

[GitHub] [spark] MaxGekk commented on a diff in pull request #36280: [SPARK-38742][SQL][TESTS] Move the tests `MISSING_COLUMN` from SQLQuerySuite to QueryCompilationErrorsSuite

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36280: URL: https://github.com/apache/spark/pull/36280#discussion_r855792086 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -362,6 +362,45 @@ class QueryCompilationErrorsSuite extends QueryTest wit

[GitHub] [spark] MaxGekk commented on a diff in pull request #36284: [SPARK-38750][SQL][TESTS] Test the error class: SECOND_FUNCTION_ARGUMENT_NOT_INTEGER

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36284: URL: https://github.com/apache/spark/pull/36284#discussion_r855791244 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -362,6 +362,19 @@ class QueryCompilationErrorsSuite extends QueryTest wit

[GitHub] [spark] MaxGekk commented on a diff in pull request #36298: [SPARK-38740][SQL][TESTS] Test the error class: INVALID_JSON_SCHEMA_MAPTYPE

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36298: URL: https://github.com/apache/spark/pull/36298#discussion_r855790900 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryCompilationErrorsSuite.scala: ## @@ -364,6 +364,21 @@ class QueryCompilationErrorsSuite extends QueryTest wit

[GitHub] [spark] MaxGekk commented on a diff in pull request #36314: [SPARK-38736][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36314: URL: https://github.com/apache/spark/pull/36314#discussion_r855790606 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionAnsiErrorsSuite.scala: ## @@ -77,4 +77,23 @@ class QueryExecutionAnsiErrorsSuite extends QueryTest w

[GitHub] [spark] MaxGekk commented on a diff in pull request #36320: [SPARK-38732][SQL][TESTS] Test the error class: INCOMPARABLE_PIVOT_COLUMN

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36320: URL: https://github.com/apache/spark/pull/36320#discussion_r855790359 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -373,4 +373,29 @@ class QueryExecutionErrorsSuite extends QueryTest as

[GitHub] [spark] gengliangwang closed pull request #36316: [SPARK-38813][3.3][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output

2022-04-21 Thread GitBox
gengliangwang closed pull request #36316: [SPARK-38813][3.3][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output URL: https://github.com/apache/spark/pull/36316

[GitHub] [spark] HyukjinKwon closed pull request #36318: [SPARK-38994][DOCS] Add an Python example of StreamingQueryListener

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36318: [SPARK-38994][DOCS] Add an Python example of StreamingQueryListener URL: https://github.com/apache/spark/pull/36318

[GitHub] [spark] gengliangwang commented on pull request #36316: [SPARK-38813][3.3][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output

2022-04-21 Thread GitBox
gengliangwang commented on PR #36316: URL: https://github.com/apache/spark/pull/36316#issuecomment-1106027911 @cloud-fan @dongjoon-hyun @ueshin Thanks for the review. Merging to 3.3.

[GitHub] [spark] HyukjinKwon commented on pull request #36318: [SPARK-38994][DOCS] Add an Python example of StreamingQueryListener

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36318: URL: https://github.com/apache/spark/pull/36318#issuecomment-1106027550 Merged to master.

[GitHub] [spark] MaxGekk closed pull request #36287: [SPARK-38986][SQL] Prepend error class tag to error messages

2022-04-21 Thread GitBox
MaxGekk closed pull request #36287: [SPARK-38986][SQL] Prepend error class tag to error messages URL: https://github.com/apache/spark/pull/36287

[GitHub] [spark] MaxGekk commented on pull request #36287: [SPARK-38986][SQL] Prepend error class tag to error messages

2022-04-21 Thread GitBox
MaxGekk commented on PR #36287: URL: https://github.com/apache/spark/pull/36287#issuecomment-1106015680 GA passed. Merging to master. Thank you, @cloud-fan and @HyukjinKwon for review.

[GitHub] [spark] MaxGekk commented on a diff in pull request #36287: [SPARK-38986][SQL] Prepend error class tag to error messages

2022-04-21 Thread GitBox
MaxGekk commented on code in PR #36287: URL: https://github.com/apache/spark/pull/36287#discussion_r855770217 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -58,7 +58,8 @@ private[spark] object SparkThrowableHelper { def getMessage(errorClass: String, messageP

[GitHub] [spark] lvshaokang opened a new pull request, #36320: [SPARK-38732][SQL][TESTS] Test the error class: INCOMPARABLE_PIVOT_COLUMN

2022-04-21 Thread GitBox
lvshaokang opened a new pull request, #36320: URL: https://github.com/apache/spark/pull/36320 ### What changes were proposed in this pull request? I add a test case for the error class INCOMPARABLE_PIVOT_COLUMN in the QueryExecutionErrorsSuite. ### Why are the changes n

[GitHub] [spark] beliefer opened a new pull request, #36319: [SPARK-28330][SQL][FOLLOWUP] Move the check of Offset from analysis to finsh analysis

2022-04-21 Thread GitBox
beliefer opened a new pull request, #36319: URL: https://github.com/apache/spark/pull/36319 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/35975 supported ANSI SQL: result offset clause. We perform some checks on the offset in `CheckAnalysis`. The

[GitHub] [spark] HyukjinKwon commented on pull request #36318: [SPARK-38994][DOCS] Add an Python example of StreamingQueryListener

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36318: URL: https://github.com/apache/spark/pull/36318#issuecomment-110633 cc @HeartSaVioR

[GitHub] [spark] HyukjinKwon opened a new pull request, #36318: [SPARK-38994][DOCS] Add an Python example of StreamingQueryListener

2022-04-21 Thread GitBox
HyukjinKwon opened a new pull request, #36318: URL: https://github.com/apache/spark/pull/36318 ### What changes were proposed in this pull request? This PR proposes to add an example of `StreamingQueryListener` in Python added in SPARK-38759. ### Why are the changes needed?

[GitHub] [spark] itholic commented on pull request #36306: [SPARK-34827][PYTHON] Implement `ignore_index` of `DataFrame/Series.sample`

2022-04-21 Thread GitBox
itholic commented on PR #36306: URL: https://github.com/apache/spark/pull/36306#issuecomment-1105993698 Maybe the JIRA number is incorrect? It seems like it should be SPARK-38989.

[GitHub] [spark] JoshRosen commented on a diff in pull request #36315: [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
JoshRosen commented on code in PR #36315: URL: https://github.com/apache/spark/pull/36315#discussion_r855751881 ## core/src/main/scala/org/apache/spark/security/ShellBasedGroupsMappingProvider.scala: ## @@ -36,10 +36,12 @@ private[spark] class ShellBasedGroupsMappingProvider ex

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36315: [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
HyukjinKwon commented on code in PR #36315: URL: https://github.com/apache/spark/pull/36315#discussion_r855751736 ## core/src/main/scala/org/apache/spark/security/ShellBasedGroupsMappingProvider.scala: ## @@ -36,10 +36,12 @@ private[spark] class ShellBasedGroupsMappingProvider

[GitHub] [spark] JoshRosen commented on a diff in pull request #36315: [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
JoshRosen commented on code in PR #36315: URL: https://github.com/apache/spark/pull/36315#discussion_r855748988 ## core/src/main/scala/org/apache/spark/security/ShellBasedGroupsMappingProvider.scala: ## @@ -36,10 +36,12 @@ private[spark] class ShellBasedGroupsMappingProvider ex

[GitHub] [spark] dongjoon-hyun commented on pull request #36316: [SPARK-38813][3.3][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output

2022-04-21 Thread GitBox
dongjoon-hyun commented on PR #36316: URL: https://github.com/apache/spark/pull/36316#issuecomment-1105983522 Got it, @gengliangwang. Never mind~ :)

[GitHub] [spark] gengliangwang commented on pull request #36316: [SPARK-38813][3.3][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output

2022-04-21 Thread GitBox
gengliangwang commented on PR #36316: URL: https://github.com/apache/spark/pull/36316#issuecomment-1105982873 @dongjoon-hyun Yes I would love to. There is a check `!Utils.isTesting`. If I temporarily disable it in one test case, will it be a potential issue for running tests in parallel?

[GitHub] [spark] srowen commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
srowen commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105981447 Why not?

[GitHub] [spark] gengliangwang commented on a diff in pull request #36315: [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
gengliangwang commented on code in PR #36315: URL: https://github.com/apache/spark/pull/36315#discussion_r855746240 ## core/src/main/scala/org/apache/spark/security/ShellBasedGroupsMappingProvider.scala: ## @@ -38,8 +38,10 @@ private[spark] class ShellBasedGroupsMappingProvider

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36315: [SPARK-38992][CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
HyukjinKwon commented on code in PR #36315: URL: https://github.com/apache/spark/pull/36315#discussion_r855748245 ## core/src/main/scala/org/apache/spark/security/ShellBasedGroupsMappingProvider.scala: ## @@ -38,8 +38,10 @@ private[spark] class ShellBasedGroupsMappingProvider ex

[GitHub] [spark] lvshaokang closed pull request #36297: [SPARK-38732][SQL][TESTS] Test the error class: INCOMPARABLE_PIVOT_COLUMN

2022-04-21 Thread GitBox
lvshaokang closed pull request #36297: [SPARK-38732][SQL][TESTS] Test the error class: INCOMPARABLE_PIVOT_COLUMN URL: https://github.com/apache/spark/pull/36297

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105973624 > This does not explain why? > > This does not explain why? > > Thank you very much for your reply! After the Spark build is completed, the $spark_project_home/assembly/tar

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105973109 > This does not explain why? Thank you very much for your reply! After the Spark build is completed, the $spark_project_home/assembly/target/scala-2.11/jars directory contains

[GitHub] [spark] HyukjinKwon commented on pull request #36312: [SPARK-38990][SQL] Avoid `NullPointerException` when evaluating date_trunc/trunc format as a bound reference

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36312: URL: https://github.com/apache/spark/pull/36312#issuecomment-1105968096 Merged to master, branch-3.3, branch-3.2, branch-3.1 and branch-3.0.

[GitHub] [spark] HyukjinKwon closed pull request #36312: [SPARK-38990][SQL] Avoid `NullPointerException` when evaluating date_trunc/trunc format as a bound reference

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36312: [SPARK-38990][SQL] Avoid `NullPointerException` when evaluating date_trunc/trunc format as a bound reference URL: https://github.com/apache/spark/pull/36312

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105965230 > @dongjoon-hyun Could you take a look? thanks!

[GitHub] [spark] cloud-fan closed pull request #36291: [SPARK-38974][SQL] Filter registered functions with a given database name in list functions

2022-04-21 Thread GitBox
cloud-fan closed pull request #36291: [SPARK-38974][SQL] Filter registered functions with a given database name in list functions URL: https://github.com/apache/spark/pull/36291

[GitHub] [spark] cloud-fan commented on pull request #36291: [SPARK-38974][SQL] Filter registered functions with a given database name in list functions

2022-04-21 Thread GitBox
cloud-fan commented on PR #36291: URL: https://github.com/apache/spark/pull/36291#issuecomment-1105964564 thanks, merging to master/3.3!

[GitHub] [spark] zhengruifeng opened a new pull request, #36317: [SPARK-38993][ML] Impl DataFrame.boxplot and DataFrame.plot.box

2022-04-21 Thread GitBox
zhengruifeng opened a new pull request, #36317: URL: https://github.com/apache/spark/pull/36317 ### What changes were proposed in this pull request? Impl DataFrame.boxplot and DataFrame.plot.box ### Why are the changes needed? to increase pandas API coverage in PySpark

[GitHub] [spark] cloud-fan commented on a diff in pull request #36303: [SPARK-38977][SQL] Fix schema pruning with correlated subqueries

2022-04-21 Thread GitBox
cloud-fan commented on code in PR #36303: URL: https://github.com/apache/spark/pull/36303#discussion_r855736358 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala: ## @@ -935,4 +935,106 @@ abstract class SchemaPruningSuite .coun

[GitHub] [spark] gengliangwang opened a new pull request, #36316: [SPARK-38813][SQL][FOLLOWUP] Improve the analysis check for TimestampNTZ output

2022-04-21 Thread GitBox
gengliangwang opened a new pull request, #36316: URL: https://github.com/apache/spark/pull/36316 ### What changes were proposed in this pull request? In https://github.com/apache/spark/pull/36094, a check for failing TimestampNTZ output is added. However, if there is an unr

[GitHub] [spark] aray commented on pull request #36150: [WIP][SPARK-38864][SQL] Add melt / unpivot to Dataset

2022-04-21 Thread GitBox
aray commented on PR #36150: URL: https://github.com/apache/spark/pull/36150#issuecomment-1105961462 @EnricoMi thanks for the PR. This has been a TODO for years now since I added pivot. If you want to implement this with `stack` you can just use the expression directly, no need to add a fun

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105961386 > @dongjoon-hyun Could you take a look? thanks!

[GitHub] [spark] xiuzhu9527 closed pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 closed pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from… URL: https://github.com/apache/spark/pull/36217

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36296: [SPARK-38979][SQL] Improve error log readability in OrcUtils.requestedColumnIds

2022-04-21 Thread GitBox
HyukjinKwon commented on code in PR #36296: URL: https://github.com/apache/spark/pull/36296#discussion_r855735028 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala: ## @@ -224,8 +224,10 @@ object OrcUtils extends Logging { // the p

[GitHub] [spark] cloud-fan commented on pull request #36072: [SPARK-38666][SQL] Add missing aggregate filter checks

2022-04-21 Thread GitBox
cloud-fan commented on PR #36072: URL: https://github.com/apache/spark/pull/36072#issuecomment-1105959980 thanks, merging to master/3.3!

[GitHub] [spark] HyukjinKwon closed pull request #36308: [SPARK-38581][PYTHON][DOCS][3.3] List of supported pandas APIs for pandas-on-Spark docs

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36308: [SPARK-38581][PYTHON][DOCS][3.3] List of supported pandas APIs for pandas-on-Spark docs URL: https://github.com/apache/spark/pull/36308

[GitHub] [spark] cloud-fan closed pull request #36072: [SPARK-38666][SQL] Add missing aggregate filter checks

2022-04-21 Thread GitBox
cloud-fan closed pull request #36072: [SPARK-38666][SQL] Add missing aggregate filter checks URL: https://github.com/apache/spark/pull/36072

[GitHub] [spark] HyukjinKwon commented on pull request #36315: [SPARK-38992][CORE]Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36315: URL: https://github.com/apache/spark/pull/36315#issuecomment-1105959105 cc @JoshRosen @gengliangwang mind taking a look please?

[GitHub] [spark] HyukjinKwon opened a new pull request, #36315: [SPARK-38992][CORE]Avoid using bash -c in ShellBasedGroupsMappingProvider

2022-04-21 Thread GitBox
HyukjinKwon opened a new pull request, #36315: URL: https://github.com/apache/spark/pull/36315 ### What changes were proposed in this pull request? This PR proposes to avoid using `bash -c` in `ShellBasedGroupsMappingProvider`. Using `bash -c` could allow a command injection by users. ### Wh

[GitHub] [spark] cloud-fan commented on a diff in pull request #36230: [SPARK-38868][SQL] Don't propagate exceptions from filter predicate when optimizing outer joins

2022-04-21 Thread GitBox
cloud-fan commented on code in PR #36230: URL: https://github.com/apache/spark/pull/36230#discussion_r855731336 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OuterJoinEliminationSuite.scala: ## @@ -252,4 +254,18 @@ class OuterJoinEliminationSuite extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #36094: [SPARK-38813][SQL][3.3] Remove TimestampNTZ type support in Spark 3.3

2022-04-21 Thread GitBox
gengliangwang commented on code in PR #36094: URL: https://github.com/apache/spark/pull/36094#discussion_r855731255 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -157,6 +158,10 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] cloud-fan commented on a diff in pull request #36230: [SPARK-38868][SQL] Don't propagate exceptions from filter predicate when optimizing outer joins

2022-04-21 Thread GitBox
cloud-fan commented on code in PR #36230: URL: https://github.com/apache/spark/pull/36230#discussion_r855731223 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -144,8 +144,17 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with P

[GitHub] [spark] cloud-fan commented on a diff in pull request #36094: [SPARK-38813][SQL][3.3] Remove TimestampNTZ type support in Spark 3.3

2022-04-21 Thread GitBox
cloud-fan commented on code in PR #36094: URL: https://github.com/apache/spark/pull/36094#discussion_r855730722 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -157,6 +158,10 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] cloud-fan commented on a diff in pull request #36094: [SPARK-38813][SQL][3.3] Remove TimestampNTZ type support in Spark 3.3

2022-04-21 Thread GitBox
cloud-fan commented on code in PR #36094: URL: https://github.com/apache/spark/pull/36094#discussion_r855730647 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -157,6 +158,10 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] srowen commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
srowen commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105953110 This does not explain why?

[GitHub] [spark] itholic commented on a diff in pull request #36306: [SPARK-34827][PYTHON] Implement `ignore_index` of `DataFrame/Series.sample`

2022-04-21 Thread GitBox
itholic commented on code in PR #36306: URL: https://github.com/apache/spark/pull/36306#discussion_r855727915 ## python/pyspark/pandas/frame.py: ## @@ -8689,7 +8702,13 @@ def sample( sdf = self._internal.resolved_copy.spark_frame.sample( withReplacement=rep

[GitHub] [spark] PavithraRamachandran commented on pull request #36278: [SPARK-38963][WEBUI]Make stage navigable to stage Page from max metrics displayed in UI

2022-04-21 Thread GitBox
PavithraRamachandran commented on PR #36278: URL: https://github.com/apache/spark/pull/36278#issuecomment-1105950450 @martin-g @LuciferYang could you help review this?

[GitHub] [spark] xiuzhu9527 commented on pull request #36217: [BUILD] When building spark project, remove spark-tags-tests.jar from…

2022-04-21 Thread GitBox
xiuzhu9527 commented on PR #36217: URL: https://github.com/apache/spark/pull/36217#issuecomment-1105948761 cc @dongjoon-hyun @HyukjinKwon @srowen

[GitHub] [spark] HyukjinKwon closed pull request #36215: [SPARK-38938][PYTHON] Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36215: [SPARK-38938][PYTHON] Implement `inplace` and `columns` parameters of `Series.drop` URL: https://github.com/apache/spark/pull/36215

[GitHub] [spark] HyukjinKwon commented on pull request #36215: [SPARK-38938][PYTHON] Implement `inplace` and `columns` parameters of `Series.drop`

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36215: URL: https://github.com/apache/spark/pull/36215#issuecomment-1105948369 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #36266: [SPARK-38952][PYTHON] Implement `numeric_only` of `GroupBy.first` and `GroupBy.last`

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36266: [SPARK-38952][PYTHON] Implement `numeric_only` of `GroupBy.first` and `GroupBy.last` URL: https://github.com/apache/spark/pull/36266

[GitHub] [spark] HyukjinKwon commented on pull request #36266: [SPARK-38952][PYTHON] Implement `numeric_only` of `GroupBy.first` and `GroupBy.last`

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36266: URL: https://github.com/apache/spark/pull/36266#issuecomment-1105948075 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #36294: [SPARK-38955][SQL] Disable lineSep option in 'from_csv' and 'schema_of_csv'

2022-04-21 Thread GitBox
HyukjinKwon closed pull request #36294: [SPARK-38955][SQL] Disable lineSep option in 'from_csv' and 'schema_of_csv' URL: https://github.com/apache/spark/pull/36294

[GitHub] [spark] HyukjinKwon commented on pull request #36294: [SPARK-38955][SQL] Disable lineSep option in 'from_csv' and 'schema_of_csv'

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36294: URL: https://github.com/apache/spark/pull/36294#issuecomment-1105947034 I will get this in first since 3.3 RC is coming soon. Merged to master and branch-3.3.

[GitHub] [spark] HyukjinKwon commented on pull request #36294: [SPARK-38955][SQL] Disable lineSep option in 'from_csv' and 'schema_of_csv'

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36294: URL: https://github.com/apache/spark/pull/36294#issuecomment-1105945488 Yeah. I actually think we should check all options there and document them. Actually, we might even throw an exception too like `parseMode` case but it might be too breaking.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36287: [SPARK-38986][SQL] Prepend error class tag to error messages

2022-04-21 Thread GitBox
HyukjinKwon commented on code in PR #36287: URL: https://github.com/apache/spark/pull/36287#discussion_r855722459 ## core/src/main/scala/org/apache/spark/ErrorInfo.scala: ## @@ -58,7 +58,8 @@ private[spark] object SparkThrowableHelper { def getMessage(errorClass: String, mess

[GitHub] [spark] panbingkun opened a new pull request, #36314: [SPARK-38736][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-04-21 Thread GitBox
panbingkun opened a new pull request, #36314: URL: https://github.com/apache/spark/pull/36314 ## What changes were proposed in this pull request? This pr aims to add one test for the error class INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT to QueryExecutionAnsiErrorsSuite.

[GitHub] [spark] panbingkun closed pull request #36313: [SPARK-38734][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-04-21 Thread GitBox
panbingkun closed pull request #36313: [SPARK-38734][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT URL: https://github.com/apache/spark/pull/36313

[GitHub] [spark] HyukjinKwon commented on pull request #36308: [SPARK-38581][PYTHON][DOCS][3.3] List of supported pandas APIs for pandas-on-Spark docs

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36308: URL: https://github.com/apache/spark/pull/36308#issuecomment-1105941459 Merged to branch-3.3.

[GitHub] [spark] panbingkun opened a new pull request, #36313: [SPARK-38734][SQL][TESTS] Test the error classes: INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT

2022-04-21 Thread GitBox
panbingkun opened a new pull request, #36313: URL: https://github.com/apache/spark/pull/36313 ## What changes were proposed in this pull request? This pr aims to add one test for the error class INVALID_ARRAY_INDEX & INVALID_ARRAY_INDEX_IN_ELEMENT_AT to QueryExecutionAnsiErrorsSuite.

[GitHub] [spark] HyukjinKwon commented on pull request #36038: [SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-21 Thread GitBox
HyukjinKwon commented on PR #36038: URL: https://github.com/apache/spark/pull/36038#issuecomment-1105940874 oh yeah. will take a look. thanks for pointing this out!

[GitHub] [spark] dongjoon-hyun commented on pull request #36310: [MINOR][DOCS] Also remove Google Analytics from Spark release docs, per ASF policy

2022-04-21 Thread GitBox
dongjoon-hyun commented on PR #36310: URL: https://github.com/apache/spark/pull/36310#issuecomment-1105940185 Merged to all live branches (master/3.3/3.2/3.1).

[GitHub] [spark] dongjoon-hyun closed pull request #36310: [MINOR][DOCS] Also remove Google Analytics from Spark release docs, per ASF policy

2022-04-21 Thread GitBox
dongjoon-hyun closed pull request #36310: [MINOR][DOCS] Also remove Google Analytics from Spark release docs, per ASF policy URL: https://github.com/apache/spark/pull/36310

[GitHub] [spark] bersprockets opened a new pull request, #36312: [SPARK-38990][SQL] Avoid `NullPointerException` when evaluating date_trunc/trunc format as a bound reference

2022-04-21 Thread GitBox
bersprockets opened a new pull request, #36312: URL: https://github.com/apache/spark/pull/36312 ### What changes were proposed in this pull request? Change `TruncInstant.evalHelper` to pass the input row to `format.eval` when `format` is not a literal (and therefore might be a bound

[GitHub] [spark] c21 commented on pull request #36311: [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

2022-04-21 Thread GitBox
c21 commented on PR #36311: URL: https://github.com/apache/spark/pull/36311#issuecomment-1105909529 @cloud-fan and @huaxingao could you help take a look when you have time? Thanks.

[GitHub] [spark] c21 opened a new pull request, #36311: [SPARK-34960][SQL][DOCS][FOLLOWUP] Improve doc for DSv2 aggregate push down

2022-04-21 Thread GitBox
c21 opened a new pull request, #36311: URL: https://github.com/apache/spark/pull/36311 ### What changes were proposed in this pull request? This is a followup per comment in https://issues.apache.org/jira/browse/SPARK-34960, to improve the documentation for data source v2 agg

[GitHub] [spark] zhengruifeng commented on a diff in pull request #36205: [SPARK-38907][PYTHON] Implement DataFrame.corrwith

2022-04-21 Thread GitBox
zhengruifeng commented on code in PR #36205: URL: https://github.com/apache/spark/pull/36205#discussion_r855689831 ## python/pyspark/pandas/frame.py: ## @@ -1310,6 +1310,137 @@ def corr(self, method: str = "pearson") -> "DataFrame": """ return cast(DataFrame, p

[GitHub] [spark] srowen opened a new pull request, #36310: [MINOR][DOCS] Also remove Google Analytics from Spark release docs, per ASF policy

2022-04-21 Thread GitBox
srowen opened a new pull request, #36310: URL: https://github.com/apache/spark/pull/36310 ### What changes were proposed in this pull request? Remove Google Analytics from Spark release docs. See also https://github.com/apache/spark-website/pull/384 ### Why are the changes ne

[GitHub] [spark] github-actions[bot] commented on pull request #34995: [SPARK-37722][SQL] Escape dot character in partition names

2022-04-21 Thread GitBox
github-actions[bot] commented on PR #34995: URL: https://github.com/apache/spark/pull/34995#issuecomment-1105878086 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] commented on pull request #35076: [SPARK-37793][CORE][SHUFFLE] Fallback to fetch original blocks when noLocalMergedBlockDataError

2022-04-21 Thread GitBox
github-actions[bot] commented on PR #35076: URL: https://github.com/apache/spark/pull/35076#issuecomment-1105878067 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #35140: [SPARK-37829][SQL] DataFrame.joinWith should return null rows for missing values

2022-04-21 Thread GitBox
github-actions[bot] closed pull request #35140: [SPARK-37829][SQL] DataFrame.joinWith should return null rows for missing values URL: https://github.com/apache/spark/pull/35140

[GitHub] [spark] xinrong-databricks opened a new pull request, #36309: Groupby.bool stat

2022-04-21 Thread GitBox
xinrong-databricks opened a new pull request, #36309: URL: https://github.com/apache/spark/pull/36309 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] itholic commented on pull request #36083: [SPARK-38581][PYTHON][DOCS] List of supported pandas APIs for pandas-on-Spark docs.

2022-04-21 Thread GitBox
itholic commented on PR #36083: URL: https://github.com/apache/spark/pull/36083#issuecomment-1105857345 @HyukjinKwon Just created at https://github.com/apache/spark/pull/36308 !

[GitHub] [spark] itholic opened a new pull request, #36308: [SPARK-38581][PYTHON][DOCS][3.3] List of supported pandas APIs for pandas-on-Spark docs

2022-04-21 Thread GitBox
itholic opened a new pull request, #36308: URL: https://github.com/apache/spark/pull/36308 ### What changes were proposed in this pull request? This PR proposes to add new page named "Supported pandas APIs" for pandas-on-Spark documents. This is cherry-pick from https://github

[GitHub] [spark] srielau opened a new pull request, #36307: [SPARK-38985] sub error classes

2022-04-21 Thread GitBox
srielau opened a new pull request, #36307: URL: https://github.com/apache/spark/pull/36307 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] itholic commented on pull request #36267: [SPARK-38953][PYTHON][DOC] Document PySpark common exceptions / errors

2022-04-21 Thread GitBox
itholic commented on PR #36267: URL: https://github.com/apache/spark/pull/36267#issuecomment-1105833578 Would you mind adding a screen capture of some part of the page so that we can easily verify the page is rendered properly?

[GitHub] [spark] itholic commented on pull request #36205: [SPARK-38907][PYTHON] Implement DataFrame.corrwith

2022-04-21 Thread GitBox
itholic commented on PR #36205: URL: https://github.com/apache/spark/pull/36205#issuecomment-1105832399 Also same as https://github.com/apache/spark/pull/36246#issuecomment-1105830519.

[GitHub] [spark] itholic commented on pull request #36246: [SPARK-38937][PYTHON] interpolate support param `limit_direction`

2022-04-21 Thread GitBox
itholic commented on PR #36246: URL: https://github.com/apache/spark/pull/36246#issuecomment-1105830519 Could you also update python/docs/source/user_guide/pandas_on_spark/supported_pandas_api.rst ? We should keep this list up-to-date manually for now when adding the new API or param

[GitHub] [spark] huaxingao commented on a diff in pull request #36303: [SPARK-38977][SQL] Fix schema pruning with correlated subqueries

2022-04-21 Thread GitBox
huaxingao commented on code in PR #36303: URL: https://github.com/apache/spark/pull/36303#discussion_r855641933 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SchemaPruning.scala: ## @@ -152,6 +152,10 @@ object SchemaPruning extends SQLConfHelper {

[GitHub] [spark] sadikovi commented on pull request #36158: [SPARK-38829][SQL] Add a configuration flag to enable TIMESTAMP_NTZ support in Parquet data source

2022-04-21 Thread GitBox
sadikovi commented on PR #36158: URL: https://github.com/apache/spark/pull/36158#issuecomment-1105825764 @gengliangwang @sunchao thanks for the review! I addressed the comments.

[GitHub] [spark] sadikovi commented on a diff in pull request #36158: [SPARK-38829][SQL] Add a configuration flag to enable TIMESTAMP_NTZ support in Parquet data source

2022-04-21 Thread GitBox
sadikovi commented on code in PR #36158: URL: https://github.com/apache/spark/pull/36158#discussion_r855640236 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1061,6 +1061,16 @@ object SQLConf { .booleanConf .createWithDefault(fal

[GitHub] [spark] sadikovi commented on a diff in pull request #36158: [SPARK-38829][SQL] Add a configuration flag to enable TIMESTAMP_NTZ support in Parquet data source

2022-04-21 Thread GitBox
sadikovi commented on code in PR #36158: URL: https://github.com/apache/spark/pull/36158#discussion_r855639984 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -1061,6 +1061,16 @@ object SQLConf { .booleanConf .createWithDefault(fal

[GitHub] [spark] huaxingao commented on a diff in pull request #36303: [SPARK-38977][SQL] Fix schema pruning with correlated subqueries

2022-04-21 Thread GitBox
huaxingao commented on code in PR #36303: URL: https://github.com/apache/spark/pull/36303#discussion_r855637167 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala: ## @@ -935,4 +935,106 @@ abstract class SchemaPruningSuite .coun

[GitHub] [spark] huaxingao commented on a diff in pull request #36303: [SPARK-38977][SQL] Fix schema pruning with correlated subqueries

2022-04-21 Thread GitBox
huaxingao commented on code in PR #36303: URL: https://github.com/apache/spark/pull/36303#discussion_r855636865 ## sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala: ## @@ -935,4 +935,106 @@ abstract class SchemaPruningSuite .coun
