[GitHub] [spark] cloud-fan commented on a diff in pull request #36530: [SPARK-39172][SQL] Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36530: URL: https://github.com/apache/spark/pull/36530#discussion_r873346931 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -211,6 +219,15 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with

[GitHub] [spark] cloud-fan commented on a diff in pull request #36530: [SPARK-39172][SQL] Remove outer join if all output come from streamed side and buffered side keys exist unique key

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36530: URL: https://github.com/apache/spark/pull/36530#discussion_r873344595 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala: ## @@ -139,6 +139,14 @@ object ReorderJoin extends Rule[LogicalPlan] with

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36295: URL: https://github.com/apache/spark/pull/36295#discussion_r873341127 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #36295: [SPARK-38978][SQL] Support push down OFFSET to JDBC data source V2

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36295: URL: https://github.com/apache/spark/pull/36295#discussion_r873340929 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownOffset.java: ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation
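For context, `SupportsPushDownOffset` lets a DS V2 scan builder accept an OFFSET so that the data source, rather than Spark, skips the leading rows. As a rough illustration of what the pushdown buys (a hypothetical helper, not Spark's actual API), the clause ends up in the SQL string sent over JDBC:

```python
def build_jdbc_query(table, columns, limit=None, offset=None):
    """Toy illustration of pushing LIMIT/OFFSET into the SQL sent
    to a JDBC source instead of applying them on the Spark side."""
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    if offset is not None:
        sql += f" OFFSET {offset}"
    return sql

print(build_jdbc_query("t", ["a", "b"], limit=10, offset=5))
# → SELECT a, b FROM t LIMIT 10 OFFSET 5
```

When the source handles OFFSET itself, Spark avoids fetching and discarding the skipped rows locally.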

[GitHub] [spark] MaxGekk commented on pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer

2022-05-15 Thread GitBox
MaxGekk commented on PR #36479: URL: https://github.com/apache/spark/pull/36479#issuecomment-1127239102 @panbingkun Since this PR modified error classes, could you backport it to branch-3.3, please. -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] MaxGekk closed pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer

2022-05-15 Thread GitBox
MaxGekk closed pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer URL: https://github.com/apache/spark/pull/36479

[GitHub] [spark] cloud-fan closed pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics

2022-05-15 Thread GitBox
cloud-fan closed pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics URL: https://github.com/apache/spark/pull/36412

[GitHub] [spark] cloud-fan commented on pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics

2022-05-15 Thread GitBox
cloud-fan commented on PR #36412: URL: https://github.com/apache/spark/pull/36412#issuecomment-1127235625 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36412: URL: https://github.com/apache/spark/pull/36412#discussion_r87309 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala: ## @@ -80,10 +80,15 @@ private[sql] class

[GitHub] [spark] MaxGekk closed pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`

2022-05-15 Thread GitBox
MaxGekk closed pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException` URL: https://github.com/apache/spark/pull/36550

[GitHub] [spark] cloud-fan closed pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet

2022-05-15 Thread GitBox
cloud-fan closed pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet URL: https://github.com/apache/spark/pull/36121

[GitHub] [spark] MaxGekk commented on pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`

2022-05-15 Thread GitBox
MaxGekk commented on PR #36550: URL: https://github.com/apache/spark/pull/36550#issuecomment-1127234215 Merging to master. Thank you, @HyukjinKwon and @cloud-fan for review.

[GitHub] [spark] cloud-fan commented on pull request #36121: [SPARK-38836][SQL] Improve the performance of ExpressionSet

2022-05-15 Thread GitBox
cloud-fan commented on PR #36121: URL: https://github.com/apache/spark/pull/36121#issuecomment-1127234077 thanks, merging to master!

[GitHub] [spark] AnywalkerGiser commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
AnywalkerGiser commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127233836 @HyukjinKwon It hasn't been tested in master, I found the problem in 3.0.1, and I can test it in master later.

[GitHub] [spark] cloud-fan commented on a diff in pull request #36541: [SPARK-39180][SQL] Simplify the planning of limit and offset

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36541: URL: https://github.com/apache/spark/pull/36541#discussion_r873317698 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -82,52 +82,45 @@ abstract class SparkStrategies extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #36531: [SPARK-39171][SQL] Unify the Cast expression

2022-05-15 Thread GitBox
cloud-fan commented on code in PR #36531: URL: https://github.com/apache/spark/pull/36531#discussion_r873314783 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -2117,7 +2265,9 @@ case class Cast( child: Expression, dataType:

[GitHub] [spark] gengliangwang commented on a diff in pull request #36557: [SPARK-39190][SQL] Provide query context for decimal precision overflow error when WSCG is off

2022-05-15 Thread GitBox
gengliangwang commented on code in PR #36557: URL: https://github.com/apache/spark/pull/36557#discussion_r873307369 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -128,7 +128,7 @@ case class PromotePrecision(child:

[GitHub] [spark] gengliangwang opened a new pull request, #36557: [SPARK-39190][SQL] Provide query context for decimal precision overflow error when WSCG is off

2022-05-15 Thread GitBox
gengliangwang opened a new pull request, #36557: URL: https://github.com/apache/spark/pull/36557 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36525, this PR provides query context for decimal precision overflow error

[GitHub] [spark] AnywalkerGiser commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
AnywalkerGiser commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873305033 ## python/pyspark/sql/types.py: ## @@ -191,14 +191,25 @@ def needConversion(self): def toInternal(self, dt): if dt is not None: -
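The diff excerpt above touches `DatetimeType.toInternal` in `python/pyspark/sql/types.py`. On Windows, `datetime.timestamp()` (and `time.mktime`) can raise `OSError` for pre-1970 datetimes because the C runtime rejects negative epoch values; a portable sketch of the general workaround (illustrative only, not the PR's actual code) computes the epoch offset by subtraction instead:

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_internal_micros(dt: datetime) -> int:
    """Convert a datetime to microseconds since the Unix epoch
    without calling dt.timestamp(), which can fail on Windows
    for pre-1970 values."""
    if dt.tzinfo is None:
        # This sketch assumes UTC for naive datetimes; Spark itself
        # applies the session-local timezone.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    return (delta.days * 86_400_000_000
            + delta.seconds * 1_000_000
            + delta.microseconds)

print(to_internal_micros(datetime(1960, 1, 1, tzinfo=timezone.utc)))
# → -315619200000000
```

Subtracting `datetime` objects works for any representable date, so the negative-timestamp limitation of the platform's `mktime` never comes into play.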

[GitHub] [spark] AngersZhuuuu commented on pull request #36056: [SPARK-36571][SQL] Add an SQLOverwriteHadoopMapReduceCommitProtocol to support all SQL overwrite write data to staging dir

2022-05-15 Thread GitBox
AngersZhuuuu commented on PR #36056: URL: https://github.com/apache/spark/pull/36056#issuecomment-1127179471 Gentle ping @cloud-fan Could you take a look?

[GitHub] [spark] AngersZhuuuu commented on pull request #35799: [SPARK-38498][STREAM] Support customized StreamingListener by configuration

2022-05-15 Thread GitBox
AngersZhuuuu commented on PR #35799: URL: https://github.com/apache/spark/pull/35799#issuecomment-1127178691 Any more suggestion?

[GitHub] [spark] HyukjinKwon commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
HyukjinKwon commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127177497 @AnywalkerGiser mind creating a PR against `master` branch?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873298180 ## python/pyspark/sql/types.py: ## @@ -191,14 +191,25 @@ def needConversion(self): def toInternal(self, dt): if dt is not None: -seconds

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297988 ## python/pyspark/tests/test_rdd.py: ## @@ -669,6 +670,12 @@ def test_sample(self): wr_s21 = rdd.sample(True, 0.4, 21).collect()

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297660 ## python/pyspark/sql/types.py: ## @@ -191,14 +191,25 @@ def needConversion(self): def toInternal(self, dt): if dt is not None: -seconds

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
HyukjinKwon commented on code in PR #36537: URL: https://github.com/apache/spark/pull/36537#discussion_r873297554 ## python/pyspark/sql/types.py: ## @@ -191,14 +191,25 @@ def needConversion(self): def toInternal(self, dt): if dt is not None: -seconds

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36550: [SPARK-39187][SQL] Remove `SparkIllegalStateException`

2022-05-15 Thread GitBox
AngersZhuuuu commented on code in PR #36550: URL: https://github.com/apache/spark/pull/36550#discussion_r873294811 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -582,8 +582,8 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] beliefer opened a new pull request, #36556: [SPARK-39162][SQL][3.3] Jdbc dialect should decide which function could be pushed down

2022-05-15 Thread GitBox
beliefer opened a new pull request, #36556: URL: https://github.com/apache/spark/pull/36556 ### What changes were proposed in this pull request? This PR backports https://github.com/apache/spark/pull/36521 to 3.3 ### Why are the changes needed? Let function push-down

[GitHub] [spark] AnywalkerGiser commented on pull request #36537: [SPARK-39176][PYSPARK][WINDOWS] Fixed a problem with pyspark serializing pre-1970 datetime in windows

2022-05-15 Thread GitBox
AnywalkerGiser commented on PR #36537: URL: https://github.com/apache/spark/pull/36537#issuecomment-1127149821 Is there a supervisor for approval?

[GitHub] [spark] beliefer commented on pull request #36521: [SPARK-39162][SQL] Jdbc dialect should decide which function could be pushed down

2022-05-15 Thread GitBox
beliefer commented on PR #36521: URL: https://github.com/apache/spark/pull/36521#issuecomment-1127146479 @cloud-fan @huaxingao Thank you a lot! I will create back port to 3.3.

[GitHub] [spark] beliefer closed pull request #36520: [SPARK-38633][SQL] Support push down AnsiCast to JDBC data source V2

2022-05-15 Thread GitBox
beliefer closed pull request #36520: [SPARK-38633][SQL] Support push down AnsiCast to JDBC data source V2 URL: https://github.com/apache/spark/pull/36520

[GitHub] [spark] beliefer commented on a diff in pull request #36531: [SPARK-39171][SQL] Unify the Cast expression

2022-05-15 Thread GitBox
beliefer commented on code in PR #36531: URL: https://github.com/apache/spark/pull/36531#discussion_r873277888 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -275,6 +376,53 @@ object Cast { case _ => null } } + + //

[GitHub] [spark] LuciferYang commented on pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.

2022-05-15 Thread GitBox
LuciferYang commented on PR #36515: URL: https://github.com/apache/spark/pull/36515#issuecomment-1127140077 thanks @huaxingao @sunchao

[GitHub] [spark] zhengruifeng commented on pull request #36555: [SPARK-39189][PYTHON] Support limit_area parameter in pandas API on Spark

2022-05-15 Thread GitBox
zhengruifeng commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1127136933 @HyukjinKwon Sure! will update soon

[GitHub] [spark] beobest2 commented on pull request #36509: [SPARK-38961][PYTHON][DOCS] Enhance to automatically generate the pandas API support list

2022-05-15 Thread GitBox
beobest2 commented on PR #36509: URL: https://github.com/apache/spark/pull/36509#issuecomment-1127127677 @bjornjorgensen Seems like a good idea! I can simply add a column to display parameters that only exist in pandas. However, it is necessary to discuss whether or not it meets the

[GitHub] [spark] HyukjinKwon commented on pull request #36555: [SPARK-39189][PYTHON] Support limit_area parameter in pandas API on Spark

2022-05-15 Thread GitBox
HyukjinKwon commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1127098019 @zhengruifeng mind showing the example of this argument usage in the PR description?

[GitHub] [spark] HyukjinKwon closed pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness

2022-05-15 Thread GitBox
HyukjinKwon closed pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness URL: https://github.com/apache/spark/pull/36554

[GitHub] [spark] HyukjinKwon commented on pull request #36554: [SPARK-39186][PYTHON][FOLLOWUP] Improve the numerical stability of pandas-on-Spark's skewness

2022-05-15 Thread GitBox
HyukjinKwon commented on PR #36554: URL: https://github.com/apache/spark/pull/36554#issuecomment-1127097656 Merged to master.

[GitHub] [spark] github-actions[bot] closed pull request #35357: [SPARK-21195][CORE] MetricSystem should pick up dynamically registered metrics in sources

2022-05-15 Thread GitBox
github-actions[bot] closed pull request #35357: [SPARK-21195][CORE] MetricSystem should pick up dynamically registered metrics in sources URL: https://github.com/apache/spark/pull/35357

[GitHub] [spark] bjornjorgensen commented on pull request #36509: [SPARK-38961][PYTHON][DOCS] Enhance to automatically generate the pandas API support list

2022-05-15 Thread GitBox
bjornjorgensen commented on PR #36509: URL: https://github.com/apache/spark/pull/36509#issuecomment-1127032945 Yes, very good. I was thinking, pandas API on Spark has some more options than pandas has. Like to_json() has `ignoreNullFields=True` and `num_files=1` Can we add

[GitHub] [spark] tiagovrtr commented on pull request #33675: [SPARK-27997][K8S] Add support for kubernetes OAuth Token refresh

2022-05-15 Thread GitBox
tiagovrtr commented on PR #33675: URL: https://github.com/apache/spark/pull/33675#issuecomment-1126996196 this patch seems only to bring the latest changes from master, anything else to do here?

[GitHub] [spark] mridulm commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local RDD blocks in case of IO related errors

2022-05-15 Thread GitBox
mridulm commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r873204359 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,10 +933,29 @@ private[spark] class BlockManager( }) Some(new

[GitHub] [spark] MaxGekk commented on a diff in pull request #36479: [SPARK-38688][SQL][TESTS] Use error classes in the compilation errors of deserializer

2022-05-15 Thread GitBox
MaxGekk commented on code in PR #36479: URL: https://github.com/apache/spark/pull/36479#discussion_r873201532 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -147,14 +147,17 @@ object QueryCompilationErrors extends QueryErrorsBase

[GitHub] [spark] huaxingao commented on pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.

2022-05-15 Thread GitBox
huaxingao commented on PR #36515: URL: https://github.com/apache/spark/pull/36515#issuecomment-1126965449 Thanks! Merged to master.

[GitHub] [spark] huaxingao closed pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.

2022-05-15 Thread GitBox
huaxingao closed pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`. URL: https://github.com/apache/spark/pull/36515

[GitHub] [spark] zhengruifeng opened a new pull request, #36555: [SPARK-39189][PYTHON] interpolate supports limit_area

2022-05-15 Thread GitBox
zhengruifeng opened a new pull request, #36555: URL: https://github.com/apache/spark/pull/36555 ### What changes were proposed in this pull request? interpolate supports param `limit_area` ### Why are the changes needed? to increase api coverage ### Does this PR
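For context on the parameter being added: `limit_area` restricts which NaN runs are filled — `'inside'` fills only gaps surrounded by valid values on both sides, `'outside'` fills only leading/trailing NaNs. A minimal pure-Python sketch of the `'inside'` semantics for linear interpolation (illustrative only, not the pandas-on-Spark implementation):

```python
import math

def interpolate_inside(values):
    """Linearly interpolate NaNs that lie strictly between two valid
    values (the semantics of limit_area='inside'); leading and
    trailing NaNs are left untouched."""
    out = list(values)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    # Fill each interior gap between consecutive valid positions.
    for left, right in zip(valid, valid[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out

nan = float("nan")
print(interpolate_inside([nan, 1.0, nan, 3.0, nan]))
# → [nan, 1.0, 2.0, 3.0, nan]
```

Without `limit_area`, default forward interpolation would also fill the trailing NaN; `'inside'` leaves both edge NaNs in place.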

[GitHub] [spark] LuciferYang commented on pull request #36515: [SPARK-39156][SQL] Clean up the usage of `ParquetLogRedirector` in `ParquetFileFormat`.

2022-05-15 Thread GitBox
LuciferYang commented on PR #36515: URL: https://github.com/apache/spark/pull/36515#issuecomment-1126870123 hmm... @sunchao any other need changes?

[GitHub] [spark] LuciferYang commented on pull request #36078: [SPARK-38814][BUILD][TESTS] Migrate Junit 4 to Junit 5

2022-05-15 Thread GitBox
LuciferYang commented on PR #36078: URL: https://github.com/apache/spark/pull/36078#issuecomment-1126869252 > Yeah, I have the same thought w/ Sean's — Got it ~

[GitHub] [spark] LuciferYang commented on a diff in pull request #36529: [SPARK-39102][CORE][SQL][DSTREAM] Add checkstyle rules to disabled use of Guava's `Files.createTempDir()`

2022-05-15 Thread GitBox
LuciferYang commented on code in PR #36529: URL: https://github.com/apache/spark/pull/36529#discussion_r873115217 ## common/network-common/src/main/java/org/apache/spark/network/util/JavaUtils.java: ## @@ -362,6 +364,18 @@ public static byte[] bufferToArray(ByteBuffer buffer) {
