[GitHub] [spark] qiuliang988 commented on a diff in pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics

2022-05-09 Thread GitBox
qiuliang988 commented on code in PR #36412: URL: https://github.com/apache/spark/pull/36412#discussion_r868841346 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala: ## @@ -74,6 +74,32 @@ private[sql] class

[GitHub] [spark] Yikf closed pull request #35354: [WIP][SPARK-38053][SQL] Report driver side metric from datasource V2

2022-05-09 Thread GitBox
Yikf closed pull request #35354: [WIP][SPARK-38053][SQL] Report driver side metric from datasource V2 URL: https://github.com/apache/spark/pull/35354 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] manuzhang commented on pull request #29516: [SPARK-32614][SQL] Don't apply comment processing if 'comment' unset for CSV

2022-05-09 Thread GitBox
manuzhang commented on PR #29516: URL: https://github.com/apache/spark/pull/29516#issuecomment-1121933048 I created https://github.com/uniVocity/univocity-parsers/issues/505 to request optionally disabling the quoting of a row-starting comment char. -- This is an automated message from the Apache

[GitHub] [spark] huaxingao commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-09 Thread GitBox
huaxingao commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r868803520 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -41,6 +41,11 @@ private object H2Dialect extends JdbcDialect { case _ =>

[GitHub] [spark] beliefer opened a new pull request, #36492: [SPARK-39135][SQL] DS V2 aggregate partial push-down should supports group by without aggregate functions

2022-05-09 Thread GitBox
beliefer opened a new pull request, #36492: URL: https://github.com/apache/spark/pull/36492 ### What changes were proposed in this pull request? Currently, the SQL shown below is not supported by DS V2 aggregate partial push-down. `select key from tab group by key` ### Why are
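The query shape quoted in the PR description, a GROUP BY with no aggregate functions, is semantically equivalent to SELECT DISTINCT. A minimal illustration using sqlite3 (plain SQL, not Spark code) showing the two forms return the same keys:

```python
import sqlite3

# A GROUP BY without aggregate functions deduplicates the grouping keys,
# which is exactly what SELECT DISTINCT does -- the reason this query
# shape is a candidate for partial push-down at all.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (key TEXT, value INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?)",
                 [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5)])

group_by = sorted(r[0] for r in conn.execute("SELECT key FROM tab GROUP BY key"))
distinct = sorted(r[0] for r in conn.execute("SELECT DISTINCT key FROM tab"))

print(group_by)  # ['a', 'b', 'c']
assert group_by == distinct
```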

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r868773358 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -41,6 +41,11 @@ private object H2Dialect extends JdbcDialect { case _ =>

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868772636 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING PARQUET

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868772038 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING PARQUET

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868771935 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING PARQUET

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868771621 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING PARQUET

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868771361 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,22 +56,42 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868770083 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -43,6 +44,9 @@ import org.apache.spark.unsafe.types.UTF8String *

[GitHub] [spark] srowen closed pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
srowen closed pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore URL: https://github.com/apache/spark/pull/36489 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] srowen commented on pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
srowen commented on PR #36489: URL: https://github.com/apache/spark/pull/36489#issuecomment-1121830172 Merged to 3.3 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] srowen commented on a diff in pull request #36485: [SPARK-39128][SQL][HIVE] Log cost time for getting FileStatus in HadoopTableReader

2022-05-09 Thread GitBox
srowen commented on code in PR #36485: URL: https://github.com/apache/spark/pull/36485#discussion_r868761212 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala: ## @@ -175,7 +175,8 @@ class HadoopTableReader( def

[GitHub] [spark] HyukjinKwon closed pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
HyukjinKwon closed pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series URL: https://github.com/apache/spark/pull/36414 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
HyukjinKwon commented on PR #36414: URL: https://github.com/apache/spark/pull/36414#issuecomment-1121814422 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] viirya opened a new pull request, #36491: [SPARK-XXXXX][SS] Add custom metric of skipped null values for stream join operator

2022-05-09 Thread GitBox
viirya opened a new pull request, #36491: URL: https://github.com/apache/spark/pull/36491 ### What changes were proposed in this pull request? This proposes to add a custom metric of skipped null values for stream-stream join operator. ### Why are the changes

[GitHub] [spark] ulysses-you commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
ulysses-you commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868756052 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] ulysses-you commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
ulysses-you commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868755471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] ulysses-you commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
ulysses-you commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868755295 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING

[GitHub] [spark] pan3793 commented on a diff in pull request #36485: [SPARK-39128][SQL][HIVE] Log cost time for getting FileStatus in HadoopTableReader

2022-05-09 Thread GitBox
pan3793 commented on code in PR #36485: URL: https://github.com/apache/spark/pull/36485#discussion_r868751869 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala: ## @@ -175,7 +175,8 @@ class HadoopTableReader( def

[GitHub] [spark] beliefer commented on pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
beliefer commented on PR #36405: URL: https://github.com/apache/spark/pull/36405#issuecomment-1121782479 > Please elaborate on the idea more. I don't get how to avoid OOM by not pushing down the limit. The tasks on executors will pull the entire result set from the data source anyway, won't they?
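The trade-off being discussed here, whether pushing LIMIT into the source avoids materializing the full result set, can be pictured with a lazy scan in plain Python (a toy analogy, not Spark's execution model):

```python
from itertools import islice

def scan_source():
    """Pretend data-source scan: yields rows lazily (toy stand-in, not Spark)."""
    for i in range(10**6):
        yield i

# Without push-down, an executor would materialize the whole scan and
# then take 5 rows. With the limit pushed into the scan, only 5 rows
# are ever produced -- islice stops pulling after the fifth element.
pushed = list(islice(scan_source(), 5))
print(pushed)  # [0, 1, 2, 3, 4]
```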

[GitHub] [spark] srowen commented on pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
srowen commented on PR #36489: URL: https://github.com/apache/spark/pull/36489#issuecomment-1121775694 Meh, it's just .gitignore entries. If it's at all relevant for 3.3, seems OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon commented on pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
HyukjinKwon commented on PR #36489: URL: https://github.com/apache/spark/pull/36489#issuecomment-1121773801 Guess this is fine. I am okay. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] beliefer commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-09 Thread GitBox
beliefer commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r868717854 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -41,6 +41,11 @@ private object H2Dialect extends JdbcDialect { case _ =>

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36490: [SPARK-39133][PYTHON][DOC] Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED

2022-05-09 Thread GitBox
HyukjinKwon commented on code in PR #36490: URL: https://github.com/apache/spark/pull/36490#discussion_r868717268 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2577,8 +2577,9 @@ object SQLConf { val PYSPARK_JVM_STACKTRACE_ENABLED =

[GitHub] [spark] beliefer commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
beliefer commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r868715276 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala: ## @@ -323,6 +323,16 @@ class RegexpExpressionsSuite extends

[GitHub] [spark] dcoliversun commented on pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns

2022-05-09 Thread GitBox
dcoliversun commented on PR #36487: URL: https://github.com/apache/spark/pull/36487#issuecomment-1121737252 Thank you @cloud-fan @jackierwzhang @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] sadikovi commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

2022-05-09 Thread GitBox
sadikovi commented on PR #36427: URL: https://github.com/apache/spark/pull/36427#issuecomment-1121735595 I fixed the tests, everything should be fine now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
srowen commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1121734508 Merged to master/3.3/3.2/3.1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] srowen closed pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
srowen closed pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace URL: https://github.com/apache/spark/pull/36457 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] LuciferYang commented on pull request #36467: [SPARK-39063][CORE] Fix potential risk of JVM crash and `RocksDBIterator` resource leak

2022-05-09 Thread GitBox
LuciferYang commented on PR #36467: URL: https://github.com/apache/spark/pull/36467#issuecomment-1121718336 > I have not looked into this in great detail, but why not simply return a boolean from `notifyIteratorClosed` to indicate if the underlying db has been closed or not ? This can be

[GitHub] [spark] dongjoon-hyun commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
dongjoon-hyun commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-112171 BTW, cc @MaxGekk for Apache Spark 3.3.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
dongjoon-hyun commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1121715011 I'm also +1 for fixing this in all applicable branches, @srowen . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
srowen commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1121713316 Question for, maybe, @dongjoon-hyun -- this is basically a bug fix for a behavior change in 3.1.0. I'd merge this to master and 3.3, but what about 3.1.x and 3.2.x? because fixing this
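The behavior change being fixed: a regexp replace whose pattern matches the empty string should still produce a replacement when the input is empty. Plain Python `re` shows the expected semantics (an analogy to illustrate the contract, not Spark's code path):

```python
import re

# A pattern that matches the empty string, such as ^$, should replace
# an empty input rather than silently return it unchanged.
assert re.sub(r"^$", "<empty>", "") == "<empty>"

# A non-empty input is left alone by this pattern, since ^$ only
# matches a string with nothing between start and end anchors.
assert re.sub(r"^$", "<empty>", "abc") == "abc"
```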

[GitHub] [spark] github-actions[bot] commented on pull request #35354: [WIP][SPARK-38053][SQL] Report driver side metric from datasource V2

2022-05-09 Thread GitBox
github-actions[bot] commented on PR #35354: URL: https://github.com/apache/spark/pull/35354#issuecomment-1121707778 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36452: [SPARK-39109][PYTHON] Adjust `GroupBy.mean/median` to match pandas 1.4

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36452: URL: https://github.com/apache/spark/pull/36452#discussion_r868581909 ## python/pyspark/pandas/groupby.py: ## @@ -2673,7 +2682,7 @@ def get_group(self, name: Union[Name, List[Name]]) -> FrameLike: return

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36452: [SPARK-39109][PYTHON] Adjust `GroupBy.mean/median` to match pandas 1.4

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36452: URL: https://github.com/apache/spark/pull/36452#discussion_r868580132 ## python/pyspark/pandas/tests/test_groupby.py: ## @@ -35,6 +35,19 @@ class GroupByTest(PandasOnSparkTestCase, TestUtils): +def pdf(self): Review

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36452: [SPARK-39109][PYTHON] Adjust `GroupBy.mean/median` to match pandas 1.4

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36452: URL: https://github.com/apache/spark/pull/36452#discussion_r867029180 ## python/pyspark/pandas/groupby.py: ## @@ -2673,7 +2682,7 @@ def get_group(self, name: Union[Name, List[Name]]) -> FrameLike: return

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36490: [SPARK-39133][PYTHON][DOC] Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36490: URL: https://github.com/apache/spark/pull/36490#discussion_r868540952 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2577,8 +2577,9 @@ object SQLConf { val PYSPARK_JVM_STACKTRACE_ENABLED =

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36414: URL: https://github.com/apache/spark/pull/36414#discussion_r868525506 ## python/pyspark/pandas/series.py: ## @@ -6859,13 +6860,16 @@ def _reduce_for_stat_function( sfun : the stats function to be used for aggregation

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36414: URL: https://github.com/apache/spark/pull/36414#discussion_r868524131 ## python/pyspark/pandas/generic.py: ## @@ -1603,6 +1664,10 @@ def max( -- axis : {index (0), columns (1)} Axis for

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36490: [SPARK-39133][PYTHON][DOC] Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED

2022-05-09 Thread GitBox
HyukjinKwon commented on code in PR #36490: URL: https://github.com/apache/spark/pull/36490#discussion_r868518904 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2577,8 +2577,9 @@ object SQLConf { val PYSPARK_JVM_STACKTRACE_ENABLED =

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36490: [SPARK-39133][PYTHON][DOC] Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED

2022-05-09 Thread GitBox
HyukjinKwon commented on code in PR #36490: URL: https://github.com/apache/spark/pull/36490#discussion_r868519354 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2577,8 +2577,9 @@ object SQLConf { val PYSPARK_JVM_STACKTRACE_ENABLED =

[GitHub] [spark] allisonwang-db commented on pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-09 Thread GitBox
allisonwang-db commented on PR #36386: URL: https://github.com/apache/spark/pull/36386#issuecomment-1121631112 Hmm let me try again -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36414: URL: https://github.com/apache/spark/pull/36414#discussion_r868515930 ## python/pyspark/pandas/series.py: ## @@ -6859,13 +6860,16 @@ def _reduce_for_stat_function( sfun : the stats function to be used for aggregation

[GitHub] [spark] mridulm commented on pull request #36467: [SPARK-39063][CORE] Fix potential risk of JVM crash and `RocksDBIterator` resource leak

2022-05-09 Thread GitBox
mridulm commented on PR #36467: URL: https://github.com/apache/spark/pull/36467#issuecomment-1121625153 I have not looked into this in great detail, but why not simply return a boolean from `notifyIteratorClosed` to indicate if the underlying db has been closed or not ? This can be used
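The pattern mridulm suggests, having `notifyIteratorClosed` report whether the deferred database close actually happened, can be sketched as follows. All names and the close-deferral logic are hypothetical illustrations, not Spark's actual RocksDB wrapper API:

```python
import threading

class Db:
    """Toy stand-in for a db wrapper that must not close while iterators are open."""
    def __init__(self):
        self._lock = threading.Lock()
        self._open_iterators = set()
        self._pending_close = False
        self._closed = False

    def register_iterator(self, it):
        with self._lock:
            self._open_iterators.add(it)

    def notify_iterator_closed(self, it):
        # Returns a boolean indicating whether the underlying db is now
        # closed -- the signal mridulm suggests surfacing to the caller.
        with self._lock:
            self._open_iterators.discard(it)
            if self._pending_close and not self._open_iterators:
                self._closed = True
            return self._closed

    def close(self):
        with self._lock:
            if self._open_iterators:
                self._pending_close = True  # defer until iterators finish
            else:
                self._closed = True

db = Db()
db.register_iterator("it1")
db.close()                                 # deferred: an iterator is still open
closed = db.notify_iterator_closed("it1")  # last iterator gone -> db closes
print(closed)  # True
```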

[GitHub] [spark] dongjoon-hyun commented on pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
dongjoon-hyun commented on PR #36489: URL: https://github.com/apache/spark/pull/36489#issuecomment-1121612923 Actually, this is a backport requested by @pan3793 - https://github.com/apache/spark/pull/35838#issuecomment-1120111003 Although I'm still neutral for this backporting

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36414: [SPARK-39077][PYTHON] Implement `skipna` of common statistical functions of DataFrame and Series

2022-05-09 Thread GitBox
xinrong-databricks commented on code in PR #36414: URL: https://github.com/apache/spark/pull/36414#discussion_r868482037 ## python/pyspark/pandas/generic.py: ## @@ -1312,11 +1326,20 @@ def sum(psser: "Series") -> Column: return F.coalesce(F.sum(spark_column),

[GitHub] [spark] viirya commented on pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-09 Thread GitBox
viirya commented on PR #36386: URL: https://github.com/apache/spark/pull/36386#issuecomment-1121571115 CI still doesn't pass: ``` check simplified (tpcds-v1.4/q4) *** FAILED *** ``` -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] allisonwang-db commented on a diff in pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-09 Thread GitBox
allisonwang-db commented on code in PR #36386: URL: https://github.com/apache/spark/pull/36386#discussion_r868433772 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q4/explain.txt: ## @@ -362,267 +362,267 @@ Output [11]: [customer_id#25, year_total#26,

[GitHub] [spark] xinrong-databricks opened a new pull request, #36490: [SPARK-39133][PYTHON][DOC] Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED

2022-05-09 Thread GitBox
xinrong-databricks opened a new pull request, #36490: URL: https://github.com/apache/spark/pull/36490 ### What changes were proposed in this pull request? Mention log level setting in PYSPARK_JVM_STACKTRACE_ENABLED. ### Why are the changes needed? Even if

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
dtenedor commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868397971 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
dtenedor commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868397446 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:

[GitHub] [spark] dtenedor commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
dtenedor commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868396911 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:
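The discussion above concerns formatting Decimal values against a format string. A toy Python sketch of what a `to_char`-style formatter does for a tiny subset of such a grammar ('9' digit slots and a '.' decimal point); the helper name, the subset, and the '#' overflow fill are assumptions for illustration, not Spark's implementation:

```python
from decimal import Decimal

def to_char(value, fmt):
    """Toy to_char-style formatter (hypothetical, tiny format subset):
    '9' reserves a digit position, '.' places the decimal point."""
    int_digits = len(fmt.split(".")[0])
    frac_digits = len(fmt.split(".")[1]) if "." in fmt else 0
    # Round the value to the number of fractional positions in the format.
    quantized = value.quantize(Decimal(1).scaleb(-frac_digits))
    s = f"{quantized:.{frac_digits}f}"
    # Overflow: more integer digits than the format reserves -> fill output.
    if len(s.split(".")[0].lstrip("-")) > int_digits:
        return "#" * len(fmt)
    return s.rjust(len(fmt))

print(to_char(Decimal("123.45"), "999.99"))  # '123.45'
print(to_char(Decimal("1234.5"), "999.99"))  # '######' (doesn't fit the format)
```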

[GitHub] [spark] sunchao commented on pull request #36427: [SPARK-39086][SQL] Support UDT in Spark Parquet vectorized reader

2022-05-09 Thread GitBox
sunchao commented on PR #36427: URL: https://github.com/apache/spark/pull/36427#issuecomment-1121492836 @sadikovi hmm it seems there are still some test failures -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868283084 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868282107 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36365: [SPARK-28516][SQL] Implement `to_char` and `try_to_char` functions to format Decimal values as strings

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36365: URL: https://github.com/apache/spark/pull/36365#discussion_r868281343 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/numberFormatExpressions.scala: ## @@ -168,3 +168,157 @@ case class TryToNumber(left:

[GitHub] [spark] viirya commented on a diff in pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-09 Thread GitBox
viirya commented on code in PR #36386: URL: https://github.com/apache/spark/pull/36386#discussion_r868275450 ## sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q4/explain.txt: ## @@ -362,267 +362,267 @@ Output [11]: [customer_id#25, year_total#26,

[GitHub] [spark] LuciferYang commented on pull request #36484: [SPARK-39127][CORE] Fix potential risk of `LevelDBIterator` leak

2022-05-09 Thread GitBox
LuciferYang commented on PR #36484: URL: https://github.com/apache/spark/pull/36484#issuecomment-1121402089 For now I can only infer that a similar issue also exists in LevelDB, because it is very similar to RocksDB, but I'm really not sure, since I haven't been able to manually compile leveldb for

[GitHub] [spark] LuciferYang commented on pull request #36467: [SPARK-39063][CORE] Fix potential risk of JVM crash and `RocksDBIterator` resource leak

2022-05-09 Thread GitBox
LuciferYang commented on PR #36467: URL: https://github.com/apache/spark/pull/36467#issuecomment-1121389575 DEBUG_LEVEL is manually set to 0 to disable all assertions in normal RocksDB releases, so it won't really crash the VM, but I think the issue still exists -- This is an

[GitHub] [spark] cloud-fan commented on a diff in pull request #36438: [SPARK-39092][SQL] Propagate Empty Partitions

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36438: URL: https://github.com/apache/spark/pull/36438#discussion_r868241905 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PropagateEmptyPartitions.scala: ## @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #36412: [SPARK-39073][SQL] Keep rowCount after hive table partition pruning if table only have hive statistics

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36412: URL: https://github.com/apache/spark/pull/36412#discussion_r868230134 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/PruneHiveTablePartitions.scala: ## @@ -74,6 +74,32 @@ private[sql] class

[GitHub] [spark] vli-databricks commented on pull request #36458: [SPARK-39060][SQL][3.2] Typo in error messages of decimal overflow

2022-05-09 Thread GitBox
vli-databricks commented on PR #36458: URL: https://github.com/apache/spark/pull/36458#issuecomment-1121352437 @dongjoon-hyun yes, I was digging into that but the output plans extracted from logs are exactly the same (there is an extra new line in `approved` but it seems to be ignored). My

[GitHub] [spark] huaxingao commented on pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns

2022-05-09 Thread GitBox
huaxingao commented on PR #36487: URL: https://github.com/apache/spark/pull/36487#issuecomment-1121350747 Thanks! Merged to master and 3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] huaxingao closed pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns

2022-05-09 Thread GitBox
huaxingao closed pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns URL: https://github.com/apache/spark/pull/36487 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868225414 ## sql/core/src/test/resources/sql-tests/inputs/udf/postgreSQL/udf-case.sql: ## @@ -67,14 +67,12 @@ SELECT '7' AS `None`, CASE WHEN rand() < udf(0) THEN 1 END

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868223880 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {
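The `ConstantFolding` review comments above concern SPARK-39106: folding a constant subexpression inside a conditional can raise at optimization time even though the failing branch would never execute at runtime. The sketch below illustrates the idea on a toy expression tree; the class and function names (`Lit`, `Add`, `Div`, `If`, `fold`) are illustrative and are not Catalyst's API.

```python
# Hedged sketch of conditional-aware constant folding over a tiny
# expression tree; this is not Spark's implementation.

class Lit:
    def __init__(self, v): self.v = v

class Add:
    def __init__(self, l, r): self.l, self.r = l, r

class Div:
    def __init__(self, l, r): self.l, self.r = l, r

class If:
    def __init__(self, cond, then, other):
        self.cond, self.then, self.other = cond, then, other

def fold(e):
    """Fold literal-only subtrees, but never let folding itself fail:
    a branch that would error is left as-is for runtime to decide."""
    if isinstance(e, Lit):
        return e
    if isinstance(e, (Add, Div)):
        l, r = fold(e.l), fold(e.r)
        if isinstance(l, Lit) and isinstance(r, Lit):
            try:
                v = l.v + r.v if isinstance(e, Add) else l.v / r.v
                return Lit(v)
            except ZeroDivisionError:
                pass  # keep the expression; it may sit in an untaken branch
        return type(e)(l, r)
    if isinstance(e, If):
        c = fold(e.cond)
        if isinstance(c, Lit):
            # Constant predicate: only the branch that actually runs survives.
            return fold(e.then if c.v else e.other)
        return If(c, fold(e.then), fold(e.other))
    return e

# 1/0 sits in the untaken branch: folding must not raise
expr = If(Lit(True), Add(Lit(1), Lit(2)), Div(Lit(1), Lit(0)))
print(fold(expr).v)  # → 3
```

The key design point mirrored from the discussion: eager folding of every subtree is unsafe under ANSI semantics, because a throwing expression under a conditional is only an error if that branch is taken.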

[GitHub] [spark] srowen commented on pull request #36467: [SPARK-39063][CORE] Fix potential risk of JVM crash and `RocksDBIterator` resource leak

2022-05-09 Thread GitBox
srowen commented on PR #36467: URL: https://github.com/apache/spark/pull/36467#issuecomment-1121345960 This seems like a well thought-out change, though it's complex compared to the issue it's trying to solve - a setting of a non-default third party library. I'm neutral about bothering

[GitHub] [spark] srowen commented on a diff in pull request #36485: [SPARK-39128][SQL][HIVE] Log cost time for getting FileStatus in HadoopTableReader

2022-05-09 Thread GitBox
srowen commented on code in PR #36485: URL: https://github.com/apache/spark/pull/36485#discussion_r868219599 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala: ## @@ -175,7 +175,8 @@ class HadoopTableReader( def

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868219001 ## sql/core/src/test/resources/sql-tests/inputs/ansi/conditional-functions.sql: ## @@ -1,6 +1,41 @@ -- Tests for conditional functions -CREATE TABLE t USING PARQUET

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868218630 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] srowen commented on pull request #36489: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
srowen commented on PR #36489: URL: https://github.com/apache/spark/pull/36489#issuecomment-1121341131 Is this not necessary on the master branch?

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868212836 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r868204386 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,23 +53,46 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r868202310 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -41,6 +41,11 @@ private object H2Dialect extends JdbcDialect { case _ =>

[GitHub] [spark] cloud-fan commented on pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
cloud-fan commented on PR #36405: URL: https://github.com/apache/spark/pull/36405#issuecomment-1121320934 Please elaborate on the idea more. I don't get how to avoid OOM by not pushing down the limit. The tasks on executors will pull all the result set from data source anyway, isn't it?
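The point cloud-fan raises turns on where the LIMIT is applied: when the limit is pushed down, it is compiled into the query sent to the data source, so the remote database returns at most that many rows; when it is not pushed down, every row crosses the wire and Spark applies the limit afterwards. A hedged sketch of that query-generation step (function and parameter names are illustrative, not Spark's JDBC code):

```python
def build_jdbc_query(table, columns, pushed_limit=None):
    # Illustrative sketch: a pushed-down LIMIT becomes part of the
    # generated source query instead of a post-read operator.
    query = "SELECT {} FROM {}".format(", ".join(columns), table)
    if pushed_limit is not None:
        # The database now returns at most `pushed_limit` rows; without
        # this clause, the full result set is fetched and trimmed later.
        query += " LIMIT {}".format(pushed_limit)
    return query

print(build_jdbc_query("employees", ["name", "salary"], pushed_limit=10))
# → SELECT name, salary FROM employees LIMIT 10
```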

[GitHub] [spark] srowen commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
srowen commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1121319884 All tests show that they pass. I'll wait a beat for last comments, but should be OK.

[GitHub] [spark] huaxingao commented on a diff in pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
huaxingao commented on code in PR #36405: URL: https://github.com/apache/spark/pull/36405#discussion_r868169150 ## docs/sql-data-sources-jdbc.md: ## @@ -286,6 +286,15 @@ logging into the data sources. read + +maxPushDownLimit Review Comment: I feel it's a

[GitHub] [spark] LorenzoMartini commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
LorenzoMartini commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1121260349 Github UI shows failing but nothing is actually failing here. All comments should be addressed, can we get a merge on this please?

[GitHub] [spark] yutoacts commented on pull request #35838: [MINOR][INFRA] Add ANTLR generated files to .gitignore

2022-05-09 Thread GitBox
yutoacts commented on PR #35838: URL: https://github.com/apache/spark/pull/35838#issuecomment-1121196678 @pan3793 Thanks for letting me know!

[GitHub] [spark] yutoacts opened a new pull request, #36489: Minor gitignore 3.3

2022-05-09 Thread GitBox
yutoacts opened a new pull request, #36489: URL: https://github.com/apache/spark/pull/36489 ### What changes were proposed in this pull request? Add git ignore entries for files created by ANTLR. This is a backport of #35838. ### Why are the changes needed? To
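For context, ANTLR typically emits a few well-known artifact types next to the grammar. The entries below are illustrative of what such a `.gitignore` addition usually covers; the PR's actual paths and patterns may differ.

```
# Illustrative .gitignore entries for ANTLR-generated files (not the PR's exact content)
*.tokens
*.interp
gen/
```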

[GitHub] [spark] AmplabJenkins commented on pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Replace named parameter with comment in ReplaceColumns

2022-05-09 Thread GitBox
AmplabJenkins commented on PR #36487: URL: https://github.com/apache/spark/pull/36487#issuecomment-1121154516 Can one of the admins verify this patch?

[GitHub] [spark] ulysses-you commented on pull request #36488: [SPARK-39112][SQL] UnsupportedOperationException if spark.sql.ui.explainMode is set to cost

2022-05-09 Thread GitBox
ulysses-you commented on PR #36488: URL: https://github.com/apache/spark/pull/36488#issuecomment-1121036491 cc @cloud-fan @HyukjinKwon @wangyum @MaxGekk

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-09 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r867943111 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala: ## @@ -323,6 +323,16 @@ class RegexpExpressionsSuite
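The issue behind SPARK-39107 is that a regex replace on an empty input string can still have work to do: patterns that match the empty string (like `a?` or `^$`) produce a match at position 0, so short-circuiting on empty input and returning `''` unchanged is wrong. The same semantics can be seen in Python's `re` module, shown here purely as an analogy, not as Spark's implementation:

```python
import re

# A pattern that matches the empty string still yields a replacement on
# empty input; returning '' unchanged would drop that match.
assert re.sub(r"a?", "x", "") == "x"          # one zero-width match at position 0
assert re.sub(r"^$", "<empty>", "") == "<empty>"
print(re.sub(r"a?", "x", ""))  # → x
```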

[GitHub] [spark] ulysses-you opened a new pull request, #36488: [SPARK-39112][SQL] UnsupportedOperationException if spark.sql.ui.explainMode is set to cost

2022-05-09 Thread GitBox
ulysses-you opened a new pull request, #36488: URL: https://github.com/apache/spark/pull/36488 ### What changes were proposed in this pull request? Add a new leaf like node `ResolvedLeafObject` and apply to the list: - ResolvedDBObjectName - ResolvedNamespace -

[GitHub] [spark] ravwojdyla commented on a diff in pull request #36430: [WIP][SPARK-38904] Select by schema

2022-05-09 Thread GitBox
ravwojdyla commented on code in PR #36430: URL: https://github.com/apache/spark/pull/36430#discussion_r867931583 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1593,6 +1593,35 @@ class Dataset[T] private[sql]( @scala.annotation.varargs def

[GitHub] [spark] EnricoMi commented on pull request #35965: [SPARK-38647][SQL] Add SupportsReportOrdering mix in interface for Scan (DataSourceV2)

2022-05-09 Thread GitBox
EnricoMi commented on PR #35965: URL: https://github.com/apache/spark/pull/35965#issuecomment-1120977044 @sunchao I have pushed all my changes

[GitHub] [spark] dcoliversun commented on a diff in pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Remove undefined variable 'ifExists'

2022-05-09 Thread GitBox
dcoliversun commented on code in PR #36487: URL: https://github.com/apache/spark/pull/36487#discussion_r867824481 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -144,7 +144,7 @@ case class ReplaceColumns(

[GitHub] [spark] beliefer commented on a diff in pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
beliefer commented on code in PR #36405: URL: https://github.com/apache/spark/pull/36405#discussion_r867816015 ## docs/sql-data-sources-jdbc.md: ## @@ -286,6 +286,15 @@ logging into the data sources. read + +maxPushDownLimit Review Comment: OK

[GitHub] [spark] cloud-fan commented on a diff in pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36405: URL: https://github.com/apache/spark/pull/36405#discussion_r867813879 ## docs/sql-data-sources-jdbc.md: ## @@ -286,6 +286,15 @@ logging into the data sources. read + +maxPushDownLimit Review Comment: BTW we should

[GitHub] [spark] cloud-fan commented on a diff in pull request #36405: [SPARK-39065][SQL] DS V2 Limit push-down should avoid out of memory

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36405: URL: https://github.com/apache/spark/pull/36405#discussion_r867811857 ## docs/sql-data-sources-jdbc.md: ## @@ -286,6 +286,15 @@ logging into the data sources. read + +maxPushDownLimit Review Comment: Shall we make

[GitHub] [spark] cloud-fan commented on a diff in pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Remove undefined variable 'ifExists'

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36487: URL: https://github.com/apache/spark/pull/36487#discussion_r867808244 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -144,7 +144,7 @@ case class ReplaceColumns(

[GitHub] [spark] dcoliversun commented on a diff in pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Remove undefined variable 'ifExists'

2022-05-09 Thread GitBox
dcoliversun commented on code in PR #36487: URL: https://github.com/apache/spark/pull/36487#discussion_r867793324 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -144,7 +144,7 @@ case class ReplaceColumns(

[GitHub] [spark] cloud-fan commented on a diff in pull request #36487: [SPARK-38939][SQL][FOLLOWUP] Remove undefined variable 'ifExists'

2022-05-09 Thread GitBox
cloud-fan commented on code in PR #36487: URL: https://github.com/apache/spark/pull/36487#discussion_r867787493 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala: ## @@ -144,7 +144,7 @@ case class ReplaceColumns(
