[GitHub] [spark] MaxGekk commented on pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
MaxGekk commented on PR #36553: URL: https://github.com/apache/spark/pull/36553#issuecomment-1129594505 @gengliangwang Yep, I am about to backport this to 3.3. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] beobest2 commented on pull request #36509: [SPARK-38961][PYTHON][DOCS] Enhance to automatically generate the the pandas API support list

2022-05-17 Thread GitBox
beobest2 commented on PR #36509: URL: https://github.com/apache/spark/pull/36509#issuecomment-1129593287 @Yikun I've fixed all the `f-strings` that needed to be fixed in https://github.com/apache/spark/pull/36509#discussion_r874263076 Please check the latest commit :)

[GitHub] [spark] AmplabJenkins commented on pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
AmplabJenkins commented on PR #36586: URL: https://github.com/apache/spark/pull/36586#issuecomment-1129589934 Can one of the admins verify this patch?

[GitHub] [spark] gengliangwang commented on pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
gengliangwang commented on PR #36553: URL: https://github.com/apache/spark/pull/36553#issuecomment-1129589709 @MaxGekk shall we port this to 3.3 as well? I plan to put https://issues.apache.org/jira/browse/SPARK-39188 in 3.3 as well (either the 3.3.0 or 3.3.1 release).

[GitHub] [spark] linhongliu-db commented on pull request #36578: [SPARK-39207][SQL] Record the SQL text when executing a query using SparkSession.sql()

2022-05-17 Thread GitBox
linhongliu-db commented on PR #36578: URL: https://github.com/apache/spark/pull/36578#issuecomment-1129586303 > BTW, is it possible for the user to define the Description? @jackylee-ch, I think we can always set the description by using

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875480616 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala: ## @@ -1025,9 +1025,14 @@ abstract class CatalogTestUtils { def

[GitHub] [spark] wangyum opened a new pull request, #36588: [SPARK-39217][SQL] Makes DPP support the pruning side has Union

2022-05-17 Thread GitBox
wangyum opened a new pull request, #36588: URL: https://github.com/apache/spark/pull/36588 ### What changes were proposed in this pull request? Make DPP support cases where the pruning side contains a `Union`. For example: ```sql SELECT f.store_id, f.date_id, s.state_province
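To make the idea behind this PR concrete, here is a hypothetical, greatly simplified sketch of dynamic partition pruning (DPP) when the pruning side is a union: the candidate partition keys are collected from every branch of the union, and only matching fact-table partitions are scanned. None of the names below come from Spark's code base; the column `store_id` is borrowed from the example query above.

```python
# Hypothetical illustration of the idea in SPARK-39217: when the pruning
# side of a DPP join is a UNION, the set of partition keys to keep is the
# union of the keys produced by each branch.

def pruning_keys(*branches):
    """Union the join-key values produced by each branch of the pruning side."""
    keys = set()
    for rows in branches:
        keys.update(row["store_id"] for row in rows)
    return keys

def scan_fact(fact_partitions, keys):
    """Only read fact-table partitions whose partition value survives pruning."""
    return {pid: rows for pid, rows in fact_partitions.items() if pid in keys}

# Fact table partitioned by store_id.
fact = {1: [("s1", 10)], 2: [("s2", 20)], 3: [("s3", 30)]}

# Pruning side is a union of two filtered dimension scans.
branch_a = [{"store_id": 1}]
branch_b = [{"store_id": 3}]

pruned = scan_fact(fact, pruning_keys(branch_a, branch_b))
```

Partition 2 is never read, which is the whole point of pushing pruning through the union.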

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875478230 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -204,8 +211,12 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] MaxGekk closed pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
MaxGekk closed pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST URL: https://github.com/apache/spark/pull/36553

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875476033 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,15 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] MaxGekk commented on pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
MaxGekk commented on PR #36553: URL: https://github.com/apache/spark/pull/36553#issuecomment-1129576864 Merging to master. Thank you, @gengliangwang for review.

[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-05-17 Thread GitBox
Eugene-Mark commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1129569337 @srowen It should be a Teradata-specific issue. I tried to read data with the Teradata driver (`terajdbc4` and `tdgssconfig`), and the data read contains the fractional part. The code is sth

[GitHub] [spark] beliefer commented on pull request #34882: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_slope & regr_intercept

2022-05-17 Thread GitBox
beliefer commented on PR #34882: URL: https://github.com/apache/spark/pull/34882#issuecomment-1129555030 ping @cloud-fan

[GitHub] [spark] beliefer commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-17 Thread GitBox
beliefer commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r875431663 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -251,6 +251,11 @@ abstract class JdbcDialect extends Serializable with Logging{

[GitHub] [spark] LuciferYang commented on pull request #36571: [WIP][SPARK-39202][SQL] Introduce a `putByteArrays` method for `WritableColumnVector` to support setting multiple duplicate `byte[]`

2022-05-17 Thread GitBox
LuciferYang commented on PR #36571: URL: https://github.com/apache/spark/pull/36571#issuecomment-1129540524 Maybe it's better to use a dictionary to store the `StringType` partition column. I'm testing it.
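The suggestion above is classic dictionary encoding: a partition column repeats the same string for a whole batch of rows, so storing one dictionary entry plus small integer codes beats materializing the same bytes once per row. Below is a minimal, hedged sketch of the idea; it is not Spark's `WritableColumnVector` API, and all names are invented for illustration.

```python
# Minimal dictionary encoding, the technique floated in the comment above:
# keep distinct values once, and one small integer code per row.

def dictionary_encode(values):
    dictionary = []   # distinct values, in first-seen order
    codes = []        # one small int per row
    index = {}        # value -> position in dictionary
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

# A partition column typically repeats one value for an entire batch.
dictionary, codes = dictionary_encode(["2022-05-17"] * 4 + ["2022-05-18"] * 2)
```

For a batch of N rows with one distinct partition value, this stores one string and N integers instead of N copies of the string.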

[GitHub] [spark] wypoon commented on pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon commented on PR #36585: URL: https://github.com/apache/spark/pull/36585#issuecomment-1129536896 I'm going to roll back the change where I add a space before "(line ``, position ``)" as there are many tests that would need to be updated otherwise.

[GitHub] [spark] beliefer commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-17 Thread GitBox
beliefer commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r875441003 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -251,6 +251,11 @@ abstract class JdbcDialect extends Serializable with Logging{

[GitHub] [spark] beliefer commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-17 Thread GitBox
beliefer commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r875431114 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -251,6 +251,11 @@ abstract class JdbcDialect extends Serializable with Logging{

[GitHub] [spark] gengliangwang commented on a diff in pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
gengliangwang commented on code in PR #36553: URL: https://github.com/apache/spark/pull/36553#discussion_r875430655 ## core/src/main/resources/error/error-classes.json: ## @@ -22,8 +22,12 @@ "CANNOT_UP_CAST_DATATYPE" : { "message" : [ "Cannot up cast from to .\n" ]

[GitHub] [spark] attilapiros commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local RDD blocks in case of IO related errors

2022-05-17 Thread GitBox
attilapiros commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r875384244 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,10 +933,29 @@ private[spark] class BlockManager( }) Some(new

[GitHub] [spark] gengliangwang closed pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources

2022-05-17 Thread GitBox
gengliangwang closed pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources URL: https://github.com/apache/spark/pull/36562

[GitHub] [spark] gengliangwang commented on pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources

2022-05-17 Thread GitBox
gengliangwang commented on PR #36562: URL: https://github.com/apache/spark/pull/36562#issuecomment-1129515141 Merging to master/3.3

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36587: [SPARK-39215][PYTHON] Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread GitBox
HyukjinKwon commented on code in PR #36587: URL: https://github.com/apache/spark/pull/36587#discussion_r875419684 ## python/pyspark/sql/utils.py: ## @@ -292,12 +292,4 @@ def is_timestamp_ntz_preferred() -> bool: """ Return a bool if TimestampNTZType is preferred

[GitHub] [spark] ueshin commented on a diff in pull request #36587: [SPARK-39215][PYTHON] Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread GitBox
ueshin commented on code in PR #36587: URL: https://github.com/apache/spark/pull/36587#discussion_r875392332 ## python/pyspark/sql/utils.py: ## @@ -292,12 +292,4 @@ def is_timestamp_ntz_preferred() -> bool: """ Return a bool if TimestampNTZType is preferred according

[GitHub] [spark] HyukjinKwon commented on pull request #36587: [SPARK-39215][PYTHON] Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread GitBox
HyukjinKwon commented on PR #36587: URL: https://github.com/apache/spark/pull/36587#issuecomment-1129468281 cc @ueshin

[GitHub] [spark] HyukjinKwon opened a new pull request, #36587: [SPARK-39215][PYTHON] Reduce Py4J calls in pyspark.sql.utils.is_timestamp_ntz_preferred

2022-05-17 Thread GitBox
HyukjinKwon opened a new pull request, #36587: URL: https://github.com/apache/spark/pull/36587 ### What changes were proposed in this pull request? This PR proposes to reduce the number of Py4J calls at `pyspark.sql.utils.is_timestamp_ntz_preferred` by having a single method to
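The optimization described here — collapsing several Python-to-JVM round trips into a single method call — can be illustrated generically. The sketch below does not use PySpark or Py4J; the bridge is simulated by a class that counts calls, and all names (`FakeBridge`, `get_many`, the config keys) are invented for illustration.

```python
# Generic illustration of the SPARK-39215 idea: replace several round trips
# over a Python<->JVM bridge with one batched call. Each method call on the
# fake bridge counts as one round trip.

class FakeBridge:
    """Stands in for the Py4J gateway; every method call is one round trip."""
    def __init__(self):
        self.calls = 0
        self._conf = {"timestampType": "TIMESTAMP_NTZ", "ansiEnabled": "true"}

    def get(self, key):
        self.calls += 1
        return self._conf[key]

    def get_many(self, keys):
        self.calls += 1   # one round trip, however many values
        return [self._conf[k] for k in keys]

bridge = FakeBridge()
# Before: one round trip per config value.
naive = [bridge.get("timestampType"), bridge.get("ansiEnabled")]
naive_calls = bridge.calls

bridge = FakeBridge()
# After: a single batched round trip returning everything at once.
batched = bridge.get_many(["timestampType", "ansiEnabled"])
batched_calls = bridge.calls
```

Each Py4J round trip has fixed overhead, so batching shrinks latency roughly in proportion to the number of calls saved.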

[GitHub] [spark] attilapiros commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
attilapiros commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875386684 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -392,10 +390,9 @@ object QueryParsingErrors extends QueryErrorsBase {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875384532 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -204,8 +211,12 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] dcoliversun commented on pull request #36567: [SPARK-39196][CORE][SQL][K8S] replace `getOrElse(null)` with `orNull`

2022-05-17 Thread GitBox
dcoliversun commented on PR #36567: URL: https://github.com/apache/spark/pull/36567#issuecomment-1129463142 Thanks for your help @srowen @LuciferYang

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875384403 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,15 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875383944 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala: ## @@ -38,12 +38,14 @@ case class ShowTablesExec( val rows = new

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875383670 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala: ## @@ -38,12 +38,14 @@ case class ShowTablesExec( val rows = new

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875383064 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala: ## @@ -1025,9 +1025,14 @@ abstract class CatalogTestUtils { def

[GitHub] [spark] wypoon commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875380471 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -392,10 +390,9 @@ object QueryParsingErrors extends QueryErrorsBase { def

[GitHub] [spark] wypoon commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875380133 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -369,13 +369,11 @@ object QueryParsingErrors extends QueryErrorsBase {

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875377592 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -204,8 +211,12 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] zhengruifeng commented on pull request #36560: [SPARK-39192][PS][SQL] Make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread GitBox
zhengruifeng commented on PR #36560: URL: https://github.com/apache/spark/pull/36560#issuecomment-1129453466 Thanks all!

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875378260 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala: ## @@ -38,12 +38,14 @@ case class ShowTablesExec( val rows = new

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r875376973 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -97,8 +97,15 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] amaliujia opened a new pull request, #36586: [DO NOT MERGE] test catalog API changes

2022-05-17 Thread GitBox
amaliujia opened a new pull request, #36586: URL: https://github.com/apache/spark/pull/36586 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] github-actions[bot] closed pull request #34359: [SPARK-36986][SQL] Improving schema filtering flexibility

2022-05-17 Thread GitBox
github-actions[bot] closed pull request #34359: [SPARK-36986][SQL] Improving schema filtering flexibility URL: https://github.com/apache/spark/pull/34359

[GitHub] [spark] github-actions[bot] closed pull request #35342: [SPARK-38043][SQL] Refactor FileBasedDataSourceSuite and add DataSourceSuite for each data source

2022-05-17 Thread GitBox
github-actions[bot] closed pull request #35342: [SPARK-38043][SQL] Refactor FileBasedDataSourceSuite and add DataSourceSuite for each data source URL: https://github.com/apache/spark/pull/35342

[GitHub] [spark] HyukjinKwon closed pull request #36581: [SPARK-39054][PYTHON][PS] Ensure infer schema accuracy in GroupBy.apply

2022-05-17 Thread GitBox
HyukjinKwon closed pull request #36581: [SPARK-39054][PYTHON][PS] Ensure infer schema accuracy in GroupBy.apply URL: https://github.com/apache/spark/pull/36581

[GitHub] [spark] HyukjinKwon commented on pull request #36581: [SPARK-39054][PYTHON][PS] Ensure infer schema accuracy in GroupBy.apply

2022-05-17 Thread GitBox
HyukjinKwon commented on PR #36581: URL: https://github.com/apache/spark/pull/36581#issuecomment-1129428282 Merged to master.

[GitHub] [spark] srowen commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
srowen commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875358379 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from empty
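The change under review here is about inferring an array's element type from every value in a Python list rather than only the first one. The real logic lives in `pyspark.sql.session._inferSchemaFromList`; the following is only a simplified, hypothetical sketch of the "merge types across all elements" idea, with a tiny widening rule invented for illustration.

```python
# Simplified sketch of the SPARK-39168 idea: look at every element of the
# list, not just the first, and widen to a common type when they disagree.

def infer_element_type(values):
    types = {type(v) for v in values if v is not None}
    if not types:
        return "null"
    if types == {int}:
        return "long"
    if types <= {int, float}:   # mixed ints and floats widen to double
        return "double"
    if types == {str}:
        return "string"
    raise TypeError(f"cannot merge element types: {types}")
```

Inferring from only the first element of `[1, 2.5]` would wrongly conclude `long`; scanning all values yields `double`, which is the kind of bug the PR addresses.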

[GitHub] [spark] attilapiros commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
attilapiros commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875357337 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -392,10 +390,9 @@ object QueryParsingErrors extends QueryErrorsBase {

[GitHub] [spark] HyukjinKwon closed pull request #36560: [SPARK-39192][PS][SQL] Make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread GitBox
HyukjinKwon closed pull request #36560: [SPARK-39192][PS][SQL] Make pandas-on-spark's kurt consistent with pandas URL: https://github.com/apache/spark/pull/36560

[GitHub] [spark] HyukjinKwon commented on pull request #36560: [SPARK-39192][PS][SQL] Make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread GitBox
HyukjinKwon commented on PR #36560: URL: https://github.com/apache/spark/pull/36560#issuecomment-1129421096 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #36501: [SPARK-39143][SQL] Support CSV scans with DEFAULT values

2022-05-17 Thread GitBox
HyukjinKwon closed pull request #36501: [SPARK-39143][SQL] Support CSV scans with DEFAULT values URL: https://github.com/apache/spark/pull/36501

[GitHub] [spark] HyukjinKwon commented on pull request #36501: [SPARK-39143][SQL] Support CSV scans with DEFAULT values

2022-05-17 Thread GitBox
HyukjinKwon commented on PR #36501: URL: https://github.com/apache/spark/pull/36501#issuecomment-1129420807 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
HyukjinKwon commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875352636 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from

[GitHub] [spark] srowen commented on pull request #36496: [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

2022-05-17 Thread GitBox
srowen commented on PR #36496: URL: https://github.com/apache/spark/pull/36496#issuecomment-1129416530 Merged to master/3.3/3.2

[GitHub] [spark] srowen closed pull request #36496: [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

2022-05-17 Thread GitBox
srowen closed pull request #36496: [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe URL: https://github.com/apache/spark/pull/36496

[GitHub] [spark] wypoon commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875347864 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -392,10 +390,9 @@ object QueryParsingErrors extends QueryErrorsBase { def

[GitHub] [spark] wypoon commented on a diff in pull request #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon commented on code in PR #36585: URL: https://github.com/apache/spark/pull/36585#discussion_r875345718 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryParsingErrors.scala: ## @@ -392,10 +390,9 @@ object QueryParsingErrors extends QueryErrorsBase { def

[GitHub] [spark] wypoon opened a new pull request, #36585: [MINOR][SQL] Fix typos and formatting in query parsing error messages

2022-05-17 Thread GitBox
wypoon opened a new pull request, #36585: URL: https://github.com/apache/spark/pull/36585 ### What changes were proposed in this pull request? 1. Fix some typos in query parsing error messages. 2. Remove extraneous whitespace in some messages. Add a space before "(line , position )"

[GitHub] [spark] abellina commented on pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred

2022-05-17 Thread GitBox
abellina commented on PR #36505: URL: https://github.com/apache/spark/pull/36505#issuecomment-1129363308 For the max iterations issue, it looks to be happening during `NestedColumnAliasing`, where in the test columns that are extracting a child (`_extract_name`) keep generating new

[GitHub] [spark] dtenedor commented on pull request #36583: [SPARK-39211][SQL] Support JSON scans with DEFAULT values

2022-05-17 Thread GitBox
dtenedor commented on PR #36583: URL: https://github.com/apache/spark/pull/36583#issuecomment-1129351769 > Is this [[SPARK-38067](https://issues.apache.org/jira/browse/SPARK-38067)][PYTHON] Preserve None values when saved to

[GitHub] [spark] bjornjorgensen commented on pull request #36583: [SPARK-39211][SQL] Support JSON scans with DEFAULT values

2022-05-17 Thread GitBox
bjornjorgensen commented on PR #36583: URL: https://github.com/apache/spark/pull/36583#issuecomment-1129337164 > Interesting note: JSON does not distinguish between NULL values and the absence of values. Therefore inserting NULL and then selecting back the same column yields the
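The note quoted in this thread — that JSON does not distinguish a `null` value from an absent key once you only look at the value — can be demonstrated with the standard `json` module. The key difference is that an explicit `null` still round-trips the key, while an absent key never existed.

```python
import json

# JSON has an explicit null, which is distinct from a key being absent:
# both give you "no usable value", but only one preserves the key.
explicit_null = json.loads('{"a": null}')
absent_key = json.loads('{}')

has_explicit = "a" in explicit_null   # key present, value is None
has_absent = "a" in absent_key        # key never existed
```

So inserting NULL into a JSON-backed column and reading it back cannot be told apart from the column simply missing, which is why the DEFAULT-value semantics discussed above are subtle for JSON scans.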

[GitHub] [spark] vli-databricks opened a new pull request, #36584: [SPARK-39213] Create ANY_VALUE aggregate function

2022-05-17 Thread GitBox
vli-databricks opened a new pull request, #36584: URL: https://github.com/apache/spark/pull/36584 ### What changes were proposed in this pull request? Adding implementation for ANY_VALUE aggregate function. During optimization stage it is rewritten to `First` aggregate
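Since the PR says `ANY_VALUE` is rewritten to the `First` aggregate during optimization, its semantics can be modeled in a few lines: per group, return some arbitrary value of the column, which under the rewrite is the first one encountered. This is a toy model, not Spark's implementation; the function name and null handling flag are invented for illustration.

```python
# Toy model of the ANY_VALUE aggregate from SPARK-39213, using
# First-aggregate semantics: keep the first value seen per group.

def any_value(rows, key, col, ignore_nulls=False):
    out = {}
    for row in rows:
        k, v = row[key], row[col]
        if k in out:
            continue   # First semantics: keep what we already saw
        if ignore_nulls and v is None:
            continue   # a later non-null row for this group can still fill it
        out[k] = v
    return out

rows = [
    {"g": 1, "v": None},
    {"g": 1, "v": 10},
    {"g": 2, "v": 20},
    {"g": 2, "v": 21},
]
```

Note the model omits one SQL corner case: a group whose values are all NULL would still appear (as NULL) in a real `ANY_VALUE(... ) IGNORE NULLS`, but is dropped here for brevity.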

[GitHub] [spark] zero323 commented on pull request #36547: [SPARK-39197][PYTHON] Implement `skipna` parameter of `GroupBy.all`

2022-05-17 Thread GitBox
zero323 commented on PR #36547: URL: https://github.com/apache/spark/pull/36547#issuecomment-1129228724 > Would you give an example in which case we may diverge from pandas? Sure thing @xinrong-databricks. Sorry for being enigmatic before. So, a very simple case would be something

[GitHub] [spark] MaxGekk commented on a diff in pull request #36561: [SPARK-37939][SQL] Use error classes in the parsing errors of properties

2022-05-17 Thread GitBox
MaxGekk commented on code in PR #36561: URL: https://github.com/apache/spark/pull/36561#discussion_r875157818 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryParsingErrorsSuite.scala: ## @@ -642,4 +642,92 @@ class QueryParsingErrorsSuite extends QueryTest with

[GitHub] [spark] srowen commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
srowen commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875147010 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from empty

[GitHub] [spark] physinet commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
physinet commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875142772 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36560: [SPARK-39192][PS][SQL] Make pandas-on-spark's kurt consistent with pandas

2022-05-17 Thread GitBox
xinrong-databricks commented on code in PR #36560: URL: https://github.com/apache/spark/pull/36560#discussion_r875138420 ## python/pyspark/pandas/tests/test_generic_functions.py: ## @@ -150,8 +150,8 @@ def test_stat_functions(self):

[GitHub] [spark] xinrong-databricks commented on pull request #36547: [SPARK-39197][PYTHON] Implement `skipna` parameter of `GroupBy.all`

2022-05-17 Thread GitBox
xinrong-databricks commented on PR #36547: URL: https://github.com/apache/spark/pull/36547#issuecomment-1129178343 > Boolean cast not only is not going to cover all types, but also yield different results in some cases Would you give an example in which case we may diverge from

[GitHub] [spark] MaxGekk commented on a diff in pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
MaxGekk commented on code in PR #36553: URL: https://github.com/apache/spark/pull/36553#discussion_r875133200 ## core/src/main/resources/error/error-classes.json: ## @@ -22,8 +22,12 @@ "CANNOT_UP_CAST_DATATYPE" : { "message" : [ "Cannot up cast from to .\n" ] }, -

[GitHub] [spark] MaxGekk commented on pull request #36553: [SPARK-39214][SQL] Improve errors related to CAST

2022-05-17 Thread GitBox
MaxGekk commented on PR #36553: URL: https://github.com/apache/spark/pull/36553#issuecomment-1129174496 cc @srielau

[GitHub] [spark] xinrong-databricks commented on pull request #36547: [SPARK-39197][PYTHON] Implement `skipna` parameter of `GroupBy.all`

2022-05-17 Thread GitBox
xinrong-databricks commented on PR #36547: URL: https://github.com/apache/spark/pull/36547#issuecomment-1129166446 Rebased on master to retrigger an unrelated test failure. No new changes after review.

[GitHub] [spark] abellina commented on pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred

2022-05-17 Thread GitBox
abellina commented on PR #36505: URL: https://github.com/apache/spark/pull/36505#issuecomment-1129153161 > All other queries in the test are passing, except for the negative case for the multi-column support. It is commented out in my last patch (obviously that's not the solution)

[GitHub] [spark] srowen commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
srowen commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875104045 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from empty

[GitHub] [spark] physinet commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
physinet commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875099589 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from

[GitHub] [spark] MaxGekk commented on pull request #36579: [SPARK-39212][SQL] Use double quotes for values of SQL configs/DS options in error messages

2022-05-17 Thread GitBox
MaxGekk commented on PR #36579: URL: https://github.com/apache/spark/pull/36579#issuecomment-1129142133 @srielau @panbingkun Could you take a look at the PR, please.

[GitHub] [spark] srowen commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-05-17 Thread GitBox
srowen commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1129130839 OK, I just wonder if this is specific to Teradata, or whether it can be changed elsewhere higher up in the abstraction layers. But you're saying the scale/precision info is lost in

[GitHub] [spark] dtenedor commented on pull request #36583: [SPARK-39211][SQL] Support JSON scans with DEFAULT values

2022-05-17 Thread GitBox
dtenedor commented on PR #36583: URL: https://github.com/apache/spark/pull/36583#issuecomment-1129125976 Note: this PR is based on https://github.com/apache/spark/pull/36501. The additional changes comprise about 15 lines of code, in this commit:

[GitHub] [spark] abellina commented on pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred

2022-05-17 Thread GitBox
abellina commented on PR #36505: URL: https://github.com/apache/spark/pull/36505#issuecomment-1129117318 Update on the SPARK-32290: SingleColumn Null Aware Anti Join Optimize failure: - The original test used a table in the subquery `testData2` which has no nulls, so I added

[GitHub] [spark] dtenedor opened a new pull request, #36583: [SPARK-39211][SQL] Support JSON scans with DEFAULT values

2022-05-17 Thread GitBox
dtenedor opened a new pull request, #36583: URL: https://github.com/apache/spark/pull/36583 ### What changes were proposed in this pull request? Support JSON scans when the table schema has associated DEFAULT column values. Example: ``` create table t(i int) using
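The behavior described above can be sketched in plain Python (an illustration only, with a hypothetical `defaults` map — not Spark's implementation): when a JSON record omits a column, the scan substitutes the column's declared DEFAULT value.

```python
# Conceptual sketch of a JSON scan honoring DEFAULT column values.
import json

# hypothetical: column `i` was declared with DEFAULT 42
defaults = {"i": 42}

def read_row(line, columns):
    rec = json.loads(line)
    # missing columns fall back to their declared default (or None)
    return {c: rec.get(c, defaults.get(c)) for c in columns}

print(read_row('{"j": 1}', ["i", "j"]))  # {'i': 42, 'j': 1}
```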

[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-05-17 Thread GitBox
Eugene-Mark commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1129110003 @HyukjinKwon The [issue-38846](https://issues.apache.org/jira/browse/SPARK-38846) shows that the Number type of Teradata will lose its fractional part after loading to Spark. We

[GitHub] [spark] Eugene-Mark commented on pull request #36499: [SPARK-38846][SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType

2022-05-17 Thread GitBox
Eugene-Mark commented on PR #36499: URL: https://github.com/apache/spark/pull/36499#issuecomment-1129099918 @srowen I'm also not a Teradata expert; I just invoke Teradata's API from Spark and found the issue. I didn't find a document explaining the issue on the Teradata side. I tried to print
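The precision issue discussed above can be illustrated with Python's `decimal` module (an analogy only, not Teradata's or Spark's code): the scale of the target decimal type determines whether the fractional digits survive the mapping.

```python
# Illustration: casting a value to a decimal with scale 0 drops the
# fractional part, while a non-zero scale preserves it.
from decimal import Decimal, ROUND_HALF_UP

value = Decimal("1234.5678")

# analogous to mapping into DecimalType(38, 0) vs. DecimalType(38, 4)
scale0 = value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
scale4 = value.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(scale0)  # 1235  -- fractional digits lost
print(scale4)  # 1234.5678
```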

[GitHub] [spark] dtenedor commented on pull request #36501: [SPARK-39143][SQL] Support CSV scans with DEFAULT values

2022-05-17 Thread GitBox
dtenedor commented on PR #36501: URL: https://github.com/apache/spark/pull/36501#issuecomment-1129093762 @HyukjinKwon I fixed the bad sync, this is ready to merge now at your convenience.

[GitHub] [spark] neilagupta commented on pull request #36441: [SPARK-39091][SQL] Updating specific SQL Expression traits that don't compose when multiple are extended due to nodePatterns being final.

2022-05-17 Thread GitBox
neilagupta commented on PR #36441: URL: https://github.com/apache/spark/pull/36441#issuecomment-1129086391 @AmplabJenkins any chance I could get someone with write access to review this?

[GitHub] [spark] srowen commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
srowen commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r875033677 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from empty

[GitHub] [spark] gengliangwang commented on pull request #36582: [SPARK-39210][SQL] Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread GitBox
gengliangwang commented on PR #36582: URL: https://github.com/apache/spark/pull/36582#issuecomment-1129038094 This should be the last of the query context fixes for when WSCG is not available.

[GitHub] [spark] gengliangwang opened a new pull request, #36582: [SPARK-39210][SQL] Provide query context of Decimal overflow in AVG when WSCG is off

2022-05-17 Thread GitBox
gengliangwang opened a new pull request, #36582: URL: https://github.com/apache/spark/pull/36582 ### What changes were proposed in this pull request? Similar to https://github.com/apache/spark/pull/36525, this PR provides runtime error query context for the Average expression

[GitHub] [spark] physinet commented on a diff in pull request #36545: [SPARK-39168][PYTHON] Use all values in a python list when inferring ArrayType schema

2022-05-17 Thread GitBox
physinet commented on code in PR #36545: URL: https://github.com/apache/spark/pull/36545#discussion_r874983582 ## python/pyspark/sql/session.py: ## @@ -570,10 +570,20 @@ def _inferSchemaFromList( if not data: raise ValueError("can not infer schema from
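The inference change under discussion can be sketched as follows (a simplified illustration, not PySpark's actual `_inferSchemaFromList` code): consider every element of the list and widen the inferred type, instead of looking only at the first element.

```python
# Hypothetical sketch: infer a common element type from ALL values in a
# list, rather than from only the first element.
def infer_element_type(values):
    types = {type(v) for v in values if v is not None}
    if not types:
        return type(None)          # all values were None
    if types == {int}:
        return int
    if types <= {int, float}:
        return float               # widen mixed int/float to float
    if len(types) == 1:
        return next(iter(types))
    return str                     # fall back to string for mixed types

# Inferring from only the first element would have picked `int` here:
print(infer_element_type([1, 2.5, None]))  # <class 'float'>
```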

[GitHub] [spark] Yikun opened a new pull request, #36581: [SPARK-39054][PYTHON][PS] Ensure infer schema accuracy in GroupBy.apply

2022-05-17 Thread GitBox
Yikun opened a new pull request, #36581: URL: https://github.com/apache/spark/pull/36581 ### What changes were proposed in this pull request? Ensure that at least 2 rows are sampled so that apply's schema inference is accurate. ### Why are the changes needed? GroupBy.apply infers schema
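A minimal sketch (not Spark's code) of why the sampling size matters: type widening only happens if the sample actually contains the wider value, so a single-row sample can mis-infer.

```python
# Naive inference: widen int -> float only if a float appears in the sample.
def infer_type(sample):
    return float if any(isinstance(v, float) for v in sample) else int

data = [1, 2.5, 3]
print(infer_type(data[:1]))  # <class 'int'>   -- one-row sample mis-infers
print(infer_type(data[:2]))  # <class 'float'> -- a larger sample catches the float
```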

[GitHub] [spark] pan3793 commented on pull request #36496: [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

2022-05-17 Thread GitBox
pan3793 commented on PR #36496: URL: https://github.com/apache/spark/pull/36496#issuecomment-1129007630 All tests pass now https://github.com/pan3793/spark/runs/6471801942?check_suite_focus=true
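The race that the PR title describes can be illustrated with a Python analogue (not the Scala code in `InMemoryRelation`): a check-then-act on shared state must read under the same lock that writers hold, or another thread can clear the buffers between the check and the use.

```python
# Thread-safety sketch: guard both writers and the loaded-check with one lock.
import threading

class CachedRelation:
    def __init__(self):
        self._lock = threading.Lock()
        self._buffers = None

    def load(self, data):
        with self._lock:
            self._buffers = data

    def clear(self):
        with self._lock:
            self._buffers = None

    def is_loaded(self):
        # reading under the same lock keeps the check consistent
        with self._lock:
            return self._buffers is not None

rel = CachedRelation()
rel.load([1, 2, 3])
print(rel.is_loaded())  # True
```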

[GitHub] [spark] gengliangwang closed pull request #36577: [SPARK-39208][SQL] Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread GitBox
gengliangwang closed pull request #36577: [SPARK-39208][SQL] Fix query context bugs in decimal overflow under codegen mode URL: https://github.com/apache/spark/pull/36577

[GitHub] [spark] gengliangwang commented on pull request #36577: [SPARK-39208][SQL] Fix query context bugs in decimal overflow under codegen mode

2022-05-17 Thread GitBox
gengliangwang commented on PR #36577: URL: https://github.com/apache/spark/pull/36577#issuecomment-1128945945 Merging to master/3.3

[GitHub] [spark] pan3793 commented on pull request #36496: [SPARK-39104][SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe

2022-05-17 Thread GitBox
pan3793 commented on PR #36496: URL: https://github.com/apache/spark/pull/36496#issuecomment-1128945680 Two jobs failed: the Hive slow tests failed because of OOM; the other is PySpark (not familiar with Python). Re-triggered.

[GitHub] [spark] gengliangwang commented on a diff in pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources

2022-05-17 Thread GitBox
gengliangwang commented on code in PR #36562: URL: https://github.com/apache/spark/pull/36562#discussion_r874872763 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -52,6 +52,25 @@ sealed trait TimestampFormatter extends

[GitHub] [spark] gengliangwang commented on a diff in pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources

2022-05-17 Thread GitBox
gengliangwang commented on code in PR #36562: URL: https://github.com/apache/spark/pull/36562#discussion_r874864869 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -52,6 +52,25 @@ sealed trait TimestampFormatter extends
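The kind of speedup discussed above usually takes a fast-path/fallback shape (an assumption about the optimization, not the PR's Scala `TimestampFormatter` code): try a strict parse with the common fixed pattern first, and fall back to a slower, more lenient parser only when that fails.

```python
# Fast-path timestamp parsing sketch with a lenient fallback.
from datetime import datetime

def parse_timestamp(s):
    try:
        # fast path: the common default pattern
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # slow path: ISO-8601 variants handled by fromisoformat
        return datetime.fromisoformat(s)

print(parse_timestamp("2022-05-17 10:30:00"))
print(parse_timestamp("2022-05-17T10:30:00.123456"))
```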

[GitHub] [spark] HyukjinKwon closed pull request #36576: [SPARK-32268][SQL][TESTS][FOLLOW-UP] Use function registry in the SparkSession

2022-05-17 Thread GitBox
HyukjinKwon closed pull request #36576: [SPARK-32268][SQL][TESTS][FOLLOW-UP] Use function registry in the SparkSession URL: https://github.com/apache/spark/pull/36576

[GitHub] [spark] HyukjinKwon commented on pull request #36576: [SPARK-32268][SQL][TESTS][FOLLOW-UP] Use function registry in the SparkSession

2022-05-17 Thread GitBox
HyukjinKwon commented on PR #36576: URL: https://github.com/apache/spark/pull/36576#issuecomment-1128914759 Merged to master and branch-3.3.

[GitHub] [spark] srowen commented on a diff in pull request #36562: [SPARK-39193][SQL] Fasten Timestamp type inference of JSON/CSV data sources

2022-05-17 Thread GitBox
srowen commented on code in PR #36562: URL: https://github.com/apache/spark/pull/36562#discussion_r874840258 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala: ## @@ -52,6 +52,25 @@ sealed trait TimestampFormatter extends Serializable {
