[GitHub] [spark] ArvinZheng commented on pull request #35484: [SPARK-38181][SS][DOCS] Update comments in KafkaDataConsumer.scala

2022-06-01 Thread GitBox
ArvinZheng commented on PR #35484: URL: https://github.com/apache/spark/pull/35484#issuecomment-1144462037 thanks @itholic for checking, just updated my branch, and yes, it's all doc/comment fixes

[GitHub] [spark] itholic commented on pull request #35484: [SPARK-38181][SS][DOCS] Update comments in KafkaDataConsumer.scala

2022-06-01 Thread GitBox
itholic commented on PR #35484: URL: https://github.com/apache/spark/pull/35484#issuecomment-112917 Seems like the test is still failing. Could you rebase? @HeartSaVioR FYI

[GitHub] [spark] HyukjinKwon commented on pull request #36729: [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API suppo…

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36729: URL: https://github.com/apache/spark/pull/36729#issuecomment-1144424096 @beobest2 I am fine with separating PRs. I can merge this one, and you can go ahead for a separate fix for the link problem. I remember there are some options in Sphinx about

[GitHub] [spark] LuciferYang commented on pull request #36746: [WIP][SPARK-39354][SQL] Ensure show `Table or view not found` even if there are `dataTypeMismatchError` related to `Filter` at the same t

2022-06-01 Thread GitBox
LuciferYang commented on PR #36746: URL: https://github.com/apache/spark/pull/36746#issuecomment-1144417250 > Try to fix [SPARK-39354](https://issues.apache.org/jira/browse/SPARK-39354), if acceptable, I will update the pr description done

[GitHub] [spark] HyukjinKwon commented on pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36672: URL: https://github.com/apache/spark/pull/36672#issuecomment-1144408657 The test log is a bit messy .. just copying and pasting the error I saw: ``` 2022-06-02T02:22:55.9442627Z [info] 

[GitHub] [spark] HyukjinKwon closed pull request #36711: [SPARK-39314][PS] Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread GitBox
HyukjinKwon closed pull request #36711: [SPARK-39314][PS] Respect ps.concat sort parameter to follow pandas behavior URL: https://github.com/apache/spark/pull/36711

[GitHub] [spark] HyukjinKwon commented on pull request #36711: [SPARK-39314][PS] Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36711: URL: https://github.com/apache/spark/pull/36711#issuecomment-1144399463 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #36712: [SPARK-39326][PYTHON][PS] replace "NaN" with real "None" value in indexes

2022-06-01 Thread GitBox
HyukjinKwon closed pull request #36712: [SPARK-39326][PYTHON][PS] replace "NaN" with real "None" value in indexes URL: https://github.com/apache/spark/pull/36712

[GitHub] [spark] HyukjinKwon commented on pull request #36712: [SPARK-39326][PYTHON][PS] replace "NaN" with real "None" value in indexes

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36712: URL: https://github.com/apache/spark/pull/36712#issuecomment-1144398352 Merged to master

[GitHub] [spark] otterc commented on a diff in pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-06-01 Thread GitBox
otterc commented on code in PR #36734: URL: https://github.com/apache/spark/pull/36734#discussion_r887527535 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4342,6 +4346,101 @@ class DAGSchedulerSuite extends SparkFunSuite with

[GitHub] [spark] hvanhovell commented on a diff in pull request #35991: [SPARK-38675][CORE] Fix race during unlock in BlockInfoManager

2022-06-01 Thread GitBox
hvanhovell commented on code in PR #35991: URL: https://github.com/apache/spark/pull/35991#discussion_r887524913 ## core/src/main/scala/org/apache/spark/storage/BlockInfoManager.scala: ## @@ -360,12 +360,17 @@ private[storage] class BlockInfoManager extends Logging {

[GitHub] [spark] dongjoon-hyun closed pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc

2022-06-01 Thread GitBox
dongjoon-hyun closed pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc URL: https://github.com/apache/spark/pull/36744

[GitHub] [spark] dongjoon-hyun commented on pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc

2022-06-01 Thread GitBox
dongjoon-hyun commented on PR #36744: URL: https://github.com/apache/spark/pull/36744#issuecomment-1144386712 Thank you, @tgravescs and @huaxingao. Merged to master/3.3.

[GitHub] [spark] Yikun commented on pull request #36712: [SPARK-39326][PYTHON][PS] replace "NaN" with real "None" value in indexes

2022-06-01 Thread GitBox
Yikun commented on PR #36712: URL: https://github.com/apache/spark/pull/36712#issuecomment-1144386392 also cc @HyukjinKwon @xinrong-databricks ready for review

[GitHub] [spark] Yikun commented on pull request #36711: [SPARK-39314][PS] Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread GitBox
Yikun commented on PR #36711: URL: https://github.com/apache/spark/pull/36711#issuecomment-1144386307 also cc @HyukjinKwon @xinrong-databricks ready for review

[GitHub] [spark] Yikun commented on pull request #36711: [SPARK-39314][PS] Respect ps.concat sort parameter to follow pandas behavior

2022-06-01 Thread GitBox
Yikun commented on PR #36711: URL: https://github.com/apache/spark/pull/36711#issuecomment-1144386308 also cc @HyukjinKwon @xinrong-databricks ready for review

[GitHub] [spark] JoshRosen opened a new pull request, #36747: [SPARK-39361] Don't use Log4J2's extended throwable logging pattern in default logging configurations

2022-06-01 Thread GitBox
JoshRosen opened a new pull request, #36747: URL: https://github.com/apache/spark/pull/36747 ### What changes were proposed in this pull request? This PR addresses a performance problem in Log4J 2 related to exception logging: in certain scenarios I observed that Log4J2's default
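For context, Log4J2's `PatternLayout` appends the extended throwable converter (`%xEx`, which resolves jar/class information for every stack frame) when the layout pattern contains no throwable converter, and a configuration can opt into the plain `%ex` converter instead. The fragment below is an illustrative sketch of that idea, not the exact diff in this PR:

```properties
# Illustrative log4j2.properties fragment: ending the pattern with %ex keeps
# the slower extended converter (%xEx) from being appended by default.
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n%ex
```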

[GitHub] [spark] beliefer commented on a diff in pull request #36663: [SPARK-38899][SQL]DS V2 supports push down datetime functions

2022-06-01 Thread GitBox
beliefer commented on code in PR #36663: URL: https://github.com/apache/spark/pull/36663#discussion_r883359744 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java: ## @@ -196,6 +196,90 @@ *Since version: 3.4.0 * *

[GitHub] [spark] wangyum commented on pull request #36746: [DON'T MERGE]

2022-06-01 Thread GitBox
wangyum commented on PR #36746: URL: https://github.com/apache/spark/pull/36746#issuecomment-1144350450 @cloud-fan

[GitHub] [spark] LuciferYang commented on pull request #36746: [DON'T MERGE]

2022-06-01 Thread GitBox
LuciferYang commented on PR #36746: URL: https://github.com/apache/spark/pull/36746#issuecomment-1144349179 Try to fix [SPARK-39354](https://issues.apache.org/jira/browse/SPARK-39354), if acceptable, I will update the pr description

[GitHub] [spark] LuciferYang commented on a diff in pull request #36746: [DON'T MERGE]

2022-06-01 Thread GitBox
LuciferYang commented on code in PR #36746: URL: https://github.com/apache/spark/pull/36746#discussion_r887438317 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala: ## @@ -1170,13 +1170,25 @@ class AnalysisSuite extends AnalysisTest with

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-06-01 Thread GitBox
HyukjinKwon commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r887439636 ## python/pyspark/sql/session.py: ## @@ -611,8 +612,8 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first()

[GitHub] [spark] beliefer commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-01 Thread GitBox
beliefer commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r887439535 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -359,6 +359,32 @@ case class Percentile( ) } +//

[GitHub] [spark] LuciferYang opened a new pull request, #36746: [DON'T MERGE]

2022-06-01 Thread GitBox
LuciferYang opened a new pull request, #36746: URL: https://github.com/apache/spark/pull/36746 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] sadikovi commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread GitBox
sadikovi commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r887433430 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -427,6 +428,7 @@ class SessionCatalog(

[GitHub] [spark] HyukjinKwon closed pull request #36739: [SPARK-39040][SQL][FOLLOWUP] Use a unique table name in conditional-functions.sql

2022-06-01 Thread GitBox
HyukjinKwon closed pull request #36739: [SPARK-39040][SQL][FOLLOWUP] Use a unique table name in conditional-functions.sql URL: https://github.com/apache/spark/pull/36739

[GitHub] [spark] HyukjinKwon commented on pull request #36739: [SPARK-39040][SQL][FOLLOWUP] Use a unique table name in conditional-functions.sql

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36739: URL: https://github.com/apache/spark/pull/36739#issuecomment-1144323059 Merged to master and branch-3.3.

[GitHub] [spark] dongjoon-hyun commented on pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc

2022-06-01 Thread GitBox
dongjoon-hyun commented on PR #36744: URL: https://github.com/apache/spark/pull/36744#issuecomment-1144313756 Thank you for review, @tgravescs . Yes, I guess we can do clean deprecation during Apache Spark 3.4 timeframe. For Spark 3.3.0, it will be enough to deliver new generalized
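For readers following the thread, the knobs involved are the K8s-specific factor this PR un-deprecates and the generalized per-role factors introduced by SPARK-38194. A minimal sketch of how a user would set them (values are arbitrary; the precedence rules are defined in the linked PRs, not here):

```scala
import org.apache.spark.SparkConf

// Illustrative only: the K8s-specific factor remains supported, while the
// generalized driver/executor factors from SPARK-38194 apply to all cluster managers.
val conf = new SparkConf()
  .set("spark.kubernetes.memoryOverheadFactor", "0.1")
  .set("spark.driver.memoryOverheadFactor", "0.1")
  .set("spark.executor.memoryOverheadFactor", "0.1")
```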

[GitHub] [spark] tgravescs commented on pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc

2022-06-01 Thread GitBox
tgravescs commented on PR #36744: URL: https://github.com/apache/spark/pull/36744#issuecomment-1144312306 This looks fine to me. If the configuration is set by default, we need a follow-up issue. I can look more tomorrow.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36744: [SPARK-39360][K8S] Remove deprecation of `spark.kubernetes.memoryOverheadFactor` and recover doc

2022-06-01 Thread GitBox
dongjoon-hyun commented on code in PR #36744: URL: https://github.com/apache/spark/pull/36744#discussion_r887412923 ## docs/running-on-kubernetes.md: ## @@ -1137,6 +1137,16 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 + +

[GitHub] [spark] dtenedor commented on pull request #36745: [SPARK-39359][SQL]Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread GitBox
dtenedor commented on PR #36745: URL: https://github.com/apache/spark/pull/36745#issuecomment-1144288442 @HyukjinKwon @sadikovi this PR restricts DEFAULT columns to supported data sources, it should be the last one to complete this open-source work, if you could help review (or refer
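As a rough sketch of what "restricting DEFAULT columns to supported data sources" means in practice (assuming an active SparkSession `spark`; the actual allowlist contents and error reporting are defined in the PR, not here):

```scala
// A source assumed to be on the allowlist accepts DEFAULT column metadata:
spark.sql("CREATE TABLE t_ok (a STRING DEFAULT 'abc') USING parquet")

// A source assumed to be off the allowlist is expected to be rejected at
// analysis time instead of silently storing a default it cannot honor:
spark.sql("CREATE TABLE t_bad (a STRING DEFAULT 'abc') USING avro")
```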

[GitHub] [spark] dtenedor opened a new pull request, #36745: [SPARK-39359][SQL]Restrict DEFAULT columns to allowlist of supported data source types

2022-06-01 Thread GitBox
dtenedor opened a new pull request, #36745: URL: https://github.com/apache/spark/pull/36745 ### What changes were proposed in this pull request? Restrict DEFAULT columns to allowlist of supported data source types. Example: ``` > create table t(a string) using avro

[GitHub] [spark] github-actions[bot] commented on pull request #35329: [SPARK-33326][SQL] Update Partition statistic parameters after ANALYZE TABLE ... PARTITION()

2022-06-01 Thread GitBox
github-actions[bot] commented on PR #35329: URL: https://github.com/apache/spark/pull/35329#issuecomment-1144278910 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] dongjoon-hyun commented on pull request #35504: [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable

2022-06-01 Thread GitBox
dongjoon-hyun commented on PR #35504: URL: https://github.com/apache/spark/pull/35504#issuecomment-1144269967 To mitigate the changes, I made a PR. This PR's contribution is still valid. Only the deprecation is removed and the doc is recovered.

[GitHub] [spark] manuzhang closed pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
manuzhang closed pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output URL: https://github.com/apache/spark/pull/36733

[GitHub] [spark] xinrong-databricks closed pull request #36743: [WIP] Support createDataFrame from a NumPy array

2022-06-01 Thread GitBox
xinrong-databricks closed pull request #36743: [WIP] Support createDataFrame from a NumPy array URL: https://github.com/apache/spark/pull/36743

[GitHub] [spark] dongjoon-hyun opened a new pull request, #36744: [SPARK-39360][K8S] Remove deprecation of spark.kubernetes.memoryOverheadFactor and recover doc

2022-06-01 Thread GitBox
dongjoon-hyun opened a new pull request, #36744: URL: https://github.com/apache/spark/pull/36744 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] manuzhang commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
manuzhang commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1144265678 Okay. Then I also prefer to keep the current behavior unless many others hit the issue.

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35504: [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable

2022-06-01 Thread GitBox
dongjoon-hyun commented on code in PR #35504: URL: https://github.com/apache/spark/pull/35504#discussion_r887374892 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala: ## @@ -53,18 +53,23 @@ private[spark] class

[GitHub] [spark] dongjoon-hyun commented on pull request #35504: [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable

2022-06-01 Thread GitBox
dongjoon-hyun commented on PR #35504: URL: https://github.com/apache/spark/pull/35504#issuecomment-1144250260 Hi, @tgravescs. Is there a way to mitigate this change in K8s environment at Apache Spark 3.3.0? - https://github.com/apache/spark/pull/35504#pullrequestreview-992770741

[GitHub] [spark] dongjoon-hyun commented on pull request #35504: [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable

2022-06-01 Thread GitBox
dongjoon-hyun commented on PR #35504: URL: https://github.com/apache/spark/pull/35504#issuecomment-1144240878 cc @MaxGekk

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35504: [SPARK-38194][YARN][MESOS][K8S] Make memory overhead factor configurable

2022-06-01 Thread GitBox
dongjoon-hyun commented on code in PR #35504: URL: https://github.com/apache/spark/pull/35504#discussion_r887374417 ## docs/running-on-kubernetes.md: ## @@ -1137,15 +1137,6 @@ See the [configuration page](configuration.html) for information on Spark config 3.0.0 - -

[GitHub] [spark] HyukjinKwon commented on pull request #36447: [SPARK-38807][CORE] Fix the startup error of spark shell on Windows

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36447: URL: https://github.com/apache/spark/pull/36447#issuecomment-1144231341 Yup, that test has been broken for a while.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-06-01 Thread GitBox
HyukjinKwon commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r887367163 ## python/pyspark/sql/session.py: ## @@ -611,8 +611,8 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first()

[GitHub] [spark] srowen commented on pull request #36447: [SPARK-38807][CORE] Fix the startup error of spark shell on Windows

2022-06-01 Thread GitBox
srowen commented on PR #36447: URL: https://github.com/apache/spark/pull/36447#issuecomment-1144172443 The Windows tests failed, but it appears unrelated?

[GitHub] [spark] xinrong-databricks opened a new pull request, #36743: Support createDataFrame from a NumPy array

2022-06-01 Thread GitBox
xinrong-databricks opened a new pull request, #36743: URL: https://github.com/apache/spark/pull/36743 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change?

[GitHub] [spark] ueshin commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-06-01 Thread GitBox
ueshin commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r887134684 ## python/pyspark/sql/session.py: ## @@ -611,8 +611,8 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first() -

[GitHub] [spark] MaxGekk opened a new pull request, #36742: [SPARK-39346][SQL][3.3] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
MaxGekk opened a new pull request, #36742: URL: https://github.com/apache/spark/pull/36742 ### What changes were proposed in this pull request? In the PR, I propose to catch asserts/illegal state exception on each phase of query execution: ANALYSIS, OPTIMIZATION, PLANNING, and convert
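The conversion pattern being proposed can be sketched roughly as follows (illustrative only; the PR defines the actual helper, error class, and the set of phases it wraps):

```scala
import org.apache.spark.SparkException

// Wrap one phase of query execution and rethrow JVM-level failures as a
// Spark internal error that asks the user to report a bug.
def wrapPhase[T](phase: String)(body: => T): T =
  try body catch {
    case e @ (_: IllegalStateException | _: AssertionError) =>
      throw new SparkException(s"Internal error during query $phase: ${e.getMessage}", e)
  }

// Hypothetical usage: wrapPhase("ANALYSIS") { analyzer.executeAndCheck(plan, tracker) }
```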

[GitHub] [spark] ravwojdyla commented on a diff in pull request #36430: [WIP][SPARK-38904] Select by schema

2022-06-01 Thread GitBox
ravwojdyla commented on code in PR #36430: URL: https://github.com/apache/spark/pull/36430#discussion_r887126464 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1593,6 +1593,35 @@ class Dataset[T] private[sql]( @scala.annotation.varargs def

[GitHub] [spark] MaxGekk closed pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
MaxGekk closed pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase URL: https://github.com/apache/spark/pull/36704

[GitHub] [spark] MaxGekk commented on pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
MaxGekk commented on PR #36704: URL: https://github.com/apache/spark/pull/36704#issuecomment-1143897486 Merging to master. Thank you, @cloud-fan for review.

[GitHub] [spark] sunchao commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-06-01 Thread GitBox
sunchao commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1143869351 Merged, thanks!

[GitHub] [spark] sunchao closed pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-06-01 Thread GitBox
sunchao closed pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated URL: https://github.com/apache/spark/pull/36697

[GitHub] [spark] tianshuang opened a new pull request, #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-01 Thread GitBox
tianshuang opened a new pull request, #36741: URL: https://github.com/apache/spark/pull/36741 ### What changes were proposed in this pull request? * Fixed a memory leak where `RawStore` cleanup did not take effect because different `threadLocalMS` instances were being manipulated * Fixed

[GitHub] [spark] cxzl25 opened a new pull request, #36740: [SPARK-39355][SQL] UnresolvedAttribute should only use CatalystSqlParser if name contains dot

2022-06-01 Thread GitBox
cxzl25 opened a new pull request, #36740: URL: https://github.com/apache/spark/pull/36740 ### What changes were proposed in this pull request? Only if the name of `UnresolvedAttribute` contains a dot, try to use `CatalystSqlParser.parseMultipartIdentifier` ### Why are the changes
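The proposed fast path can be sketched as below (illustrative, not the PR's exact code): only attribute names containing a dot need the full multipart-identifier parser, so plain names skip the parsing cost entirely.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser

// Hypothetical helper: invoke the parser only when a dot makes multiple name parts possible.
def nameParts(name: String): Seq[String] =
  if (name.contains(".")) CatalystSqlParser.parseMultipartIdentifier(name)
  else Seq(name)
```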

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143744385 if we don't add a new config, then this is another potential "breaking" change: someone may have disabled autoBucketedScan but their scan is still not bucketed because not all bucketed

[GitHub] [spark] MaxGekk commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
MaxGekk commented on code in PR #36704: URL: https://github.com/apache/spark/pull/36704#discussion_r886901736 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -319,7 +320,8 @@ abstract class StreamExecution( // This is a

[GitHub] [spark] manuzhang commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
manuzhang commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143705745 Thanks for the thorough explanation. I thought what we put in the PR description is a high-level summary of the changes. I will update the description later. Meanwhile, I'm hesitant

[GitHub] [spark] cloud-fan commented on pull request #36739: [SPARK-39040][SQL][FOLLOWUP] Use a unique table name in conditional-functions.sql

2022-06-01 Thread GitBox
cloud-fan commented on PR #36739: URL: https://github.com/apache/spark/pull/36739#issuecomment-1143700660 cc @ulysses-you

[GitHub] [spark] cloud-fan opened a new pull request, #36739: [SPARK-39040][SQL][FOLLOWUP] Use a unique table name in conditional-functions.sql

2022-06-01 Thread GitBox
cloud-fan opened a new pull request, #36739: URL: https://github.com/apache/spark/pull/36739 ### What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/36376, to use a unique table name in the test. `t` is a quite common

[GitHub] [spark] cloud-fan closed pull request #36646: [SPARK-39267][SQL] Clean up dsl unnecessary symbol

2022-06-01 Thread GitBox
cloud-fan closed pull request #36646: [SPARK-39267][SQL] Clean up dsl unnecessary symbol URL: https://github.com/apache/spark/pull/36646

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143641829 > What changes were proposed in this pull request? > Currently, bucketed scan is disabled if bucket columns are not in scan output. This PR proposes to move the check into

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143605953 So one approach is to use bucket scan physically but still report `UnknownPartitioning`. We can probably add a new legacy config to force the physical bucket scan.

[GitHub] [spark] ulysses-you commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
ulysses-you commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886799362 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3055,21 +3055,6 @@ class DataFrameSuite extends QueryTest assert(df2.isLocal) }

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143603658 It must be `UnknownPartitioning`.

[GitHub] [spark] manuzhang commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
manuzhang commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143598381 Then what's the expected partitioning for bucket scan output with non-bucket columns?

[GitHub] [spark] ulysses-you commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
ulysses-you commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886787746 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -232,3 +216,33 @@ case class CheckOverflowInSum(

[GitHub] [spark] AmplabJenkins commented on pull request #36729: [SPARK-39295][PYTHON][DOCS] Improve documentation of pandas API suppo…

2022-06-01 Thread GitBox
AmplabJenkins commented on PR #36729: URL: https://github.com/apache/spark/pull/36729#issuecomment-1143593235 Can one of the admins verify this patch?

[GitHub] [spark] HyukjinKwon commented on pull request #36447: [SPARK-38807][CORE] Fix the startup error of spark shell on Windows

2022-06-01 Thread GitBox
HyukjinKwon commented on PR #36447: URL: https://github.com/apache/spark/pull/36447#issuecomment-1143591932 Looks fine to me 2

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886782526 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -490,12 +622,27 @@ trait DivModLike extends BinaryArithmetic {

[GitHub] [spark] ulysses-you commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
ulysses-you commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886782440 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -778,16 +1002,24 @@ case class Pmod( val javaType =

[GitHub] [spark] ulysses-you commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
ulysses-you commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886781843 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -490,12 +622,27 @@ trait DivModLike extends BinaryArithmetic {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct the behavior of creating DataFrame from an RDD

2022-06-01 Thread GitBox
HyukjinKwon commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r886780576 ## python/pyspark/sql/session.py: ## @@ -611,8 +611,8 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first()

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143583033 Yea this is the problem: we cannot hide a correctness bug under a flag and claim that the bug has been fixed.

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886762319 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3055,21 +3055,6 @@ class DataFrameSuite extends QueryTest assert(df2.isLocal) }

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886761154 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -232,3 +216,33 @@ case class CheckOverflowInSum( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886760416 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -232,3 +216,33 @@ case class CheckOverflowInSum( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886759507 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -232,3 +216,33 @@ case class CheckOverflowInSum( override

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886758935 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -778,16 +1002,24 @@ case class Pmod( val javaType =

[GitHub] [spark] cloud-fan commented on a diff in pull request #36698: [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36698: URL: https://github.com/apache/spark/pull/36698#discussion_r886757508 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -490,12 +622,27 @@ trait DivModLike extends BinaryArithmetic {

[GitHub] [spark] manuzhang commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
manuzhang commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143553245 The new behavior is that when auto bucketed scan is enabled and not all bucketed columns are read, bucketed scan is disabled, which is > an optimization that we CAN disable bucket scan
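For context, the two existing configs being discussed in this thread are shown below (assuming an active SparkSession `spark`; values are only illustrative):

```scala
// Bucketing on/off for the session, and the "auto" optimization that lets the
// planner fall back to a non-bucketed scan when it judges bucketing unhelpful.
spark.conf.set("spark.sql.sources.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.bucketing.autoBucketedScan.enabled", "true")
```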

[GitHub] [spark] cloud-fan commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r886740706 ## sql/core/src/test/resources/sql-tests/inputs/percentiles.sql: ## @@ -0,0 +1,208 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW aggr AS SELECT * FROM VALUES

[GitHub] [spark] cloud-fan commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r886740530 ## sql/core/src/test/resources/sql-tests/inputs/percentiles.sql: ## @@ -0,0 +1,208 @@ +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW aggr AS SELECT * FROM VALUES

[GitHub] [spark] cloud-fan commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r886739715 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -359,6 +359,32 @@ case class Percentile( ) } +//
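For orientation, the PR under review adds a `median` aggregate; the expected call shape is sketched below with a hypothetical table and columns, assuming an active SparkSession `spark` (the precise semantics are what the review comments are settling):

```scala
// median(col) is expected to act like percentile(col, 0.5); names are hypothetical.
spark.sql("SELECT dept, median(salary) FROM employees GROUP BY dept").show()
```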

[GitHub] [spark] cloud-fan closed pull request #36735: [SPARK-39350][SQL] DESC NAMESPACE EXTENDED should redact properties

2022-06-01 Thread GitBox
cloud-fan closed pull request #36735: [SPARK-39350][SQL] DESC NAMESPACE EXTENDED should redact properties URL: https://github.com/apache/spark/pull/36735

[GitHub] [spark] cloud-fan commented on pull request #36735: [SPARK-39350][SQL] DESC NAMESPACE EXTENDED should redact properties

2022-06-01 Thread GitBox
cloud-fan commented on PR #36735: URL: https://github.com/apache/spark/pull/36735#issuecomment-1143535008 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on pull request #36733: [SPARK-39344][SQL] Only disable bucketing when autoBucketedScan is enabled if bucket columns are not in scan output

2022-06-01 Thread GitBox
cloud-fan commented on PR #36733: URL: https://github.com/apache/spark/pull/36733#issuecomment-1143534078 @manuzhang can you explain the new behavior more clearly? In which case the bucketed scan is enabled? To clarify, auto bucket scan is an optimization that we CAN disable bucket scan

[GitHub] [spark] HyukjinKwon closed pull request #36738: Just a PR

2022-06-01 Thread GitBox
HyukjinKwon closed pull request #36738: Just a PR URL: https://github.com/apache/spark/pull/36738

[GitHub] [spark] cloud-fan commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
cloud-fan commented on code in PR #36704: URL: https://github.com/apache/spark/pull/36704#discussion_r886731669 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -319,7 +320,8 @@ abstract class StreamExecution( // This is

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-01 Thread GitBox
HyukjinKwon commented on code in PR #36683: URL: https://github.com/apache/spark/pull/36683#discussion_r886724539 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2575,6 +2575,18 @@ object SQLConf { .booleanConf

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-01 Thread GitBox
HyukjinKwon commented on code in PR #36683: URL: https://github.com/apache/spark/pull/36683#discussion_r886719524 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2575,6 +2575,18 @@ object SQLConf { .booleanConf

[GitHub] [spark] AmplabJenkins commented on pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to tr

2022-06-01 Thread GitBox
AmplabJenkins commented on PR #36734: URL: https://github.com/apache/spark/pull/36734#issuecomment-1143446598 Can one of the admins verify this patch?

[GitHub] [spark] MaxGekk commented on a diff in pull request #36704: [SPARK-39346][SQL] Convert asserts/illegal state exception to internal errors on each phase

2022-06-01 Thread GitBox
MaxGekk commented on code in PR #36704: URL: https://github.com/apache/spark/pull/36704#discussion_r886633465 ## connector/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala: ## @@ -1376,6 +1376,13 @@ class KafkaMicroBatchV1SourceSuite

[GitHub] [spark] Yikun commented on pull request #36353: [SPARK-38946][PYTHON][PS] Generates a new dataframe instead of operating inplace in df.eval/update/fillna/setitem

2022-06-01 Thread GitBox
Yikun commented on PR #36353: URL: https://github.com/apache/spark/pull/36353#issuecomment-1143313995 https://github.com/pandas-dev/pandas/issues/47188

[GitHub] [spark] beliefer commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-06-01 Thread GitBox
beliefer commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r886539391 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/UserDefinedAggregateFunc.java: ## @@ -0,0 +1,67 @@ +/* + * Licensed to the Apache

[GitHub] [spark] AmplabJenkins commented on pull request #36737: [SPARK-39347] [SS] Generate wrong time window when (timestamp-startTime) % slideDuration…

2022-06-01 Thread GitBox
AmplabJenkins commented on PR #36737: URL: https://github.com/apache/spark/pull/36737#issuecomment-1143239527 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36738: Just a PR

2022-06-01 Thread GitBox
AmplabJenkins commented on PR #36738: URL: https://github.com/apache/spark/pull/36738#issuecomment-1143239464 Can one of the admins verify this patch?
