[GitHub] [spark] zhengruifeng opened a new pull request, #36648: [SPARK-39268][SQL] AttachDistributedSequenceExec do not checkpoint childRDD with single partition

2022-05-23 Thread GitBox
zhengruifeng opened a new pull request, #36648: URL: https://github.com/apache/spark/pull/36648 ### What changes were proposed in this pull request? Do not checkpoint the child RDD when it has only 1 partition ### Why are the changes needed? avoid unnecessary checkpoint when

[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

2022-05-23 Thread GitBox
wangyum commented on code in PR #36628: URL: https://github.com/apache/spark/pull/36628#discussion_r880122085 ## sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out: ## @@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8):

[GitHub] [spark] beliefer commented on a diff in pull request #36593: [SPARK-39139][SQL] DS V2 push-down framework supports DS V2 UDF

2022-05-23 Thread GitBox
beliefer commented on code in PR #36593: URL: https://github.com/apache/spark/pull/36593#discussion_r880103167 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala: ## @@ -305,6 +318,34 @@ abstract class JdbcDialect extends Serializable with Logging{ }

[GitHub] [spark] wangyum commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

2022-05-23 Thread GitBox
wangyum commented on code in PR #36628: URL: https://github.com/apache/spark/pull/36628#discussion_r880095078 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable { De

[GitHub] [spark] beliefer commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

2022-05-23 Thread GitBox
beliefer commented on code in PR #36519: URL: https://github.com/apache/spark/pull/36519#discussion_r880091005 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2102,6 +2102,16 @@ class Dataset[T] private[sql]( Limit(Literal(n), logicalPlan) } + /*

[GitHub] [spark] dongjoon-hyun commented on pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36386: URL: https://github.com/apache/spark/pull/36386#issuecomment-1135428525 Thank you for rechecking. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36628: URL: https://github.com/apache/spark/pull/36628#discussion_r880068159 ## sql/core/src/test/resources/sql-tests/results/ansi/decimalArithmeticOperations.sql.out: ## @@ -142,6 +142,108 @@ struct<(12345678912345.123456789123 / 1.2345678E-8

[GitHub] [spark] cloud-fan commented on a diff in pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36628: URL: https://github.com/apache/spark/pull/36628#discussion_r880067844 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -502,7 +502,8 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] itholic commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
itholic commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r880065419 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first() -

[GitHub] [spark] itholic commented on pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic commented on PR #36647: URL: https://github.com/apache/spark/pull/36647#issuecomment-1135414414 > Does the URL change? We might need to keep the mapping between old URL and new URL. If that's complicated, we can do it in a followup. Yes, the URL has changed, let me find a way

[GitHub] [spark] itholic commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880061918 ## python/docs/source/reference/pyspark.sql/catalog.rst: ## @@ -0,0 +1,48 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor lic

[GitHub] [spark] itholic commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880061516 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -0,0 +1,54 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contribut

[GitHub] [spark] itholic commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880061219 ## python/docs/source/reference/pyspark.sql/functions.rst: ## @@ -0,0 +1,272 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor

[GitHub] [spark] cloud-fan closed pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
cloud-fan closed pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up URL: https://github.com/apache/spark/pull/36503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [spark] cloud-fan commented on pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
cloud-fan commented on PR #36503: URL: https://github.com/apache/spark/pull/36503#issuecomment-1135412773 thanks, merging to master/3.3! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r880060129 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -125,6 +135,31 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r880059787 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -117,14 +128,42 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] HyukjinKwon commented on pull request #36583: [SPARK-39211][SQL] Support JSON scans with DEFAULT values

2022-05-23 Thread GitBox
HyukjinKwon commented on PR #36583: URL: https://github.com/apache/spark/pull/36583#issuecomment-1135409978 I will defer to @gengliangwang -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

[GitHub] [spark] dongjoon-hyun commented on pull request #36645: [SPARK-39266][CORE] Cleanup unused `spark.rpc.numRetries` and `spark.rpc.retry.wait` configs

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36645: URL: https://github.com/apache/spark/pull/36645#issuecomment-1135409291 Merged to master. Thank you all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun closed pull request #36645: [SPARK-39266][CORE] Cleanup unused `spark.rpc.numRetries` and `spark.rpc.retry.wait` configs

2022-05-23 Thread GitBox
dongjoon-hyun closed pull request #36645: [SPARK-39266][CORE] Cleanup unused `spark.rpc.numRetries` and `spark.rpc.retry.wait` configs URL: https://github.com/apache/spark/pull/36645 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880057867 ## python/docs/source/reference/pyspark.sql/catalog.rst: ## @@ -0,0 +1,48 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributor

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880057519 ## python/docs/source/reference/pyspark.sql/spark_session.rst: ## @@ -0,0 +1,54 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880057050 ## python/docs/source/reference/pyspark.sql/core_classes.rst: ## @@ -0,0 +1,37 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contri

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880056609 ## python/docs/source/reference/pyspark.sql/functions.rst: ## @@ -0,0 +1,272 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contribu

[GitHub] [spark] itholic commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880055945 ## python/docs/source/reference/pyspark.sql/core_classes.rst: ## @@ -0,0 +1,37 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contributo

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36647: URL: https://github.com/apache/spark/pull/36647#discussion_r880055802 ## python/docs/source/reference/pyspark.sql/core_classes.rst: ## @@ -0,0 +1,37 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contri

[GitHub] [spark] itholic opened a new pull request, #36647: [SPARK-39253][DOCS][PYTHON] Improve PySpark API reference to be more readable

2022-05-23 Thread GitBox
itholic opened a new pull request, #36647: URL: https://github.com/apache/spark/pull/36647 ### What changes were proposed in this pull request? This PR proposes to improve the PySpark API reference page to be more readable. So far, the PySpark documentation, especially ["Spark S

[GitHub] [spark] amaliujia commented on pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
amaliujia commented on PR #36503: URL: https://github.com/apache/spark/pull/36503#issuecomment-1135403198 @cloud-fan tests are green now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

[GitHub] [spark] zhengruifeng commented on a diff in pull request #36599: [SPARK-39228][PYTHON][PS] Implement `skipna` of `Series.argmax`

2022-05-23 Thread GitBox
zhengruifeng commented on code in PR #36599: URL: https://github.com/apache/spark/pull/36599#discussion_r880050644 ## python/pyspark/pandas/series.py: ## @@ -6255,36 +6261,47 @@ def argmax(self) -> int: Consider dataset containing cereal calories -

[GitHub] [spark] wangyum commented on pull request #36628: [SPARK-39248][SQL] Improve divide performance for decimal type

2022-05-23 Thread GitBox
wangyum commented on PR #36628: URL: https://github.com/apache/spark/pull/36628#issuecomment-1135393260 cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[GitHub] [spark] HeartSaVioR closed pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
HeartSaVioR closed pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix URL: https://github.com/apache/spark/pull/36642 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [spark] HeartSaVioR commented on pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
HeartSaVioR commented on PR #36642: URL: https://github.com/apache/spark/pull/36642#issuecomment-1135391746 Thanks! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] srielau commented on a diff in pull request #36639: [SPARK-39261][CORE] Improve newline formatting for error messages

2022-05-23 Thread GitBox
srielau commented on code in PR #36639: URL: https://github.com/apache/spark/pull/36639#discussion_r880044226 ## core/src/main/resources/error/error-classes.json: ## @@ -1,319 +1,519 @@ { "AMBIGUOUS_FIELD_NAME" : { -"message" : [ "Field name is ambiguous and has matchi

[GitHub] [spark] zhengruifeng commented on pull request #36555: [SPARK-39189][PS] Support limit_area parameter in pandas API on Spark

2022-05-23 Thread GitBox
zhengruifeng commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1135383195 thanks all! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] Yikun commented on pull request #36353: [SPARK-38946][PYTHON] Fix different behavior in setitem

2022-05-23 Thread GitBox
Yikun commented on PR #36353: URL: https://github.com/apache/spark/pull/36353#issuecomment-1135376632 @itholic We might want to do some special process when calling `_update_internal_frame` for pandas 1.4.x. Will update today. -- This is an automated message from the Apache Git Service. T

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
xinrong-databricks commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r880035876 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.fi

[GitHub] [spark] Yikun commented on pull request #36355: [SPARK-38982][PYTHON] Skip categories setter test

2022-05-23 Thread GitBox
Yikun commented on PR #36355: URL: https://github.com/apache/spark/pull/36355#issuecomment-1135372411 @itholic For now we had better fix only the test, because `categories.setter` will not be completely supported in pandas, and some in-place update methods like `set_categories` are also deprecated.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r880031452 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first() -

[GitHub] [spark] ulysses-you opened a new pull request, #36646: [SPARK-39267][SQL] Clean up dsl unnecessary symbol

2022-05-23 Thread GitBox
ulysses-you opened a new pull request, #36646: URL: https://github.com/apache/spark/pull/36646 ### What changes were proposed in this pull request? remove two methods in dsl: - subquery(alias: Symbol) - as(alias: Symbol) ### Why are the changes needed? dsl i

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
xinrong-databricks commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r880030149 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.fi

[GitHub] [spark] cloud-fan commented on a diff in pull request #36639: [SPARK-39261][CORE] Improve newline formatting for error messages

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36639: URL: https://github.com/apache/spark/pull/36639#discussion_r880027150 ## core/src/main/resources/error/error-classes.json: ## @@ -1,319 +1,519 @@ { "AMBIGUOUS_FIELD_NAME" : { -"message" : [ "Field name is ambiguous and has matc

[GitHub] [spark] itholic commented on pull request #34212: [SPARK-36402][PYTHON] Implement Series.combine

2022-05-23 Thread GitBox
itholic commented on PR #34212: URL: https://github.com/apache/spark/pull/34212#issuecomment-1135358555 The GitHub CI bug was fixed; can you rebase? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [spark] itholic commented on pull request #36353: [SPARK-38946][PYTHON] Fix different behavior in setitem

2022-05-23 Thread GitBox
itholic commented on PR #36353: URL: https://github.com/apache/spark/pull/36353#issuecomment-1135358115 Just confirming, any update on this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [spark] itholic commented on pull request #36355: [SPARK-38982][PYTHON] Skip categories setter test

2022-05-23 Thread GitBox
itholic commented on PR #36355: URL: https://github.com/apache/spark/pull/36355#issuecomment-1135357556 Seems like they replied at https://github.com/pandas-dev/pandas/issues/46820#issuecomment-1113889315. Maybe we can just close this? -- This is an automated message from the Apache Git S

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
HyukjinKwon commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r880021145 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.first() -

[GitHub] [spark] AmplabJenkins commented on pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
AmplabJenkins commented on PR #36642: URL: https://github.com/apache/spark/pull/36642#issuecomment-1135352468 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #36643: [SPARK-39265][SQL] Support non-vectorized Parquet scans with DEFAULT values

2022-05-23 Thread GitBox
AmplabJenkins commented on PR #36643: URL: https://github.com/apache/spark/pull/36643#issuecomment-1135352445 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] AmplabJenkins commented on pull request #36641: [SPARK-39263] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

2022-05-23 Thread GitBox
AmplabJenkins commented on PR #36641: URL: https://github.com/apache/spark/pull/36641#issuecomment-1135352493 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] HyukjinKwon closed pull request #36555: [SPARK-39189][PS] Support limit_area parameter in pandas API on Spark

2022-05-23 Thread GitBox
HyukjinKwon closed pull request #36555: [SPARK-39189][PS] Support limit_area parameter in pandas API on Spark URL: https://github.com/apache/spark/pull/36555 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [spark] HyukjinKwon commented on pull request #36555: [SPARK-39189][PS] Support limit_area parameter in pandas API on Spark

2022-05-23 Thread GitBox
HyukjinKwon commented on PR #36555: URL: https://github.com/apache/spark/pull/36555#issuecomment-1135351855 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] allisonwang-db commented on pull request #36386: [SPARK-38918][SQL][3.2] Nested column pruning should filter out attributes that do not belong to the current relation

2022-05-23 Thread GitBox
allisonwang-db commented on PR #36386: URL: https://github.com/apache/spark/pull/36386#issuecomment-1135332088 @dongjoon-hyun Looks like it still has the same issue: ```bash # Test passed build/sbt "sql/testOnly *PlanStabilitySuite" # Test failed for q4 and q5 build/sbt "sq

[GitHub] [spark] srielau commented on a diff in pull request #36639: [SPARK-39261][CORE] Improve newline formatting for error messages

2022-05-23 Thread GitBox
srielau commented on code in PR #36639: URL: https://github.com/apache/spark/pull/36639#discussion_r880001283 ## core/src/main/resources/error/error-classes.json: ## @@ -1,319 +1,519 @@ { "AMBIGUOUS_FIELD_NAME" : { -"message" : [ "Field name is ambiguous and has matchi

[GitHub] [spark] huaxingao commented on pull request #36644: [SPARK-37523][SQL] Re-optimize partitions in Distribution and Ordering if numPartitions is not specified

2022-05-23 Thread GitBox
huaxingao commented on PR #36644: URL: https://github.com/apache/spark/pull/36644#issuecomment-1135317504 cc @cloud-fan @aokolnychyi @ulysses-you This PR is ready for review. Thanks in advance! -- This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [spark] cloud-fan commented on a diff in pull request #36639: [SPARK-39261][CORE] Improve newline formatting for error messages

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36639: URL: https://github.com/apache/spark/pull/36639#discussion_r879992776 ## core/src/main/resources/error/error-classes.json: ## @@ -1,319 +1,519 @@ { "AMBIGUOUS_FIELD_NAME" : { -"message" : [ "Field name is ambiguous and has matc

[GitHub] [spark] anishshri-db commented on a diff in pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
anishshri-db commented on code in PR #36642: URL: https://github.com/apache/spark/pull/36642#discussion_r879991863 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -451,8 +451,10 @@ abstract class StreamExecution( // Offse

[GitHub] [spark] anishshri-db commented on a diff in pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
anishshri-db commented on code in PR #36642: URL: https://github.com/apache/spark/pull/36642#discussion_r879982515 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -451,8 +451,10 @@ abstract class StreamExecution( // Offse

[GitHub] [spark] amaliujia commented on a diff in pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
amaliujia commented on code in PR #36503: URL: https://github.com/apache/spark/pull/36503#discussion_r879978761 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -752,9 +752,30 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] LuciferYang commented on pull request #36636: [SPARK-39256][CORE] Reduce multiple file attribute calls of `JavaUtils#deleteRecursivelyUsingJavaIO`

2022-05-23 Thread GitBox
LuciferYang commented on PR #36636: URL: https://github.com/apache/spark/pull/36636#issuecomment-1135303165 [127a515](https://github.com/apache/spark/pull/36636/commits/127a51581da94995b51ece11caf181200416d3bf) merge with master and rerun all tests -- This is an automated message from th

[GitHub] [spark] jerqi commented on a diff in pull request #36519: [SPARK-39159][SQL] Add new Dataset API for Offset

2022-05-23 Thread GitBox
jerqi commented on code in PR #36519: URL: https://github.com/apache/spark/pull/36519#discussion_r879981480 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2102,6 +2102,16 @@ class Dataset[T] private[sql]( Limit(Literal(n), logicalPlan) } + /** +

[GitHub] [spark] amaliujia commented on a diff in pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
amaliujia commented on code in PR #36503: URL: https://github.com/apache/spark/pull/36503#discussion_r879980347 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -752,9 +752,30 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] amaliujia commented on a diff in pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
amaliujia commented on code in PR #36503: URL: https://github.com/apache/spark/pull/36503#discussion_r879979107 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -752,9 +752,30 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] LuciferYang commented on pull request #36637: [SPARK-39258][TESTS] Fix `Hide credentials in show create table`

2022-05-23 Thread GitBox
LuciferYang commented on PR #36637: URL: https://github.com/apache/spark/pull/36637#issuecomment-1135297918 thanks all ~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] cloud-fan commented on a diff in pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36503: URL: https://github.com/apache/spark/pull/36503#discussion_r879972900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -752,9 +752,30 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] cloud-fan commented on a diff in pull request #36503: [SPARK-39144][SQL] Nested subquery expressions deduplicate relations should be done bottom up

2022-05-23 Thread GitBox
cloud-fan commented on code in PR #36503: URL: https://github.com/apache/spark/pull/36503#discussion_r879972792 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -752,9 +752,30 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] github-actions[bot] closed pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE

2022-05-23 Thread GitBox
github-actions[bot] closed pull request #35425: [SPARK-38129][SQL] Adaptively enable timeout for BroadcastQueryStageExec in AQE URL: https://github.com/apache/spark/pull/35425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [spark] github-actions[bot] closed pull request #35484: [SPARK-38181][SS] Update comments in KafkaDataConsumer.scala

2022-05-23 Thread GitBox
github-actions[bot] closed pull request #35484: [SPARK-38181][SS] Update comments in KafkaDataConsumer.scala URL: https://github.com/apache/spark/pull/35484 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun commented on pull request #36627: [SPARK-39250][BUILD] Upgrade Jackson to 2.13.3

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36627: URL: https://github.com/apache/spark/pull/36627#issuecomment-1135262491 Yes, that failure was irrelevant to this PR, @MaxGekk . - It was caused via https://github.com/apache/spark/pull/36632 - It was fixed via https://github.com/apache/spark/pull/36

[GitHub] [spark] JoshRosen opened a new pull request, #36645: [SPARK-39266][CORE] Cleanup unused `spark.rpc.numRetries` and `spark.rpc.retry.wait` configs

2022-05-23 Thread GitBox
JoshRosen opened a new pull request, #36645: URL: https://github.com/apache/spark/pull/36645 ### What changes were proposed in this pull request? This PR cleans up the `spark.rpc.numRetries` and `spark.rpc.retry.wait` configs, both of which are unused. ### Why are the cha

[GitHub] [spark] huaxingao closed pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-23 Thread GitBox
huaxingao closed pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified URL: https://github.com/apache/spark/pull/34785 -- This is an automated message from the Apache Git Service. To respond to the message, pl

[GitHub] [spark] huaxingao commented on pull request #34785: [SPARK-37523][SQL] Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2022-05-23 Thread GitBox
huaxingao commented on PR #34785: URL: https://github.com/apache/spark/pull/34785#issuecomment-1135252175 It's a bit too hard to rebase. I will close this PR and the new one is here https://github.com/apache/spark/pull/36644. Thank you all very much for reviewing! -- This is an automated

[GitHub] [spark] huaxingao opened a new pull request, #36644: [SPARK-37523][SQL] Re-optimize partitions in Distribution and Ordering if numPartitions is not specified

2022-05-23 Thread GitBox
huaxingao opened a new pull request, #36644: URL: https://github.com/apache/spark/pull/36644 ### What changes were proposed in this pull request? Re-optimize partitions in Distribution and Ordering if numPartitions is not specified. Support both strictly required distribution a

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
HeartSaVioR commented on code in PR #36642: URL: https://github.com/apache/spark/pull/36642#discussion_r879934124 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala: ## @@ -451,8 +451,10 @@ abstract class StreamExecution( // Offset

[GitHub] [spark] anishshri-db commented on pull request #36642: [SPARK-39264][SS] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
anishshri-db commented on PR #36642: URL: https://github.com/apache/spark/pull/36642#issuecomment-1135224215 @HeartSaVioR - Could you please take a look and merge ? Thx -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[GitHub] [spark] dtenedor opened a new pull request, #36643: [SPARK-39245][SQL] Support non-vectorized Parquet scans with DEFAULT values

2022-05-23 Thread GitBox
dtenedor opened a new pull request, #36643: URL: https://github.com/apache/spark/pull/36643 ### What changes were proposed in this pull request? Support non-vectorized Parquet scans when the table schema has associated DEFAULT column values. (Note: this PR depends on https://gi

[GitHub] [spark] anishshri-db commented on pull request #36642: [SPARK-39264] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
anishshri-db commented on PR #36642: URL: https://github.com/apache/spark/pull/36642#issuecomment-1135208135 @alex-balikov @HeartSaVioR @viirya - Could you folks please take a look ? Thx -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [spark] anishshri-db opened a new pull request, #36642: [SPARK-39264] Fix type check and conversion to longOffset for awaitOffset fix

2022-05-23 Thread GitBox
anishshri-db opened a new pull request, #36642: URL: https://github.com/apache/spark/pull/36642 … ### What changes were proposed in this pull request? Fix type check and conversion to longOffset for awaitOffset fix. Based on discussion with comments from @alex-balikov

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [WIP][SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-05-23 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r879901024 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -210,8 +252,23 @@ private AppShufflePartitionInfo getOrCr

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [WIP][SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-05-23 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r879899830 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -655,6 +776,238 @@ public void registerExecutor(String app

[GitHub] [spark] dtenedor commented on pull request #36623: [WIP][SPARK-39245][SQL] Support Avro scans with DEFAULT values

2022-05-23 Thread GitBox
dtenedor commented on PR #36623: URL: https://github.com/apache/spark/pull/36623#issuecomment-1135171752 It appears Spark does not allow ALTER ADD COLUMN DML with Avro tables. Closing this for now. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [spark] dtenedor closed pull request #36623: [WIP][SPARK-39245][SQL] Support Avro scans with DEFAULT values

2022-05-23 Thread GitBox
dtenedor closed pull request #36623: [WIP][SPARK-39245][SQL] Support Avro scans with DEFAULT values URL: https://github.com/apache/spark/pull/36623 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [spark] amaliujia opened a new pull request, #36641: [SPARK-39263] Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace

2022-05-23 Thread GitBox
amaliujia opened a new pull request, #36641: URL: https://github.com/apache/spark/pull/36641 ### What changes were proposed in this pull request? Make GetTable, TableExists and DatabaseExists be compatible with 3 layer namespace ### Why are the changes needed?

[GitHub] [spark] akpatnam25 commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is se

2022-05-23 Thread GitBox
akpatnam25 commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r879883905 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2449,7 +2459,12 @@ private[spark] class DAGScheduler( val currentEpoch = maybeEpoch.

[GitHub] [spark] dongjoon-hyun commented on pull request #36638: [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36638: URL: https://github.com/apache/spark/pull/36638#issuecomment-1135155332 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

[GitHub] [spark] dongjoon-hyun closed pull request #36638: [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`

2022-05-23 Thread GitBox
dongjoon-hyun closed pull request #36638: [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader` URL: https://github.com/apache/spark/pull/36638 -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [spark] dongjoon-hyun commented on pull request #36637: [SPARK-39258][TESTS] Fix `Hide credentials in show create table`

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36637: URL: https://github.com/apache/spark/pull/36637#issuecomment-1135154660 Merged to master/3.3/3.2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] dongjoon-hyun closed pull request #36637: [SPARK-39258][TESTS] Fix `Hide credentials in show create table`

2022-05-23 Thread GitBox
dongjoon-hyun closed pull request #36637: [SPARK-39258][TESTS] Fix `Hide credentials in show create table` URL: https://github.com/apache/spark/pull/36637 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [spark] dongjoon-hyun commented on pull request #36637: [SPARK-39258][TESTS] Fix `Hide credentials in show create table`

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36637: URL: https://github.com/apache/spark/pull/36637#issuecomment-1135153418 The PySpark GitHub Action job has been running for over 5h 40 minutes. It's irrelevant to this test PR. I'll merge this PR. https://user-images.githubusercontent.com/9700541/169908543-d

[GitHub] [spark] dongjoon-hyun commented on pull request #36638: [SPARK-39260][SQL] Use `Reader.getSchema` instead of `Reader.getTypes` in `SparkOrcNewRecordReader`

2022-05-23 Thread GitBox
dongjoon-hyun commented on PR #36638: URL: https://github.com/apache/spark/pull/36638#issuecomment-1135152042 The PySpark GitHub Action jobs are still running exceptionally, but they are not related to this ORC patch. -- This is an automated message from the Apache Git Service. To respond

[GitHub] [spark] vli-databricks closed pull request #36527: [SPARK-39169][SQL] Optimize FIRST when used as a single aggregate fun…

2022-05-23 Thread GitBox
vli-databricks closed pull request #36527: [SPARK-39169][SQL] Optimize FIRST when used as a single aggregate fun… URL: https://github.com/apache/spark/pull/36527 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-05-23 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r879816903 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -125,6 +135,31 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

[GitHub] [spark] otterc commented on a diff in pull request #36601: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-05-23 Thread GitBox
otterc commented on code in PR #36601: URL: https://github.com/apache/spark/pull/36601#discussion_r879790477 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2449,7 +2459,12 @@ private[spark] class DAGScheduler( val currentEpoch = maybeEpoch.getO

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36640: [SPARK-39262][PYTHON] Correct error messages when creating DataFrame from an RDD with the first element `0`

2022-05-23 Thread GitBox
xinrong-databricks commented on code in PR #36640: URL: https://github.com/apache/spark/pull/36640#discussion_r879786294 ## python/pyspark/sql/session.py: ## @@ -611,7 +611,7 @@ def _inferSchema( :class:`pyspark.sql.types.StructType` """ first = rdd.fi
