[GitHub] [spark] pralabhkumar commented on a diff in pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-02 Thread GitBox
pralabhkumar commented on code in PR #36701: URL: https://github.com/apache/spark/pull/36701#discussion_r888632502 ## python/pyspark/tests/test_shuffle.py: ## @@ -54,6 +63,49 @@ def test_medium_dataset(self): self.assertTrue(m.spills >= 1) self.assertEqual(sum(

[GitHub] [spark] mridulm commented on a diff in pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set t

2022-06-02 Thread GitBox
mridulm commented on code in PR #36734: URL: https://github.com/apache/spark/pull/36734#discussion_r888654676 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1885,6 +1885,16 @@ private[spark] class DAGScheduler( mapOutputTracker.

[GitHub] [spark] MaxGekk closed pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

2022-06-02 Thread GitBox
MaxGekk closed pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept URL: https://github.com/apache/spark/pull/36708 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] MaxGekk commented on pull request #36708: [SPARK-37623][SQL] Support ANSI Aggregate Function: regr_intercept

2022-06-02 Thread GitBox
MaxGekk commented on PR #36708: URL: https://github.com/apache/spark/pull/36708#issuecomment-1145630828 +1, LGTM. Merging to master. Thank you, @beliefer and @cloud-fan for review.

[GitHub] [spark] MaxGekk closed pull request #36752: [SPARK-39259][SQL][3.3] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk closed pull request #36752: [SPARK-39259][SQL][3.3] Evaluate timestamps consistently in subqueries URL: https://github.com/apache/spark/pull/36752

[GitHub] [spark] MaxGekk commented on pull request #36752: [SPARK-39259][SQL][3.3] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk commented on PR #36752: URL: https://github.com/apache/spark/pull/36752#issuecomment-1145620433 +1, LGTM. Merging to 3.3. Thank you, @olaky.

[GitHub] [spark] xuanyuanking opened a new pull request, #36757: [SPARK-39371][DOCS][Core] Review and fix issues in Scala/Java API docs of Core module #36754

2022-06-02 Thread GitBox
xuanyuanking opened a new pull request, #36757: URL: https://github.com/apache/spark/pull/36757 ### What changes were proposed in this pull request? Compare the 3.3.0 API doc with the latest release version 3.2.1. Fix the following issues: * Add missing Since annotation

[GitHub] [spark] sadikovi commented on a diff in pull request #36726: [SPARK-39339][SQL] Support TimestampNTZ type in JDBC data source

2022-06-02 Thread GitBox
sadikovi commented on code in PR #36726: URL: https://github.com/apache/spark/pull/36726#discussion_r887638039 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -150,6 +150,9 @@ object JdbcUtils extends Logging with SQLConfHelper {

[GitHub] [spark] MaxGekk closed pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-02 Thread GitBox
MaxGekk closed pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN` URL: https://github.com/apache/spark/pull/36714

[GitHub] [spark] AmplabJenkins commented on pull request #36740: [SPARK-39355][SQL] Avoid UnresolvedAttribute.apply throwing ParseException

2022-06-02 Thread GitBox
AmplabJenkins commented on PR #36740: URL: https://github.com/apache/spark/pull/36740#issuecomment-1145590291 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36741: [SPARK-39357][SQL] Fix pmCache memory leak caused by IsolatedClassLoader

2022-06-02 Thread GitBox
AmplabJenkins commented on PR #36741: URL: https://github.com/apache/spark/pull/36741#issuecomment-1145590268 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
AmplabJenkins commented on PR #36745: URL: https://github.com/apache/spark/pull/36745#issuecomment-1145590247 Can one of the admins verify this patch?

[GitHub] [spark] sadikovi commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
sadikovi commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r888606358 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala: ## @@ -41,21 +41,28 @@ import org.apache.spark.sql.types.{MetadataBuilder

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36660: [SPARK-39284][PS] Implement Groupby.mad

2022-06-02 Thread GitBox
HyukjinKwon commented on code in PR #36660: URL: https://github.com/apache/spark/pull/36660#discussion_r888603854 ## python/pyspark/pandas/groupby.py: ## @@ -759,6 +759,99 @@ def skew(scol: Column) -> Column: bool_to_numeric=True, ) +# TODO: 'axis', '

[GitHub] [spark] sadikovi commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
sadikovi commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r888603145 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala: ## @@ -122,7 +122,7 @@ abstract class SessionCatalogSuite extends Analys

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36660: [SPARK-39284][PS] Implement Groupby.mad

2022-06-02 Thread GitBox
HyukjinKwon commented on code in PR #36660: URL: https://github.com/apache/spark/pull/36660#discussion_r888603267 ## python/pyspark/pandas/groupby.py: ## @@ -805,7 +874,7 @@ def all(self, skipna: bool = True) -> FrameLike: 5 False """ groupkey_names =

[GitHub] [spark] wangyum commented on pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
wangyum commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145534550 Merged to master.

[GitHub] [spark] wangyum closed pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
wangyum closed pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports URL: https://github.com/apache/spark/pull/36750

[GitHub] [spark] HyukjinKwon closed pull request #36736: [SPARK-39351][SQL] SHOW CREATE TABLE should redact properties

2022-06-02 Thread GitBox
HyukjinKwon closed pull request #36736: [SPARK-39351][SQL] SHOW CREATE TABLE should redact properties URL: https://github.com/apache/spark/pull/36736

[GitHub] [spark] HyukjinKwon commented on pull request #36736: [SPARK-39351][SQL] SHOW CREATE TABLE should redact properties

2022-06-02 Thread GitBox
HyukjinKwon commented on PR #36736: URL: https://github.com/apache/spark/pull/36736#issuecomment-1145530299 Merged to master.

[GitHub] [spark] HyukjinKwon opened a new pull request, #36756: [SPARK-39369][INFRA] Increase the memory for building from 4096 to 5120MB in AppVeyor

2022-06-02 Thread GitBox
HyukjinKwon opened a new pull request, #36756: URL: https://github.com/apache/spark/pull/36756 ### What changes were proposed in this pull request? https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/builds/43740704 The AppVeyor build is failing because of the lack

[GitHub] [spark] wangyum commented on a diff in pull request #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-02 Thread GitBox
wangyum commented on code in PR #36755: URL: https://github.com/apache/spark/pull/36755#discussion_r888564235 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -288,7 +288,13 @@ object InjectRuntimeFilter extends Rule[Logical

[GitHub] [spark] AngersZhuuuu commented on pull request #36736: [SPARK-39351][SQL] SHOW CREATE TABLE should redact properties

2022-06-02 Thread GitBox
AngersZhuuuu commented on PR #36736: URL: https://github.com/apache/spark/pull/36736#issuecomment-1145528214 ping @cloud-fan

[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
AngersZhuuuu commented on code in PR #36754: URL: https://github.com/apache/spark/pull/36754#discussion_r888563938 ## sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java: ## @@ -44,10 +44,14 @@ * 4. In Hive's code, the method [[merge()] pass a serializ

[GitHub] [spark] sigmod commented on a diff in pull request #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-02 Thread GitBox
sigmod commented on code in PR #36755: URL: https://github.com/apache/spark/pull/36755#discussion_r888560970 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -288,7 +288,13 @@ object InjectRuntimeFilter extends Rule[LogicalP

[GitHub] [spark] HyukjinKwon commented on pull request #36683: [SPARK-39301][SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization

2022-06-02 Thread GitBox
HyukjinKwon commented on PR #36683: URL: https://github.com/apache/spark/pull/36683#issuecomment-1145517326 Gentle ping for a review :-). I know it has some trade-offs, but I believe this addresses more common cases and benefits more users.

[GitHub] [spark] AmplabJenkins commented on pull request #36752: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
AmplabJenkins commented on PR #36752: URL: https://github.com/apache/spark/pull/36752#issuecomment-1145517157 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36753: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
AmplabJenkins commented on PR #36753: URL: https://github.com/apache/spark/pull/36753#issuecomment-1145517135 Can one of the admins verify this patch?

[GitHub] [spark] beliefer commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-02 Thread GitBox
beliefer commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r888554738 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -359,6 +359,32 @@ case class Percentile( ) } +// scala

[GitHub] [spark] wangyum commented on pull request #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-02 Thread GitBox
wangyum commented on PR #36755: URL: https://github.com/apache/spark/pull/36755#issuecomment-1145503284 cc @sigmod @cloud-fan

[GitHub] [spark] wangyum opened a new pull request, #36755: [SPARK-39368][SQL] Move `RewritePredicateSubquery` into `InjectRuntimeFilter`

2022-06-02 Thread GitBox
wangyum opened a new pull request, #36755: URL: https://github.com/apache/spark/pull/36755 ### What changes were proposed in this pull request? This PR moves `RewritePredicateSubquery` into `InjectRuntimeFilter`. ### Why are the changes needed? Reduce the number of `Rewri

[GitHub] [spark] dongjoon-hyun commented on pull request #36697: [SPARK-39313][SQL] `toCatalystOrdering` should fail if V2Expression can not be translated

2022-06-02 Thread GitBox
dongjoon-hyun commented on PR #36697: URL: https://github.com/apache/spark/pull/36697#issuecomment-1145480808 Thank you, @pan3793, @sunchao, @cloud-fan!

[GitHub] [spark] HyukjinKwon commented on pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-02 Thread GitBox
HyukjinKwon commented on PR #36701: URL: https://github.com/apache/spark/pull/36701#issuecomment-114542 LGTM otherwise.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-02 Thread GitBox
HyukjinKwon commented on code in PR #36701: URL: https://github.com/apache/spark/pull/36701#discussion_r888519926 ## python/pyspark/tests/test_shuffle.py: ## @@ -117,6 +169,37 @@ def legit_merge_combiners(x, y): m.mergeCombiners(map(lambda x_y1: (x_y1[0], [x_y1[1]])

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36701: [SPARK-39179][PYTHON][TESTS] Improve the test coverage for pyspark/shuffle.py

2022-06-02 Thread GitBox
HyukjinKwon commented on code in PR #36701: URL: https://github.com/apache/spark/pull/36701#discussion_r888519237 ## python/pyspark/tests/test_shuffle.py: ## @@ -54,6 +63,49 @@ def test_medium_dataset(self): self.assertTrue(m.spills >= 1) self.assertEqual(sum(s

[GitHub] [spark] HyukjinKwon closed pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
HyukjinKwon closed pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module URL: https://github.com/apache/spark/pull/36754

[GitHub] [spark] HyukjinKwon commented on pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
HyukjinKwon commented on PR #36754: URL: https://github.com/apache/spark/pull/36754#issuecomment-1145475718 Merged to master and branch-3.3.

[GitHub] [spark] sunchao commented on pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
sunchao commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145471603 > Lastly, could you make the PR description up-to-date? For example, the following seems to need some changes. > > > This PR removes the check so that the command works as long as t

[GitHub] [spark] github-actions[bot] closed pull request #35329: [SPARK-33326][SQL] Update Partition statistic parameters after ANALYZE TABLE ... PARTITION()

2022-06-02 Thread GitBox
github-actions[bot] closed pull request #35329: [SPARK-33326][SQL] Update Partition statistic parameters after ANALYZE TABLE ... PARTITION() URL: https://github.com/apache/spark/pull/35329

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
dongjoon-hyun commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888508050 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -355,14 +355,17 @@ private[hive] class HiveClientImpl( } override d

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36693: [SPARK-39349] Add a centralized CheckError method for QA of error path

2022-06-02 Thread GitBox
HyukjinKwon commented on code in PR #36693: URL: https://github.com/apache/spark/pull/36693#discussion_r888508847 ## sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala: ## @@ -54,10 +72,34 @@ class AnalysisException protected[sql] ( messageParameters

[GitHub] [spark] wangyum commented on a diff in pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
wangyum commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888507803 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -355,14 +355,17 @@ private[hive] class HiveClientImpl( } override def alt

[GitHub] [spark] viirya commented on a diff in pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
viirya commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888506874 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala: ## @@ -355,14 +355,17 @@ private[hive] class HiveClientImpl( } override def alte

[GitHub] [spark] viirya commented on a diff in pull request #36750: [SPARK-29260][SQL] Support `ALTER DATABASE SET LOCATION` if HMS supports

2022-06-02 Thread GitBox
viirya commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888506364 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -1628,8 +1628,8 @@ object QueryCompilationErrors extends QueryErrorsBase {

[GitHub] [spark] dtenedor commented on pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
dtenedor commented on PR #36745: URL: https://github.com/apache/spark/pull/36745#issuecomment-1145458556 @sadikovi thanks for your review, these are helpful ideas! Please look again when you have time.

[GitHub] [spark] dtenedor commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
dtenedor commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r888505739 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala: ## @@ -427,6 +428,7 @@ class SessionCatalog( tableDefinition.copy(iden

[GitHub] [spark] dtenedor commented on a diff in pull request #36745: [SPARK-39359][SQL] Restrict DEFAULT columns to allowlist of supported data source types

2022-06-02 Thread GitBox
dtenedor commented on code in PR #36745: URL: https://github.com/apache/spark/pull/36745#discussion_r888505435 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -231,4 +232,18 @@ object ResolveDefaultColumns { }

[GitHub] [spark] dongjoon-hyun commented on pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145456039 Lastly, could you make the PR description up-to-date? For example, the following? > This PR removes the check so that the command works as long as the Hive version used by the HM

[GitHub] [spark] sunchao commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
sunchao commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888490510 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Seq[Stri

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888477684 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Se

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888482892 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Se

[GitHub] [spark] sunchao commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
sunchao commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888482028 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Seq[Stri

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888478943 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Se

[GitHub] [spark] sunchao commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
sunchao commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888477757 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Seq[Stri

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on code in PR #36750: URL: https://github.com/apache/spark/pull/36750#discussion_r888476932 ## sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala: ## @@ -165,19 +165,19 @@ class HiveClientSuite(version: String, allVersions: Se

[GitHub] [spark] sunchao commented on pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
sunchao commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145425002 The `ALTER DATABASE SET LOCATION` command will change the default location for new tables created afterwards. So in step 2) above, if table location is not explicitly specified, the new t

[GitHub] [spark] sunchao commented on pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
sunchao commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145407388 Fixed. @viirya pls take another look, thanks.

[GitHub] [spark] holdenk commented on pull request #36434: [SPARK-38969][K8S] Fix Decom reporting

2022-06-02 Thread GitBox
holdenk commented on PR #36434: URL: https://github.com/apache/spark/pull/36434#issuecomment-1145371331 Update: with the change for increased resilience it passes integration tests on my machine.

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-02 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r888383953 ## sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala: ## @@ -299,15 +313,18 @@ class CatalogSuite extends SharedSparkSession with AnalysisTest

[GitHub] [spark] amaliujia commented on a diff in pull request #36586: [SPARK-39236][SQL] Make CreateTable and ListTables be compatible with 3 layer namespace

2022-06-02 Thread GitBox
amaliujia commented on code in PR #36586: URL: https://github.com/apache/spark/pull/36586#discussion_r888374320 ## sql/core/src/main/scala/org/apache/spark/sql/catalog/interface.scala: ## @@ -64,12 +64,26 @@ class Database( @Stable class Table( val name: String, -@Nul

[GitHub] [spark] JoshRosen commented on pull request #36751: [WIP][SPARK-39336][CORE] Do not release write locks on task end.

2022-06-02 Thread GitBox
JoshRosen commented on PR #36751: URL: https://github.com/apache/spark/pull/36751#issuecomment-1145291954 If I recall, I think the original motivation for this "release all locks at the end of the task" code was to prevent indefinite "pin leaks" if tasks fail to properly release locks (e.g.

[GitHub] [spark] gengliangwang commented on a diff in pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
gengliangwang commented on code in PR #36754: URL: https://github.com/apache/spark/pull/36754#discussion_r888368841 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -46,7 +46,7 @@ import org.apache.spark.sql.types._ * As commands a

[GitHub] [spark] gengliangwang commented on a diff in pull request #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
gengliangwang commented on code in PR #36754: URL: https://github.com/apache/spark/pull/36754#discussion_r888303216 ## sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java: ## @@ -44,10 +44,14 @@ * 4. In Hive's code, the method [[merge()] pass a seriali

[GitHub] [spark] gengliangwang opened a new pull request, #36754: [SPARK-39367][DOCS][SQL] Review and fix issues in Scala/Java API docs of SQL module

2022-06-02 Thread GitBox
gengliangwang opened a new pull request, #36754: URL: https://github.com/apache/spark/pull/36754 ### What changes were proposed in this pull request? Compare the 3.3.0 API doc with the latest release version 3.2.1. Fix the following issues: * Add missing Since annotatio

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888299878 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -655,6 +744,156 @@ public void registerExecutor(String app

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888299709 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -655,6 +744,156 @@ public void registerExecutor(String app

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888299473 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -655,6 +744,156 @@ public void registerExecutor(String app

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888299188 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -992,6 +1233,45 @@ AppShufflePartitionInfo getPartitionInf

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888298796 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -576,6 +661,7 @@ public MergeStatuses finalizeShuffleMerg

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888298391 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -576,6 +661,7 @@ public MergeStatuses finalizeShuffleMerg

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888298248 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -209,9 +246,16 @@ private AppShufflePartitionInfo getOrCr

[GitHub] [spark] zhouyejoe commented on pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on PR #35906: URL: https://github.com/apache/spark/pull/35906#issuecomment-1145216328 > Added a flag in closeAndDeletePartitionFilesIfNeeded to check whether DB cleanup is needed or not. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] MaxGekk commented on a diff in pull request #36714: [SPARK-39320][SQL] Support aggregate function `MEDIAN`

2022-06-02 Thread GitBox
MaxGekk commented on code in PR #36714: URL: https://github.com/apache/spark/pull/36714#discussion_r888289663 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -359,6 +359,32 @@ case class Percentile( ) } +// scalas

[GitHub] [spark] olaky opened a new pull request, #36753: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
olaky opened a new pull request, #36753: URL: https://github.com/apache/spark/pull/36753 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

[GitHub] [spark] olaky opened a new pull request, #36752: [SPARK-39259] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
olaky opened a new pull request, #36752: URL: https://github.com/apache/spark/pull/36752 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was t

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888286186 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -536,9 +619,11 @@ public MergeStatuses finalizeShuffleMer

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888284976 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -342,6 +389,29 @@ void closeAndDeletePartitionFilesIfNeede

[GitHub] [spark] dtenedor commented on pull request #36672: [SPARK-39265][SQL] Support vectorized Parquet scans with DEFAULT values

2022-06-02 Thread GitBox
dtenedor commented on PR #36672: URL: https://github.com/apache/spark/pull/36672#issuecomment-1145200667 @HyukjinKwon the CI passes now :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [spark] MaxGekk closed pull request #36749: [SPARK-39295][DOCS][PYTHON][3.3] Improve documentation of pandas API supported list

2022-06-02 Thread GitBox
MaxGekk closed pull request #36749: [SPARK-39295][DOCS][PYTHON][3.3] Improve documentation of pandas API supported list URL: https://github.com/apache/spark/pull/36749 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [spark] MaxGekk commented on pull request #36749: [SPARK-39295][DOCS][PYTHON][3.3] Improve documentation of pandas API supported list

2022-06-02 Thread GitBox
MaxGekk commented on PR #36749: URL: https://github.com/apache/spark/pull/36749#issuecomment-1145199197 +1, LGTM. Merging to 3.3. Thank you, @beobest2 and @HyukjinKwon @Yikun for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [spark] MaxGekk commented on pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk commented on PR #36654: URL: https://github.com/apache/spark/pull/36654#issuecomment-1145197543 @olaky Could you open separate PRs with backports to branch-3.3 and branch-3.2 (according to SPARK-39259, 3.2 has this issue)? Congratulations on the first contribution to Apach

[GitHub] [spark] MaxGekk closed pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk closed pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries URL: https://github.com/apache/spark/pull/36654 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] MaxGekk commented on pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk commented on PR #36654: URL: https://github.com/apache/spark/pull/36654#issuecomment-1145192594 +1, LGTM. Merging to master, 3.3, 3.2. Thank you, @olaky. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [spark] MaxGekk commented on pull request #36654: [SPARK-39259][SQL] Evaluate timestamps consistently in subqueries

2022-06-02 Thread GitBox
MaxGekk commented on PR #36654: URL: https://github.com/apache/spark/pull/36654#issuecomment-1145192596 +1, LGTM. Merging to master, 3.3, 3.2. Thank you, @olaky. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [spark] akpatnam25 commented on a diff in pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is se

2022-06-02 Thread GitBox
akpatnam25 commented on code in PR #36734: URL: https://github.com/apache/spark/pull/36734#discussion_r888273505 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4402,12 +4501,20 @@ object DAGSchedulerSuite { def makeMapStatus(host: String, re

[GitHub] [spark] viirya commented on pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
viirya commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145188518 `AlterNamespaceSetLocationSuite` seems to have failed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] attilapiros commented on a diff in pull request #36512: [SPARK-39152][CORE] Deregistering disk persisted local blocks in case of IO related errors

2022-06-02 Thread GitBox
attilapiros commented on code in PR #36512: URL: https://github.com/apache/spark/pull/36512#discussion_r888265229 ## core/src/main/scala/org/apache/spark/storage/BlockManager.scala: ## @@ -933,46 +935,56 @@ private[spark] class BlockManager( }) Some(new Blo

[GitHub] [spark] otterc commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-06-02 Thread GitBox
otterc commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r888251830 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -342,6 +389,29 @@ void closeAndDeletePartitionFilesIfNeeded(

[GitHub] [spark] otterc commented on a diff in pull request #36734: [SPARK-38987][SHUFFLE] Throw FetchFailedException when merged shuffle blocks are corrupted and spark.shuffle.detectCorrupt is set to

2022-06-02 Thread GitBox
otterc commented on code in PR #36734: URL: https://github.com/apache/spark/pull/36734#discussion_r888231614 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4402,12 +4501,20 @@ object DAGSchedulerSuite { def makeMapStatus(host: String, reduce

[GitHub] [spark] hvanhovell commented on pull request #36751: [WIP][SPARK-39336][CORE] Do not release write locks on task end.

2022-06-02 Thread GitBox
hvanhovell commented on PR #36751: URL: https://github.com/apache/spark/pull/36751#issuecomment-1145135863 This is still a WIP. If we think this is the right thing to do, then I will add some tests. -- This is an automated message from the Apache Git Service. To respond to the message, pl

[GitHub] [spark] hvanhovell opened a new pull request, #36751: [WIP][SPARK-39336][CORE] Do not release write locks on task end.

2022-06-02 Thread GitBox
hvanhovell opened a new pull request, #36751: URL: https://github.com/apache/spark/pull/36751 ### What changes were proposed in this pull request? This PR removes the unlocking of write locks on task end from the `BlockInfoManager`. ### Why are the changes needed? The `BlockInfo

[GitHub] [spark] dongjoon-hyun commented on pull request #36750: [SPARK-29260][SQL] Support alter database location for Hive client versions other than 3.0/3.1

2022-06-02 Thread GitBox
dongjoon-hyun commented on PR #36750: URL: https://github.com/apache/spark/pull/36750#issuecomment-1145110352 Thank you for pinging me, @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t
