[GitHub] [spark] Yikun edited a comment on pull request #34983: [SPARK-37713][K8S] Assign namespace to executor configmap

2022-01-17 Thread GitBox
Yikun edited a comment on pull request #34983: URL: https://github.com/apache/spark/pull/34983#issuecomment-1014248788 FYI, this PR breaks the case of using ConfigMap with namespace specified (driver side), see https://github.com/apache/spark/pull/35215 . -- This is an automated message

[GitHub] [spark] Yaohua628 commented on a change in pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-17 Thread GitBox
Yaohua628 commented on a change in pull request #35068: URL: https://github.com/apache/spark/pull/35068#discussion_r785746252 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java ## @@ -0,0 +1,264 @@ +/* + * Licensed to the

[GitHub] [spark] AngersZhuuuu commented on pull request #35206: [SPARK-37906][SQL] spark-sql should not pass last comment to backend

2022-01-17 Thread GitBox
AngersZhuuuu commented on pull request #35206: URL: https://github.com/apache/spark/pull/35206#issuecomment-1014279764 How about current?

[GitHub] [spark] LuciferYang opened a new pull request #35226: [SPARK-37928][SQL][TESTS] Add Parquet Data Page V2 test scenario to `DataSourceReadBenchmark`

2022-01-17 Thread GitBox
LuciferYang opened a new pull request #35226: URL: https://github.com/apache/spark/pull/35226 ### What changes were proposed in this pull request? This PR adds a corresponding `Parquet Data Page V2` test scenario for each `Parquet Data Page V1` test scenario to

[GitHub] [spark] Peng-Lei commented on a change in pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
Peng-Lei commented on a change in pull request #35204: URL: https://github.com/apache/spark/pull/35204#discussion_r785729941 ## File path: sql/core/src/test/resources/sql-tests/results/show-create-table.sql.out ## @@ -257,7 +257,7 @@ SHOW CREATE TABLE tbl -- !query schema

[GitHub] [spark] LucaCanali commented on a change in pull request #33559: [SPARK-34265][PYTHON][SQL] Instrument Pandas UDFs using SQL metrics

2022-01-17 Thread GitBox
LucaCanali commented on a change in pull request #33559: URL: https://github.com/apache/spark/pull/33559#discussion_r785743298 ## File path: docs/web-ui.md ## @@ -406,6 +406,8 @@ Here is the list of SQL metrics: time to build hash map the time spent on building hash map

[GitHub] [spark] dchvn commented on a change in pull request #34212: [SPARK-36402][PYTHON] Implement Series.combine

2022-01-17 Thread GitBox
dchvn commented on a change in pull request #34212: URL: https://github.com/apache/spark/pull/34212#discussion_r785753384 ## File path: python/pyspark/pandas/series.py ## @@ -4485,6 +4489,173 @@ def replace( return self._with_new_scol(current) # TODO: dtype? +

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #35207: [SPARK-37907][SQL] InvokeLike support ConstantFolding

2022-01-17 Thread GitBox
AngersZhuuuu commented on a change in pull request #35207: URL: https://github.com/apache/spark/pull/35207#discussion_r785752602 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ## @@ -50,6 +50,8 @@ trait InvokeLike

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785761473 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785765189 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Avg.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the

[GitHub] [spark] LantaoJin commented on pull request #35168: [SPARK-37865][SQL]Spark should not dedup the grouping Expressions when the first child of union has duplicate columns

2022-01-17 Thread GitBox
LantaoJin commented on pull request #35168: URL: https://github.com/apache/spark/pull/35168#issuecomment-1014297154 Ok, I tested the first SQL in bin/spark-sql. There is no problem in 2.3/3.0/3.2. But it can be reproduced in bin/spark-shell, regardless of version (2.3/3.0/3.0). And the second SQL can

[GitHub] [spark] Yikun commented on a change in pull request #35215: [SPARK-37916][K8S] The ConfigMap is assigned to incorrect namespace

2022-01-17 Thread GitBox
Yikun commented on a change in pull request #35215: URL: https://github.com/apache/spark/pull/35215#discussion_r785718346 ## File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala ## @@ -76,13

[GitHub] [spark] cloud-fan commented on a change in pull request #35213: [SPARK-37914][SQL] Make `RuntimeReplaceable` works for `AggregateFunction`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35213: URL: https://github.com/apache/spark/pull/35213#discussion_r785718580 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ## @@ -46,12 +46,11 @@ import

[GitHub] [spark] Peng-Lei commented on a change in pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
Peng-Lei commented on a change in pull request #35204: URL: https://github.com/apache/spark/pull/35204#discussion_r785723741 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCreateTableExec.scala ## @@ -98,11 +126,11 @@ case class

[GitHub] [spark] dchvn commented on pull request #35200: [SPARK-37903][PYTHON] Replace string_typehints with get_type_hints

2022-01-17 Thread GitBox
dchvn commented on pull request #35200: URL: https://github.com/apache/spark/pull/35200#issuecomment-1014263771 Before this PR ```python >>> from pyspark.pandas.typedef.typehints import infer_return_type >>> infer_return_type(max) Traceback (most recent call last): File

[GitHub] [spark] dchvn edited a comment on pull request #35200: [SPARK-37903][PYTHON] Replace string_typehints with get_type_hints

2022-01-17 Thread GitBox
dchvn edited a comment on pull request #35200: URL: https://github.com/apache/spark/pull/35200#issuecomment-1014263771 Before this PR ```python >>> from pyspark.pandas.typedef.typehints import infer_return_type >>> infer_return_type(max) Traceback (most recent call last):

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785764169 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] Yaohua628 commented on a change in pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-17 Thread GitBox
Yaohua628 commented on a change in pull request #35068: URL: https://github.com/apache/spark/pull/35068#discussion_r785764124 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java ## @@ -0,0 +1,264 @@ +/* + * Licensed to the

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785764681 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -210,16 +250,33 @@ object

[GitHub] [spark] Yaohua628 commented on a change in pull request #35068: [SPARK-37896][SQL] Implement a ConstantColumnVector and improve performance of the hidden file metadata

2022-01-17 Thread GitBox
Yaohua628 commented on a change in pull request #35068: URL: https://github.com/apache/spark/pull/35068#discussion_r785740125 ## File path: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java ## @@ -0,0 +1,264 @@ +/* + * Licensed to the

[GitHub] [spark] LantaoJin commented on pull request #35168: [SPARK-37865][SQL]Spark should not dedup the grouping Expressions when the first child of union has duplicate columns

2022-01-17 Thread GitBox
LantaoJin commented on pull request #35168: URL: https://github.com/apache/spark/pull/35168#issuecomment-1014283974 @chasingegg ``` select a, a from values (1, 1), (1, 2) as t1(a, b) UNION ALL SELECT c, d from values (2, 3), (2, 3) as t2(c, d) result is

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785758951 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] Yikun commented on pull request #35215: [SPARK-37916][K8S] The ConfigMap is assigned to incorrect namespace

2022-01-17 Thread GitBox
Yikun commented on pull request #35215: URL: https://github.com/apache/spark/pull/35215#issuecomment-1014246663 cc @dongjoon-hyun

[GitHub] [spark] cloud-fan commented on a change in pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35204: URL: https://github.com/apache/spark/pull/35204#discussion_r785724438 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCreateTableExec.scala ## @@ -57,7 +60,7 @@ case class

[GitHub] [spark] cloud-fan commented on a change in pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35204: URL: https://github.com/apache/spark/pull/35204#discussion_r785724513 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowCreateTableExec.scala ## @@ -71,8 +74,9 @@ case class

[GitHub] [spark] cloud-fan closed pull request #35208: [SPARK-37904][SQL] Improve RebalancePartitions in rules of Optimizer

2022-01-17 Thread GitBox
cloud-fan closed pull request #35208: URL: https://github.com/apache/spark/pull/35208

[GitHub] [spark] cloud-fan commented on pull request #35208: [SPARK-37904][SQL] Improve RebalancePartitions in rules of Optimizer

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35208: URL: https://github.com/apache/spark/pull/35208#issuecomment-1014254560 thanks, merging to master!

[GitHub] [spark] LuciferYang commented on pull request #35226: [SPARK-37928][SQL][TESTS] Add Parquet Data Page V2 test scenario to `DataSourceReadBenchmark`

2022-01-17 Thread GitBox
LuciferYang commented on pull request #35226: URL: https://github.com/apache/spark/pull/35226#issuecomment-1014269827 cc @dongjoon-hyun @sunchao @viirya This PR adds a corresponding `Parquet Data Page V2` test scenario for each `Parquet Data Page V1` test scenario to

[GitHub] [spark] cloud-fan commented on a change in pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35204: URL: https://github.com/apache/spark/pull/35204#discussion_r785725545 ## File path: sql/core/src/test/resources/sql-tests/results/show-create-table.sql.out ## @@ -257,7 +257,7 @@ SHOW CREATE TABLE tbl -- !query schema

[GitHub] [spark] Yikun commented on a change in pull request #35203: [SPARK-37886][PYTHON][TESTS] Refactor on OpsTestCase and use ComparisonTestBase

2022-01-17 Thread GitBox
Yikun commented on a change in pull request #35203: URL: https://github.com/apache/spark/pull/35203#discussion_r785734656 ## File path: python/pyspark/pandas/tests/data_type_ops/testing_utils.py ## @@ -41,7 +43,7 @@ from pandas import BooleanDtype, StringDtype -class

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785762233 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35213: [SPARK-37914][SQL] Make `RuntimeReplaceable` works for `AggregateFunction`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35213: URL: https://github.com/apache/spark/pull/35213#discussion_r785718131 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ## @@ -366,6 +366,8 @@ trait RuntimeReplaceable

[GitHub] [spark] Yikun commented on pull request #34983: [SPARK-37713][K8S] Assign namespace to executor configmap

2022-01-17 Thread GitBox
Yikun commented on pull request #34983: URL: https://github.com/apache/spark/pull/34983#issuecomment-1014248788 FYI, this PR breaks the basic case of using ConfigMap with namespace specified (driver side), see https://github.com/apache/spark/pull/35215 .

[GitHub] [spark] LucaCanali commented on a change in pull request #33559: [SPARK-34265][PYTHON][SQL] Instrument Pandas UDFs using SQL metrics

2022-01-17 Thread GitBox
LucaCanali commented on a change in pull request #33559: URL: https://github.com/apache/spark/pull/33559#discussion_r785745165 ## File path: docs/web-ui.md ## @@ -406,6 +406,8 @@ Here is the list of SQL metrics: time to build hash map the time spent on building hash map

[GitHub] [spark] dchvn commented on a change in pull request #34212: [SPARK-36402][PYTHON] Implement Series.combine

2022-01-17 Thread GitBox
dchvn commented on a change in pull request #34212: URL: https://github.com/apache/spark/pull/34212#discussion_r785745008 ## File path: python/pyspark/pandas/series.py ## @@ -4485,6 +4489,173 @@ def replace( return self._with_new_scol(current) # TODO: dtype? +

[GitHub] [spark] LantaoJin edited a comment on pull request #35168: [SPARK-37865][SQL]Spark should not dedup the grouping Expressions when the first child of union has duplicate columns

2022-01-17 Thread GitBox
LantaoJin edited a comment on pull request #35168: URL: https://github.com/apache/spark/pull/35168#issuecomment-1014283974 @chasingegg ``` select a, a from values (1, 1), (1, 2) as t1(a, b) UNION ALL SELECT c, d from values (2, 3), (2, 3) as t2(c, d)

[GitHub] [spark] LucaCanali commented on a change in pull request #33559: [SPARK-34265][PYTHON][SQL] Instrument Pandas UDFs using SQL metrics

2022-01-17 Thread GitBox
LucaCanali commented on a change in pull request #33559: URL: https://github.com/apache/spark/pull/33559#discussion_r785768730 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala ## @@ -42,7 +43,10 @@ class ArrowPythonRunner(

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785767123 ## File path: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala ## @@ -874,4 +876,47 @@ class JDBCV2Suite extends QueryTest with

[GitHub] [spark] LantaoJin edited a comment on pull request #35168: [SPARK-37865][SQL]Spark should not dedup the grouping Expressions when the first child of union has duplicate columns

2022-01-17 Thread GitBox
LantaoJin edited a comment on pull request #35168: URL: https://github.com/apache/spark/pull/35168#issuecomment-1014297154 Ok, I tested the first SQL in bin/spark-sql. There is no problem in 2.0/2.3/3.0/3.2. But it can be reproduced in bin/spark-shell, regardless of version (2.0/2.3/3.0/3.0). And the

[GitHub] [spark] HyukjinKwon commented on a change in pull request #35228: [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range

2022-01-17 Thread GitBox
HyukjinKwon commented on a change in pull request #35228: URL: https://github.com/apache/spark/pull/35228#discussion_r785842136 ## File path: python/pyspark/tests/test_worker.py ## @@ -191,8 +191,14 @@ def test_reuse_worker_of_parallelize_range(self): rdd =

[GitHub] [spark] AngersZhuuuu opened a new pull request #35229: [SPARK-27442][SQL] Remove check filename when reading data

2022-01-17 Thread GitBox
AngersZhuuuu opened a new pull request #35229: URL: https://github.com/apache/spark/pull/35229 ### What changes were proposed in this pull request? It's OK for Spark to forbid special chars in the column name, but when we read existing parquet files, there is no point to forbid it at

[GitHub] [spark] wangyum commented on a change in pull request #35214: [SPARK-37915][SQL] Push down deterministic projection through SQL UNION and combine them

2022-01-17 Thread GitBox
wangyum commented on a change in pull request #35214: URL: https://github.com/apache/spark/pull/35214#discussion_r785770644 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -766,18 +767,24 @@ object

[GitHub] [spark] Yikun opened a new pull request #35228: [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range

2022-01-17 Thread GitBox
Yikun opened a new pull request #35228: URL: https://github.com/apache/spark/pull/35228 ### What changes were proposed in this pull request? Add eventually for test_reuse_worker_of_parallelize_range ### Why are the changes needed? Avoid test_reuse_worker_of_parallelize_range

[GitHub] [spark] yoda-mon commented on pull request #34896: [SPARK-37568][SQL] Support 2-arguments by the convert_timezone() function

2022-01-17 Thread GitBox
yoda-mon commented on pull request #34896: URL: https://github.com/apache/spark/pull/34896#issuecomment-1014354761 @MaxGekk Gentle reminder, - Extending `TimeZoneAwareExpression` seems not so simple. - `sourceTz` is not ignored so I have to set appropriate timezone there. -

[GitHub] [spark] beliefer commented on a change in pull request #35213: [SPARK-37914][SQL] Make `RuntimeReplaceable` works for `AggregateFunction`

2022-01-17 Thread GitBox
beliefer commented on a change in pull request #35213: URL: https://github.com/apache/spark/pull/35213#discussion_r785817488 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/finishAnalysis.scala ## @@ -46,12 +46,11 @@ import

[GitHub] [spark] Peng-Lei opened a new pull request #35227: [SPARK-37931][SQL] Quote the column name if needed

2022-01-17 Thread GitBox
Peng-Lei opened a new pull request #35227: URL: https://github.com/apache/spark/pull/35227 ### What changes were proposed in this pull request? Quote the column name only when needed instead of always. ### Why are the changes needed?

[GitHub] [spark] pan3793 commented on pull request #35223: [SPARK-37925][DOC] Update document to mention the workaround for YARN-11053

2022-01-17 Thread GitBox
pan3793 commented on pull request #35223: URL: https://github.com/apache/spark/pull/35223#issuecomment-1014377705 @itholic updated.

[GitHub] [spark] Peng-Lei commented on pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
Peng-Lei commented on pull request #35204: URL: https://github.com/apache/spark/pull/35204#issuecomment-1014424632 @cloud-fan Update the PR. As the testcase failed

[GitHub] [spark] Yikun commented on a change in pull request #35228: [SPARK-37498][PYTHON] Add eventually for test_reuse_worker_of_parallelize_range

2022-01-17 Thread GitBox
Yikun commented on a change in pull request #35228: URL: https://github.com/apache/spark/pull/35228#discussion_r785906409 ## File path: python/pyspark/tests/test_worker.py ## @@ -191,8 +191,14 @@ def test_reuse_worker_of_parallelize_range(self): rdd =

[GitHub] [spark] dnskr commented on pull request #35224: [SPARK-32165][SQL] Ensure Spark only initiates SharedState once across SparkSessions

2022-01-17 Thread GitBox
dnskr commented on pull request #35224: URL: https://github.com/apache/spark/pull/35224#issuecomment-1014482302 @cloud-fan It is an old memory leak, originally mentioned in the [PR#28128](https://github.com/apache/spark/pull/28128) and

[GitHub] [spark] dchvn commented on a change in pull request #34212: [SPARK-36402][PYTHON] Implement Series.combine

2022-01-17 Thread GitBox
dchvn commented on a change in pull request #34212: URL: https://github.com/apache/spark/pull/34212#discussion_r785786499 ## File path: python/pyspark/pandas/series.py ## @@ -4485,6 +4489,173 @@ def replace( return self._with_new_scol(current) # TODO: dtype? +

[GitHub] [spark] beliefer commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
beliefer commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785779225 ## File path: sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/aggregate/Avg.java ## @@ -0,0 +1,49 @@ +/* + * Licensed to the

[GitHub] [spark] ulysses-you commented on pull request #35208: [SPARK-37904][SQL] Improve RebalancePartitions in rules of Optimizer

2022-01-17 Thread GitBox
ulysses-you commented on pull request #35208: URL: https://github.com/apache/spark/pull/35208#issuecomment-1014332697 thank you all !

[GitHub] [spark] beliefer commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
beliefer commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r785814514 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] NobiGo commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
NobiGo commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014426755 In the Postgresql: ``` CREATE TABLE decimalTable(value numeric(7,2)); INSERT INTO decimalTable(value) VALUES (12.34); INSERT INTO decimalTable(value) VALUES (12.45);

[GitHub] [spark] AmplabJenkins commented on pull request #35224: [SPARK-32165][SQL] Ensure Spark only initiates SharedState once across SparkSessions

2022-01-17 Thread GitBox
AmplabJenkins commented on pull request #35224: URL: https://github.com/apache/spark/pull/35224#issuecomment-1014440333 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #35223: [SPARK-37925][DOC] Update document to mention the workaround for YARN-11053

2022-01-17 Thread GitBox
AmplabJenkins commented on pull request #35223: URL: https://github.com/apache/spark/pull/35223#issuecomment-1014440378 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a change in pull request #35147: [SPARK-37768][SQL][FOLLOWUP] Schema pruning for the metadata struct

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35147: URL: https://github.com/apache/spark/pull/35147#discussion_r786003631 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala ## @@ -179,12 +187,15 @@ object SchemaPruning

[GitHub] [spark] stczwd commented on pull request #35185: [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics

2022-01-17 Thread GitBox
stczwd commented on pull request #35185: URL: https://github.com/apache/spark/pull/35185#issuecomment-1014537402 > Took an initial pass through the PR and added some comments - overall looks good. We would need to make sure that skew join and partition coalescing in SQL interact well with

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r786033834 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r786047569 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35204: URL: https://github.com/apache/spark/pull/35204#issuecomment-1014611798 do you know why the test failed?

[GitHub] [spark] this opened a new pull request #35230: [SPARK-37934] [Build] Upgrade Jetty version to 9.4.44

2022-01-17 Thread GitBox
this opened a new pull request #35230: URL: https://github.com/apache/spark/pull/35230 ### What changes were proposed in this pull request? This PR upgrades Jetty version to `9.4.44.v20210927`. ### Why are the changes needed? We would like to have the fix for

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786103581 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConstants.scala ## @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache

[GitHub] [spark] Peng-Lei edited a comment on pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
Peng-Lei edited a comment on pull request #35204: URL: https://github.com/apache/spark/pull/35204#issuecomment-1014664463 > do you know why the test failed? without `filterNot(_ == PROP_EXTERNAL)`

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786120191 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ## @@ -888,6 +889,179 @@ class

[GitHub] [spark] cloud-fan commented on pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35204: URL: https://github.com/apache/spark/pull/35204#issuecomment-1014687461 @Peng-Lei can you be more specific? What's the expectation of the tests and how do we break it? I see that we exclude `COMMENT` there, as we do want to allow users to

[GitHub] [spark] cloud-fan commented on a change in pull request #35147: [SPARK-37768][SQL][FOLLOWUP] Schema pruning for the metadata struct

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35147: URL: https://github.com/apache/spark/pull/35147#discussion_r786000323 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala ## @@ -158,18 +169,15 @@ object SchemaPruning

[GitHub] [spark] cloud-fan commented on pull request #35224: [SPARK-32165][SQL] Ensure Spark only initiates SharedState once across SparkSessions

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35224: URL: https://github.com/apache/spark/pull/35224#issuecomment-1014575419 Then can you explain more about how the memory leak happens, such as the object reference path and the GC root? The added listeners can be GCed after

[GitHub] [spark] NobiGo commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
NobiGo commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014611159 @cloud-fan Okay, I got it. Thanks.

[GitHub] [spark] AngersZhuuuu commented on a change in pull request #34848: [SPARK-37582][SPARK-37583][SQL] CONTAINS, STARTSWITH, ENDSWITH should support all data type

2022-01-17 Thread GitBox
AngersZhuuuu commented on a change in pull request #34848: URL: https://github.com/apache/spark/pull/34848#discussion_r786081325 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala ## @@ -205,6 +206,7 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786105657 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConstants.scala ## @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786115746 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ## @@ -888,6 +889,179 @@ class

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786118941 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ## @@ -888,6 +889,179 @@ class

[GitHub] [spark] wangyum commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
wangyum commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014518831 @NobiGo We do not simplify that case. This is the test:

[GitHub] [spark] stczwd commented on a change in pull request #35185: [SPARK-37831][CORE] add task partition id in TaskInfo and Task Metrics

2022-01-17 Thread GitBox
stczwd commented on a change in pull request #35185: URL: https://github.com/apache/spark/pull/35185#discussion_r785993842 ## File path: core/src/main/scala/org/apache/spark/status/storeTypes.scala ## @@ -286,6 +289,7 @@ private[spark] class TaskDataWrapper( taskId,

[GitHub] [spark] Yaohua628 commented on a change in pull request #35147: [SPARK-37768][SQL][FOLLOWUP] Schema pruning for the metadata struct

2022-01-17 Thread GitBox
Yaohua628 commented on a change in pull request #35147: URL: https://github.com/apache/spark/pull/35147#discussion_r786021381 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaPruning.scala ## @@ -179,12 +187,15 @@ object SchemaPruning

[GitHub] [spark] cloud-fan commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014594887 @NobiGo that's why this PR only combines "upcast" -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [spark] NobiGo commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
NobiGo commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014604727 @cloud-fan I don't understand. For example: ``` select cast(cast(1234 as decimal(3,0)) as decimal(4,0)) from dual will throws exception select cast(1234 as

[GitHub] [spark] cloud-fan commented on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
cloud-fan commented on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014609396 `cast(1234 as decimal(3,0))` this is not an "upcast", which should be 100% safe. -- This is an automated message from the Apache Git Service. To respond to the message,
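
The point of the exchange above can be illustrated with a small model. This is only a sketch in plain Python, not Spark code: `cast_to_decimal` is a hypothetical stand-in for a strict cast to `decimal(precision, 0)`, and it assumes the overflow raises an error, as NobiGo's comment describes. How Spark itself reports the overflow may depend on configuration, which this model does not capture.

```python
def cast_to_decimal(value: int, precision: int) -> int:
    """Hypothetical model of a strict cast to decimal(precision, 0).

    Assumption: a value with more digits than `precision` raises,
    mirroring the "throws exception" behaviour quoted in the thread.
    """
    if abs(value) >= 10 ** precision:
        raise OverflowError(f"{value} does not fit in decimal({precision},0)")
    return value


def nested(value: int) -> int:
    # Models cast(cast(value as decimal(3,0)) as decimal(4,0)).
    return cast_to_decimal(cast_to_decimal(value, 3), 4)


def collapsed(value: int) -> int:
    # Models the "simplified" form cast(value as decimal(4,0)).
    return cast_to_decimal(value, 4)


if __name__ == "__main__":
    print(collapsed(1234))  # the collapsed cast accepts 1234
    try:
        nested(1234)
    except OverflowError as err:
        print("nested cast failed:", err)  # the inner narrowing cast rejects it
```

In this model the nested and collapsed forms agree for inputs such as 123 but differ for 1234, because the inner cast narrows the value. That is why collapsing two casts appears safe only when the inner cast is an upcast that can never fail.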

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786108113 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/NumberConstants.scala ## @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786117500 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ## @@ -888,6 +889,179 @@ class

[GitHub] [spark] cloud-fan commented on a change in pull request #35206: [SPARK-37906][SQL] spark-sql should not pass last comment to backend

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35206: URL: https://github.com/apache/spark/pull/35206#discussion_r786123159 ## File path: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala ## @@ -613,7 +613,17 @@

[GitHub] [spark] cloud-fan commented on a change in pull request #35213: [SPARK-37914][SQL] Make `RuntimeReplaceable` works for `AggregateFunction`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35213: URL: https://github.com/apache/spark/pull/35213#discussion_r786032039 ## File path: sql/core/src/test/resources/sql-tests/results/udf/udf-group-by.sql.out ## @@ -377,43 +377,39 @@ struct -- !query SELECT every(udf(1))

[GitHub] [spark] cloud-fan commented on a change in pull request #35213: [SPARK-37914][SQL] Make `RuntimeReplaceable` works for `AggregateFunction`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35213: URL: https://github.com/apache/spark/pull/35213#discussion_r786031606 ## File path: sql/core/src/test/resources/sql-functions/sql-expression-schema.md ## @@ -336,17 +336,17 @@ |

[GitHub] [spark] itholic commented on pull request #34386: [WIP] - Changes to PySpark doc homepage and User Guide

2022-01-17 Thread GitBox
itholic commented on pull request #34386: URL: https://github.com/apache/spark/pull/34386#issuecomment-1014595416 What is the status? Is it still in progress? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

[GitHub] [spark] cloud-fan edited a comment on pull request #35220: [SPARK-37922][SQL] Improve `SimplifyCasts` to remove useless cast

2022-01-17 Thread GitBox
cloud-fan edited a comment on pull request #35220: URL: https://github.com/apache/spark/pull/35220#issuecomment-1014594887 @NobiGo that's why this PR only combines "upcast". @wangyum I think we should call it out explicitly in the PR description. `remove useless cast` is too vague. --

[GitHub] [spark] Peng-Lei commented on pull request #35204: [SPARK-37878][SQL] Migrate SHOW CREATE TABLE to use v2 command by default

2022-01-17 Thread GitBox
Peng-Lei commented on pull request #35204: URL: https://github.com/apache/spark/pull/35204#issuecomment-1014664463 > do you know why the test failed? without `filterNot(_ == PROP_EXTERNAL)` -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786121415 ## File path: sql/core/src/test/resources/sql-tests/inputs/string-functions.sql ## @@ -124,4 +124,76 @@ SELECT endswith('Spark SQL', 'QL'); SELECT

[GitHub] [spark] LuciferYang commented on pull request #35229: [SPARK-27442][SQL] Remove check filename when reading data

2022-01-17 Thread GitBox
LuciferYang commented on pull request #35229: URL: https://github.com/apache/spark/pull/35229#issuecomment-1014688751 > Remove check filename when reading data filename? fieldname? -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r786036810 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35130: [SPARK-37839][SQL] DS V2 supports partial aggregate push-down `AVG`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35130: URL: https://github.com/apache/spark/pull/35130#discussion_r786046107 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala ## @@ -88,25 +88,65 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35214: [SPARK-37915][SQL] Push down deterministic projection through SQL UNION and combine them

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35214: URL: https://github.com/apache/spark/pull/35214#discussion_r786071848 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -78,7 +78,6 @@ abstract class

[GitHub] [spark] cloud-fan commented on a change in pull request #35060: [SPARK-28137][SQL] Data Type Formatting Functions: `to_number`

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35060: URL: https://github.com/apache/spark/pull/35060#discussion_r786120642 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ## @@ -888,6 +889,179 @@ class

[GitHub] [spark] AngersZhuuuu commented on pull request #35229: [SPARK-27442][SQL] Remove check filename when reading data

2022-01-17 Thread GitBox
AngersZhuuuu commented on pull request #35229: URL: https://github.com/apache/spark/pull/35229#issuecomment-1014710877 > > Remove check filename when reading data > > filename? fieldname? Yea.. -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] cdegroc commented on a change in pull request #35139: [SPARK-37829][SQL] DataFrame.joinWith should return null rows for missing values

2022-01-17 Thread GitBox
cdegroc commented on a change in pull request #35139: URL: https://github.com/apache/spark/pull/35139#discussion_r786166674 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ## @@ -110,23 +110,28 @@ object

[GitHub] [spark] cloud-fan commented on a change in pull request #35139: [SPARK-37829][SQL] DataFrame.joinWith should return null rows for missing values

2022-01-17 Thread GitBox
cloud-fan commented on a change in pull request #35139: URL: https://github.com/apache/spark/pull/35139#discussion_r786183789 ## File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ## @@ -1173,8 +1173,20 @@ class Dataset[T] private[sql]( joined =
