[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 03 Dec 2024 03:03:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provides a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the script. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn if there is any unnecessary table stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Reviewed-on: http://gerrit.cloudera.org:8080/22045
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17613/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:44:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 11:

Thank you Michael and Quanlong!


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:37:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 10: Code-Review+2

LGTM. Carry Michael's +1. Thanks for this work to save our future efforts in 
changing the golden files!


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:36:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:37:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/11186/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 11
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:37:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/9/bin/jenkins/critique-gerrit-review.py
File bin/jenkins/critique-gerrit-review.py:

http://gerrit.cloudera.org:8080/#/c/22045/9/bin/jenkins/critique-gerrit-review.py@381
PS9, Line 381: comments[path].append(get_comment_input(UNCOMMITTED_CHANGE))
> At L397 we will discard all changes under testdata/ dir. I think we should
Good point. Done.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 10
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:26:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#10).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provides a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the script. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn if there is any unnecessary table stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17612/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:25:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22045/9/bin/jenkins/critique-gerrit-review.py
File bin/jenkins/critique-gerrit-review.py:

http://gerrit.cloudera.org:8080/#/c/22045/9/bin/jenkins/critique-gerrit-review.py@381
PS9, Line 381: if os.path.splitext(path)[1] == ".test":
At L397 we will discard all changes under testdata/ dir. I think we should 
remove this condition to protect changes on other files like txt files under 
testdata/data/


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py
File testdata/bin/restore-stats-on-planner-tests.py:

http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@37
PS8, Line 37: \d.*
> Added requirement such that all tables subject to this script must have sta
Ack


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@128
PS8, Line 128:
> Limit to do replacement only if all partitions are selected.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:10:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22045/8/bin/jenkins/critique-gerrit-review.py
File bin/jenkins/critique-gerrit-review.py:

http://gerrit.cloudera.org:8080/#/c/22045/8/bin/jenkins/critique-gerrit-review.py@382
PS8, Line 382:   
comments[path].append(get_comment_input(UNCOMMITTED_CHANGE))
> This discards all changes. To be safe, can we run "git diff" before this me
Done


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py
File testdata/bin/restore-stats-on-planner-tests.py:

http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@33
PS8, Line 33: \w+).(
> Should we accept numbers in the names as well? Maybe use "\w"?
Done


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@37
PS8, Line 37: \d.*
> Do we have cases where this is not positive number but something like "unav
Added requirement such that all tables subject to this script must have stats 
collected by default. Thus, there should be no "unavailable" or "-1" case.


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@128
PS8, Line 128:
> In cases where not all the partitions are selected, is it correct to use th
Limit to do replacement only if all partitions are selected.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 21:00:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#9).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provides a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the script. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn if there is any unnecessary table stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-12-02 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22045/8/bin/jenkins/critique-gerrit-review.py
File bin/jenkins/critique-gerrit-review.py:

http://gerrit.cloudera.org:8080/#/c/22045/8/bin/jenkins/critique-gerrit-review.py@382
PS8, Line 382:   check_call(['git', 'checkout', '.'])
This discards all changes. To be safe, can we run "git diff" before this method 
and abort if there are already some uncommitted changes?


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py
File testdata/bin/restore-stats-on-planner-tests.py:

http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@33
PS8, Line 33: [_a-z]
Should we accept numbers in the names as well? Maybe use "\w"?


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@37
PS8, Line 37: \d.*
Do we have cases where this is not positive number but something like 
"unavailable" or "-1"?


http://gerrit.cloudera.org:8080/#/c/22045/8/testdata/bin/restore-stats-on-planner-tests.py@128
PS8, Line 128: stats[1]
In cases where not all the partitions are selected, is it correct to use the 
table level size?



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 02 Dec 2024 19:03:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-25 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Mon, 25 Nov 2024 18:29:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-25 Thread Michael Smith (Code Review)
Michael Smith has uploaded a new patch set (#8) to the change originally 
created by Riza Suminto. ( http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provides a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the script. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn if there is any unnecessary table stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17572/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 23 Nov 2024 19:01:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-23 Thread Riza Suminto (Code Review)
Hello Quanlong Huang, Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#7).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the sript. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run with python3, trigger restore-stats-on-planner-tests.py,
and warn is there is any unnecessary table stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test:

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test@1217
PS6, Line 1217:HDFS partitions=1824/1824 files=1824 size=200.96MB
> gerrit-auto-critic always runs the script from impala#master, not the test
I see.. got it.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 23 Nov 2024 00:38:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17565/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 23 Nov 2024 00:24:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test:

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test@1217
PS6, Line 1217:HDFS partitions=1824/1824 files=1824 size=200.96MB
> Not sure why gerrit-auto-critic does not complain about this before. Jenkin
gerrit-auto-critic always runs the script from impala#master, not the test 
branch.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 23 Nov 2024 00:06:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test:

http://gerrit.cloudera.org:8080/#/c/22045/6/testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test@1217
PS6, Line 1217:HDFS partitions=1824/1824 files=1824 size=200.96MB
Not sure why gerrit-auto-critic does not complain about this before. Jenkins 
also persistently running it with python2 even though I already change the 
shebang to python3.
https://jenkins.impala.io/job/gerrit-auto-critic/18475/console
https://jenkins.impala.io/job/gerrit-auto-critic/18474/console

But I run it in my local machine and it produce comments.

./bin/jenkins/critique-gerrit-review.py --dryrun
Requirement already satisfied: flake8==3.9.2 in 
./gerrit_critic_venv/lib/python3.8/site-packages (3.9.2)
Requirement already satisfied: flake8-diff==0.2.2 in 
./gerrit_critic_venv/lib/python3.8/site-packages (0.2.2)
Requirement already satisfied: mccabe<0.7.0,>=0.6.0 in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8==3.9.2) (0.6.1)
Requirement already satisfied: pycodestyle<2.8.0,>=2.7.0 in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8==3.9.2) (2.7.0)
Requirement already satisfied: pyflakes<2.4.0,>=2.3.0 in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8==3.9.2) (2.3.1)
Requirement already satisfied: blessings in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8-diff==0.2.2) (1.7)
Requirement already satisfied: six in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8-diff==0.2.2) 
(1.16.0)
Requirement already satisfied: argparse in 
./gerrit_critic_venv/lib/python3.8/site-packages (from flake8-diff==0.2.2) 
(1.4.0)
Updated 1 path from the index
{
 "comments": {
  
"testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test":
 [
   {
"message": "Table numRows and totalSize is changed in this planner test. It 
is unnecessary to change them. Consider running 
testdata/bin/restore-stats-on-planner-tests.py script to restore them to 
consistent values.",
"line": 0,
"side": "REVISION"
   }
  ]
 },
 "message": "gerrit-auto-critic failed. You can reproduce it locally using 
command:\n\n  python2 bin/jenkins/critique-gerrit-review.py --dryrun\n\nTo run 
it, you might need a virtual env with virtualenv installed."
}



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 6
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Sat, 23 Nov 2024 00:01:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#6).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the sript. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run this script and warn is there is any unnecessary table
stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/Planne

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 5:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/17564/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 22:59:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 4:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/17563/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 4
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 22:39:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/5/bin/jenkins/critique-gerrit-review.py
File bin/jenkins/critique-gerrit-review.py:

http://gerrit.cloudera.org:8080/#/c/22045/5/bin/jenkins/critique-gerrit-review.py@1
PS5, Line 1: #!/usr/bin/python3
I think we have python3 everywhere we'd run this now.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 22:34:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#5).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the sript. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run this script and warn is there is any unnecessary table
stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/Planne

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#4).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script restore-stats-on-planner-tests.py to
fix the table stats information in selected .test files. The test files
to check and fixed table stats is declared inside the sript. It is
currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table. critique-gerrit-review.py is
updated to run this script and warn is there is any unnecessary table
stats change detected.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json. Before the
patch, the table stats printed in plan is the real stats from HMS. After
this patch, the table stats displayed is calculated from the
stats-3TB.json. See IMPALA-12726 for more detail on large scale planner
test simulation.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M bin/jenkins/critique-gerrit-review.py
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/Planne

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@16
PS3, Line 16: This patch provide a python script to fix the table stats 
information in
> Maybe we should have or update a README then.
I will try to integrate the script with ./bin/jenkins/critique-gerrit-review.py



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 20:22:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@16
PS3, Line 16: This patch provide a python script to fix the table stats 
information in
> It's not clear to me how this script should be used. Do you run it after an
I expect contributor to run ./tesdata/bin/restore-stats-on-planner-test.py if 
their patch changes any test listed in the script.
If nothing change after execution, all good. If the table size line change, 
that change should be included into patch as well. This is because that table 
size line is specific only to their test data in their local machine.


http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@21
PS3, Line 21: This patch also fixed table size for tests under
> Is this the only reason stats changed in the test files?
Yes, for all changes under tpcds_cpu_cost/.

Changes undet tpcds/ comes from executing restore-stats-on-planner-test.py
I plan for more tables and tests to check later:
https://gerrit.cloudera.org/c/22047/3/testdata/bin/restore-stats-on-planner-tests.py


http://gerrit.cloudera.org:8080/#/c/22045/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/22045/3/fe/src/main/java/org/apache/impala/catalog/Table.java@223
PS3, Line 223:   protected double testMetadataScale_ = -1.0;
> I'm not clear why this is changed. Is it called out in the commit message?
Not explicitly. I will update the commit message.

I change the default to -1 to make distinction whether this is in testing mode 
(>0) or not (-1).



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 19:00:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@16
PS3, Line 16: This patch provide a python script to fix the table stats 
information in
> I expect contributor to run ./tesdata/bin/restore-stats-on-planner-test.py
Maybe we should have or update a README then.


http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@21
PS3, Line 21: This patch also fixed table size for tests under
> Yes, for all changes under tpcds_cpu_cost/.
I guess I'm still not clear what the script is setting them to. I see in the 
script it has some hard-coded, so it'd be nice to call that out in the commit 
message.



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 22 Nov 2024 19:08:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-22 Thread Michael Smith (Code Review)
Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@16
PS3, Line 16: This patch provide a python script to fix the table stats 
information in
It's not clear to me how this script should be used. Do you run it after any 
changes, and it ensures stats don't change unless something about our 
estimation changes?


http://gerrit.cloudera.org:8080/#/c/22045/3//COMMIT_MSG@21
PS3, Line 21: This patch also fixed table size for tests under
Is this the only reason stats changed in the test files?


http://gerrit.cloudera.org:8080/#/c/22045/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/22045/3/fe/src/main/java/org/apache/impala/catalog/Table.java@223
PS3, Line 223:   protected double testMetadataScale_ = -1.0;
I'm not clear why this is changed. Is it called out in the commit message?



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Fri, 22 Nov 2024 18:51:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-13 Thread Riza Suminto (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#3).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script to fix the table stats information in
selected .test files. It currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17461/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 13 Nov 2024 22:17:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17448/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 2
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 13 Nov 2024 05:25:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-12 Thread Riza Suminto (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/22045

to look at the new patch set (#2).

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

Impala has several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script to fix the table stats information in
selected .test files. It currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q64.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional

[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/17420/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 08 Nov 2024 20:39:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22045 )

Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..


Patch Set 1:

(1 comment)

gerrit-auto-critic failed. You can reproduce it locally using command:

  python2 bin/jenkins/critique-gerrit-review.py --dryrun

To run it, you might need a virtual env with virtualenv installed.

http://gerrit.cloudera.org:8080/#/c/22045/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/22045/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@2062
PS1, Line 2062: bytesToDisplay = (long) ((double) 
partsPerFs.getValue() / table.getPartitions().size() * testTableSize);
line too long (116 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/22045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
Gerrit-Change-Number: 22045
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 08 Nov 2024 20:18:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13535: Add script to restore stats on PlannerTest

2024-11-08 Thread Riza Suminto (Code Review)
Riza Suminto has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/22045


Change subject: IMPALA-13535: Add script to restore stats on PlannerTest
..

IMPALA-13535: Add script to restore stats on PlannerTest

We have several PlannerTest that validate over EXTENDED profile and
validate cardinality. In EXTENDED level, profile display stored table
stats from HMS like 'numRows' and 'totalSize', which can vary between
data loads. They are not validated by PlannerTest. But frequent change
of these lines can disturb code review process because they are mostly
noise.

This patch provide a python script to fix the table stats information in
selected .test files. It currently focus on tests under
functional-planner/queries/PlannerTest/tpcds/ and some that test against
tpcds_partitioned_parquet_snap table.

This patch also fixed table size for tests under
functional-planner/queries/PlannerTest/tpcds_cpu_cost/ because all tests
there runs with synthetic stats declared in stats-3TB.json.

Testing:
- Manually run the script and confirm that stats line are replaced
  correctly.
- Run affected PlannerTest and all passed.

Change-Id: I27bab7cee93880cd59f01b9c2d1614dfcabdc682
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A testdata/bin/restore-stats-on-planner-tests.py
M 
testdata/workloads/functional-planner/queries/PlannerTest/agg-node-max-mem-estimate.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/processing-cost-plan-admission-slots.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds-processing-cost.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q04.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q05.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q09.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q11.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q54.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q71.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q74.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds/tpcds-q84.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-iceberg.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-ddl-parquet.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q01.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q02.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q03.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q04.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q05.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q06.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q07.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q08.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q09.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q10a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q11.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q12.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q13.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q14b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q15.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q16.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q17.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q18.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q19.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q20.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q21.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q22.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23a.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q23b.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/tpcds_cpu_cost/tpcds-q24a.tes