Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22231 )

Change subject: IMPALA-13620: Refresh compute_table_stats.py script
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh
File testdata/bin/compute-table-stats.sh:

http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@58
PS8, Line 58:     
--table_names="store_sales,store_returns,web_sales,web_returns,
Nit: this list of table names is repeated, could be moved into a variable to 
eliminate duplication.

This suggestion is applicable below as well.


http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@63
PS8, Line 63: # Partitioned tables are computed separately above.
Nit: At first glance, I missed that the table names are now in the exclude list 
in this command.  Please consider adding to the comment wording such as 
"Compute stats in parallel on all remaining tables in the 
tpcds_partitioned_parquet_snap db."

This suggestion is applicable below as well.


http://gerrit.cloudera.org:8080/#/c/22231/8/testdata/bin/compute-table-stats.sh@92
PS8, Line 92:     
--table_names="alltypesagg,alltypesaggmultifilesnopart,alltypesaggnonulls,
Nit:  Consider matching the previous patterns where the second compute stats on 
the same db uses the `--exclude_table_names` parameter.


http://gerrit.cloudera.org:8080/#/c/22231/8/tests/util/compute_table_stats.py
File tests/util/compute_table_stats.py:

http://gerrit.cloudera.org:8080/#/c/22231/8/tests/util/compute_table_stats.py@34
PS8, Line 34: DEFAULT_PARALLELISM = min(24, multiprocessing.cpu_count())
My dev machine has 16 CPUs.  If it's critical to match the degree of 
parallelism to the cpu count, then we need to use a different method of 
dynamically obtaining the cpu count at script runtime.  Another option would be 
to use the IMPALA_BUILD_THREADS environment variable.



--
To view, visit http://gerrit.cloudera.org:8080/22231
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1ebf02f95b957e7dda3a30622b87e8fca3197699
Gerrit-Change-Number: 22231
Gerrit-PatchSet: 8
Gerrit-Owner: Riza Suminto <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 31 Dec 2024 19:28:12 +0000
Gerrit-HasComments: Yes

Reply via email to