[Impala-ASF-CR] IMPALA-12426: Query History Table

Jason Fehr (Code Review) Tue, 05 Mar 2024 14:14:25 -0800

Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20770 )


Change subject: IMPALA-12426: Query History Table
......................................................................


Patch Set 31:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc
File be/src/service/workload-management.cc:

http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@171
PS31, Line 171: SHUTDOWN_REQUESTED
> nit: SHUTTING_DOWN
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@218
PS31, Line 218:   for (auto iter = sql.cbegin(); iter != sql.cend(); iter++) {
> Could be a range-based for loop taking 'const auto&' (always a good default
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@393
PS31, Line 393:     for (auto iter = ctx.record->per_host_state.cbegin();
> Could be a range-based for loop taking 'const auto&'.
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@575
PS31, Line 575:   auto min_elem = 
max_element(ctx.record->per_host_state.cbegin(),
> Should be max_elem.
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@589
PS31, Line 589:   if(LIKELY(!ctx.record->per_host_state.empty())) {
> nit: space after 'if'
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@659
PS31, Line 659: fields_.push_back(MakeTuple(
> Lines 659-677 should be indented.
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@687
PS31, Line 687:   for (auto iter = fields_.cbegin(); iter != fields_.cend(); 
iter++) {
> Could be a range-based for loop (const auto&).
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@726
PS31, Line 726:   for (auto iter = fields_.cbegin(); iter != fields_.cend(); 
iter++) {
> Range-based for loop.
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@769
PS31, Line 769: completed_queries_thread_state_ == RUNNING
> Can ShutdownWorkloadManagement called before completed_queries_thread_state
It's possible but *shouldn't* happen.  Still, I made modifications to ensure 
state transitions happen correctly if the Impala coordinator is shut down 
during startup.  This if condition is now  <= instead of ==.


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@841
PS31, Line 841: completed_queries_thread_state_
> What if the previous completed_queries_thread_state_ is SHUTDOWN_REQUESTED
I added a check that ensures a shutdown was not initiated while the previous 
code was running.


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@895
PS31, Line 895:       for (auto iter = queries_to_insert.begin(); iter != 
queries_to_insert.end();
> Could be a range-based for loop taking a ref (to update the count).
In this case, directly using iterators is more efficient since std::list::erase 
has O(1) complexity while std::list::remove and std::list::remove_if have O(n) 
complexity.

Using a range-based for loop would require using std::list::remove or 
std::list::remove_if to remove items from the queries_to_insert list.


http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@953
PS31, Line 953:   for (auto iter = fields_.cbegin(); iter != fields_.cend(); 
iter++) {
> Range-based for loop.
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py
File tests/beeswax/impala_beeswax.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@185
PS31, Line 185: fetch_profile_after_close=False
> fetch_profile_after_close=True seems specific to TestQueryLogTable tests on
For now, this parameter is only used for the TestQueryLogTable tests.  It could 
theoretically be used by other tests if they need data from the profile such as 
completed time and duration.

Before this change, the profile is retrieved before the query is closed.  This 
ensures the profile is available as there is a small chance the profile could 
disappear after the query is closed and before it can be retrieved again.

Setting this parameter to True causes a second retrieve of the query profile 
after the query closes.  I like have it as a function parameter here for a 
couple reasons:
1. Setting a client field member is difficult because the tests have a 
BeeswaxConnection object which itself has an instance of ImpalaBeeswaxClient.  
It requires significantly more code to set a client field member.  I actually 
took this approach at first.
2. The majority of the queries run by the tests do not need to fetch the 
profile after close.  It's only one query per test at most.  Thus, having a 
client field member will cause extra unnecessary work.


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@198
PS31, Line 198: runtime_profile = self.get_runtime_profile(handle)
> Skip pulling profile if !fetch_profile_after_close
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@219
PS31, Line 219: result.runtime_profile = self.get_runtime_profile(handle)
> Skip pulling profile if !fetch_profile_after_close
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/common/custom_cluster_test_suite.py
File tests/common/custom_cluster_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/common/custom_cluster_test_suite.py@244
PS31, Line 244: method=None
> this is unused?
Deleted.  It was left over from an approach that was abandoned.


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py
File tests/custom_cluster/test_query_log.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py@421
PS31, Line 421: assert_byte_str(expected_str=row_mat_str, 
actual_bytes=data[index],
              :           msg="row materialization rate incorrect", factor=1000)
              :
> nit: maybe it is worth to create separate function like "assert_byte_rate_s
The "factor" came from me misunderstanding the units for the row 
materialization rate. I saw units as "K/sec" and "M/sec" and read those as 
"kilobytes per second" and "megabytes per second".  After further review, the 
units are actually "thousands of rows per second" and "millions of rows per 
second".

Now that my understanding is correct, the "assert_byte_str" function is the 
wrong function to use.  I have switched this assertion to be different and also 
updated the column from "ROW_MATERIALIZATION_BYTES_PER_SEC" to 
"ROW_MATERIALIZATION_ROWS_PER_SEC".

I have also removed the "factor" parameter from "assert_byte_str" and used a 
hardcoded value of "1024" instead.


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py@1221
PS31, Line 1221: # Allow time for Impala to process the insert.
               :     sleep(3)
> I think this sleep can be replaced with execution of REFRESH query.
Added refresh sqls.  Will run another round of tests to verify these tests 
still pass.


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/assert_time.py
File tests/util/assert_time.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/assert_time.py@27
PS31, Line 27: "{0} -- expected: {1}, actual {2}, calculated {3}" \
             :       .format(msg, expected_str, actual_time_ns, 
total_nanoseconds)
> Print the tolerance value in assert message (total_nanoseconds * tolerance)
Done


http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/memory.py
File tests/util/memory.py:

http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/memory.py@27
PS31, Line 27: "{0} -- expected: {1}, " \
             :       "actual: {2}, calculated: {3}".format(msg, expected_str, 
actual_bytes, calc)
> Print the tolerance value in assert message (calc * 0.005).
Done



--
To view, visit http://gerrit.cloudera.org:8080/20770
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844
Gerrit-Change-Number: 20770
Gerrit-PatchSet: 31
Gerrit-Owner: Jason Fehr <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Tue, 05 Mar 2024 22:14:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-12426: Query History Table

Reply via email to