Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/20770 )
Change subject: IMPALA-12426: Query History Table ...................................................................... Patch Set 31: (20 comments) http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc File be/src/service/workload-management.cc: http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@171 PS31, Line 171: SHUTDOWN_REQUESTED > nit: SHUTTING_DOWN Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@218 PS31, Line 218: for (auto iter = sql.cbegin(); iter != sql.cend(); iter++) { > Could be a range-based for loop taking 'const auto&' (always a good default Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@393 PS31, Line 393: for (auto iter = ctx.record->per_host_state.cbegin(); > Could be a range-based for loop taking 'const auto&'. Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@575 PS31, Line 575: auto min_elem = max_element(ctx.record->per_host_state.cbegin(), > Should be max_elem. Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@589 PS31, Line 589: if(LIKELY(!ctx.record->per_host_state.empty())) { > nit: space after 'if' Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@659 PS31, Line 659: fields_.push_back(MakeTuple( > Lines 659-677 should be indented. Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@687 PS31, Line 687: for (auto iter = fields_.cbegin(); iter != fields_.cend(); iter++) { > Could be a range-based for loop (const auto&). Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@726 PS31, Line 726: for (auto iter = fields_.cbegin(); iter != fields_.cend(); iter++) { > Range-based for loop. Done http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@769 PS31, Line 769: completed_queries_thread_state_ == RUNNING > Can ShutdownWorkloadManagement called before completed_queries_thread_state It's possible but *shouldn't* happen. Still, I made modifications to ensure state transitions happen correctly if the Impala coordinator is shut down during startup. This if condition is now <= instead of ==. http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@841 PS31, Line 841: completed_queries_thread_state_ > What if the previous completed_queries_thread_state_ is SHUTDOWN_REQUESTED I added a check that ensures a shutdown was not initiated while the previous code was running. http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@895 PS31, Line 895: for (auto iter = queries_to_insert.begin(); iter != queries_to_insert.end(); > Could be a range-based for loop taking a ref (to update the count). In this case, directly using iterators is more efficient since std::list::erase has O(1) complexity while std::list::remove and std::list::remove_if have O(n) complexity. Using a range-based for loop would require using std::list::remove or std::list::remove_if to remove items from the queries_to_insert list. http://gerrit.cloudera.org:8080/#/c/20770/31/be/src/service/workload-management.cc@953 PS31, Line 953: for (auto iter = fields_.cbegin(); iter != fields_.cend(); iter++) { > Range-based for loop. Done http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py File tests/beeswax/impala_beeswax.py: http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@185 PS31, Line 185: fetch_profile_after_close=False > fetch_profile_after_close=True seems specific to TestQueryLogTable tests on For now, this parameter is only used for the TestQueryLogTable tests. It could theoretically be used by other tests if they need data from the profile such as completed time and duration. Before this change, the profile is retrieved before the query is closed. This ensures the profile is available as there is a small chance the profile could disappear after the query is closed and before it can be retrieved again. Setting this parameter to True causes a second retrieve of the query profile after the query closes. I like have it as a function parameter here for a couple reasons: 1. Setting a client field member is difficult because the tests have a BeeswaxConnection object which itself has an instance of ImpalaBeeswaxClient. It requires significantly more code to set a client field member. I actually took this approach at first. 2. The majority of the queries run by the tests do not need to fetch the profile after close. It's only one query per test at most. Thus, having a client field member will cause extra unnecessary work. http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@198 PS31, Line 198: runtime_profile = self.get_runtime_profile(handle) > Skip pulling profile if !fetch_profile_after_close Done http://gerrit.cloudera.org:8080/#/c/20770/31/tests/beeswax/impala_beeswax.py@219 PS31, Line 219: result.runtime_profile = self.get_runtime_profile(handle) > Skip pulling profile if !fetch_profile_after_close Done http://gerrit.cloudera.org:8080/#/c/20770/31/tests/common/custom_cluster_test_suite.py File tests/common/custom_cluster_test_suite.py: http://gerrit.cloudera.org:8080/#/c/20770/31/tests/common/custom_cluster_test_suite.py@244 PS31, Line 244: method=None > this is unused? Deleted. It was left over from an approach that was abandoned. http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py File tests/custom_cluster/test_query_log.py: http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py@421 PS31, Line 421: assert_byte_str(expected_str=row_mat_str, actual_bytes=data[index], : msg="row materialization rate incorrect", factor=1000) : > nit: maybe it is worth to create separate function like "assert_byte_rate_s The "factor" came from me misunderstanding the units for the row materialization rate. I saw units as "K/sec" and "M/sec" and read those as "kilobytes per second" and "megabytes per second". After further review, the units are actually "thousands of rows per second" and "millions of rows per second". Now that my understanding is correct, the "assert_byte_str" function is the wrong function to use. I have switched this assertion to be different and also updated the column from "ROW_MATERIALIZATION_BYTES_PER_SEC" to "ROW_MATERIALIZATION_ROWS_PER_SEC". I have also removed the "factor" parameter from "assert_byte_str" and used a hardcoded value of "1024" instead. http://gerrit.cloudera.org:8080/#/c/20770/31/tests/custom_cluster/test_query_log.py@1221 PS31, Line 1221: # Allow time for Impala to process the insert. : sleep(3) > I think this sleep can be replaced with execution of REFRESH query. Added refresh sqls. Will run another round of tests to verify these tests still pass. http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/assert_time.py File tests/util/assert_time.py: http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/assert_time.py@27 PS31, Line 27: "{0} -- expected: {1}, actual {2}, calculated {3}" \ : .format(msg, expected_str, actual_time_ns, total_nanoseconds) > Print the tolerance value in assert message (total_nanoseconds * tolerance) Done http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/memory.py File tests/util/memory.py: http://gerrit.cloudera.org:8080/#/c/20770/31/tests/util/memory.py@27 PS31, Line 27: "{0} -- expected: {1}, " \ : "actual: {2}, calculated: {3}".format(msg, expected_str, actual_bytes, calc) > Print the tolerance value in assert message (calc * 0.005). Done -- To view, visit http://gerrit.cloudera.org:8080/20770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2d2da9d450fba4e789400cfa62927fc25d34f844 Gerrit-Change-Number: 20770 Gerrit-PatchSet: 31 Gerrit-Owner: Jason Fehr <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Tue, 05 Mar 2024 22:14:14 +0000 Gerrit-HasComments: Yes
