Jason Fehr has posted comments on this change. ( http://gerrit.cloudera.org:8080/20762 )
Change subject: IMPALA-12540: Query Live Table ...................................................................... Patch Set 41: (11 comments) http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h File be/src/exec/system-table-scan-node.h: http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h@40 PS41, Line 40: by fetching more data from metrics. Nit: this comment does not make sense in this context. Possibly a copy-and-paste from somewhere else? http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h@43 PS41, Line 43: NYI > Yeah, this pattern appears to be fairly common across PlanNode implementati nit: please add a comment stating this function always returns an error http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h File be/src/exec/system-table-scanner.h: http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h@42 PS41, Line 42: bool eos() const { return eos_; } nit: add noexcept? http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h File be/src/service/impala-server.h: http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@509 PS41, Line 509: size_t NumLiveQueries(); Please mark this function const noexcept. http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1138 PS41, Line 1138: /// Returns a list of completed queries that have not yet been written to storage. Please document that this function makes a copy of the completed queries queue and returns it so that callers know thery do not need to hold any locks before calling this function. http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1139 PS41, Line 1139: std::vector<std::shared_ptr<QueryStateExpanded>> GetCompletedQueries(); Please mark this function const noexcept. The completed_queries_lock_ will also need to be marked mutable. http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h File be/src/util/sharded-query-map-util.h: http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h@59 PS41, Line 59: size_t Count() { Can this function be marked const noexcept? http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift File common/thrift/SystemTables.thrift: http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift@21 PS41, Line 21: enum TQueryColumn { > nit: I think TSystemTableColumn is a better fit. Since this enum is for the active queries table which is a subclass of system table, the name TQueryColumn is more descriptive since there could be multiple subclasses of system table in the future. http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java File fe/src/main/java/org/apache/impala/catalog/SystemTable.java: http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@52 PS41, Line 52: private static Map<String, TSystemTableName> SYSTEM_TABLE_NAME_MAP = : ImmutableMap.of(QUERY_LIVE, TSystemTableName.QUERY_LIVE); > Consider using ImmutableMap. And assert map size is 1 http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java File fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java: http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@121 PS41, Line 121: 1024L * 1024L > Looks like 1MB is a standard MIN_MEMORY_ESTIMATE. Could also use table_.getNumRows() multiplied by some constant to get a better guess. I agree though that it's fine as-is and can be left unchanged. http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py File tests/custom_cluster/test_query_live.py: http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@36 PS41, Line 36: assert_query('sys.impala_query_live', self.client, 'test_query_live_1', > Is there any race where the data about the query can get written before we There is a thread that flushes the completed queries in-memory queue to the sys.impala_query_log table, and that thread starts off sleeping for the configured amount of time (default 5 minutes). Since queries that have completed but are not yet flushed to disk show in the sys.impala_query_live table, the queries will be in the sys.impala_query_live table at this point unless the call to self.client.execute slept for more than 5 minutes after the query closed. -- To view, visit http://gerrit.cloudera.org:8080/20762 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743 Gerrit-Change-Number: 20762 Gerrit-PatchSet: 41 Gerrit-Owner: Michael Smith <[email protected]> Gerrit-Reviewer: Andrew Sherman <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 20 Mar 2024 21:20:25 +0000 Gerrit-HasComments: Yes
