Jason Fehr has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20762 )

Change subject: IMPALA-12540: Query Live Table
......................................................................


Patch Set 41:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h
File be/src/exec/system-table-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h@40
PS41, Line 40: by fetching more data from metrics.
Nit: this comment does not make sense in this context.  Possibly a 
copy-and-paste from somewhere else?


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h@43
PS41, Line 43: NYI
> Yeah, this pattern appears to be fairly common across PlanNode implementati
nit: please add a comment stating this function always returns an error


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h
File be/src/exec/system-table-scanner.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h@42
PS41, Line 42:   bool eos() const { return eos_; }
nit: add noexcept?


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@509
PS41, Line 509:   size_t NumLiveQueries();
Please mark this function const noexcept.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1138
PS41, Line 1138:   /// Returns a list of completed queries that have not yet 
been written to storage.
Please document that this function makes a copy of the completed queries queue 
and returns it so that callers know thery do not need to hold any locks before 
calling this function.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1139
PS41, Line 1139:   std::vector<std::shared_ptr<QueryStateExpanded>> 
GetCompletedQueries();
Please mark this function const noexcept.  The completed_queries_lock_ will 
also need to be marked mutable.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h@59
PS41, Line 59:   size_t Count() {
Can this function be marked const noexcept?


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift
File common/thrift/SystemTables.thrift:

http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift@21
PS41, Line 21: enum TQueryColumn {
> nit: I think TSystemTableColumn is a better fit.
Since this enum is for the active queries table which is a subclass of system 
table, the name TQueryColumn is more descriptive since there could be multiple 
subclasses of system table in the future.


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@52
PS41, Line 52:   private static Map<String, TSystemTableName> 
SYSTEM_TABLE_NAME_MAP =
             :       ImmutableMap.of(QUERY_LIVE, TSystemTableName.QUERY_LIVE);
> Consider using ImmutableMap.
And assert map size is 1


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java
File fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@121
PS41, Line 121: 1024L * 1024L
> Looks like 1MB is a standard MIN_MEMORY_ESTIMATE.
Could also use table_.getNumRows() multiplied by some constant to get a better 
guess.  I agree though that it's fine as-is and can be left unchanged.


http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py
File tests/custom_cluster/test_query_live.py:

http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@36
PS41, Line 36:     assert_query('sys.impala_query_live', self.client, 
'test_query_live_1',
> Is there any race where the data about the query can get written before we
There is a thread that flushes the completed queries in-memory queue to the 
sys.impala_query_log table, and that thread starts off sleeping for the 
configured amount of time (default 5 minutes).  Since queries that have 
completed but are not yet flushed to disk show in the sys.impala_query_live 
table, the queries will be in the sys.impala_query_live table at this point 
unless the call to self.client.execute slept for more than 5 minutes after the 
query closed.



--
To view, visit http://gerrit.cloudera.org:8080/20762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743
Gerrit-Change-Number: 20762
Gerrit-PatchSet: 41
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 20 Mar 2024 21:20:25 +0000
Gerrit-HasComments: Yes

Reply via email to