Michael Smith has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20762 )

Change subject: IMPALA-12540: Query Live Table
......................................................................


Patch Set 41:

(32 comments)

Upload incoming. I still need to add some of the requested tests.

http://gerrit.cloudera.org:8080/#/c/20762/41//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20762/41//COMMIT_MSG@19
PS41, Line 19:
             : Query: explain select * from sys.impala_query_live
> Consider adding planner test for queries against sys.impala_query_live.
Done.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h
File be/src/exec/system-table-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scan-node.h@40
PS41, Line 40: by fetching more data from metrics.
> Nit: this comment does not make sense in this context.  Possibly a copy-and
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h
File be/src/exec/system-table-scanner.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.h@42
PS41, Line 42:   bool eos() const { return eos_; }
> nit: add noexcept?
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.cc
File be/src/exec/system-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.cc@26
PS41, Line 26: #include "runtime/decimal-value.h"
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.cc@27
PS41, Line 27: #include "runtime/decimal-value.inline.h"
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/exec/system-table-scanner.cc@71
PS41, Line 71: MaterializeNextRow
> WriteStringSlot?
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/fe-support.cc
File be/src/service/fe-support.cc:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/fe-support.cc@655
PS41, Line 655: for (const auto& it : membership_snapshot->current_backends) {
              :     if (it.second.has_is_coordinator() && 
it.second.is_coordinator()) {
              :       VLOG_QUERY << "Found coordinator "
              :                  << it.second.address().hostname() << ":" << 
it.second.address().port();
              :       
coordinators.emplace_back(FromNetworkAddressPB(it.second.address()));
              :     }
              :   }
> What do you think about pushing this to ClusterMembershipMgr or ClusterMemb
Makes sense, I've moved it there.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@509
PS41, Line 509:   size_t NumLiveQueries();
> Please mark this function const noexcept.
It's not though. It has to take a lock (which is not marked mutable). Locking 
can throw an exception.


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@508
PS41, Line 508: /// Return the number of live queries managed by this server.
              :   size_t NumLiveQueries();
> mention that this is implemented in workload-management.cc
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1138
PS41, Line 1138:   /// Returns a list of completed queries that have not yet 
been written to storage.
> Please document that this function makes a copy of the completed queries qu
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1139
PS41, Line 1139:   std::vector<std::shared_ptr<QueryStateExpanded>> 
GetCompletedQueries();
> Please mark this function const noexcept.  The completed_queries_lock_ will
Do we really want to do that? lock() can throw an exception. We do occasionally 
mark locks as mutable, but not consistently. What's the rationale for doing so 
here?


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/service/impala-server.h@1138
PS41, Line 1138: /// Returns a list of completed queries that have not yet been 
written to storage.
               :   std::vector<std::shared_ptr<QueryStateExpanded>> 
GetCompletedQueries();
> mention that this is implemented in workload-management.cc
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h
File be/src/util/sharded-query-map-util.h:

http://gerrit.cloudera.org:8080/#/c/20762/41/be/src/util/sharded-query-map-util.h@59
PS41, Line 59:   size_t Count() {
> Can this function be marked const noexcept?
It acquires a lock, which is not const or noexcept.


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/CatalogObjects.thrift@65
PS41, Line 65:   SYSTEM_TABLE = 8
> Add a brief comment
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/CatalogObjects.thrift@672
PS41, Line 672: QUERY_LIVE
> Add a brief comment
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/StatestoreService.thrift
File common/thrift/StatestoreService.thrift:

http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/StatestoreService.thrift@207
PS41, Line 207: // A list of network addresses
              : struct TAddressesList {
              :   1: required list<Types.TNetworkAddress> addresses;
              : }
> Move this to Types.thrift?
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift
File common/thrift/SystemTables.thrift:

http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift@21
PS41, Line 21: enum TQueryColumn {
> nit: I think TSystemTableColumn is a better fit.
SystemTable is a generic concept that may be used for other things (like a 
MetricScanner). Maybe TQueryTableColumn?


http://gerrit.cloudera.org:8080/#/c/20762/41/common/thrift/SystemTables.thrift@21
PS41, Line 21: enum TQueryColumn {
> Add a note that this must be kept in sync with workload-management-fields.c
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/Db.java@134
PS41, Line 134: "sys"
> Nit: should we have a constant for this?
Yeah, added one.


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java
File fe/src/main/java/org/apache/impala/catalog/SystemTable.java:

http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@52
PS41, Line 52:   private static Map<String, TSystemTableName> 
SYSTEM_TABLE_NAME_MAP =
> could be final?
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@52
PS41, Line 52:   private static Map<String, TSystemTableName> 
SYSTEM_TABLE_NAME_MAP =
             :       ImmutableMap.of(QUERY_LIVE, TSystemTableName.QUERY_LIVE);
> And assert map size is 1
This is an ImmutableMap. I'm confused by your comment Riza.

It seems correct by definition. And the test I've added to PlannerTest won't 
work if they're wrong.


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@184
PS41, Line 184: String, String
> unnecessary?
They both appear to be required. I was mimicking 
CatalogOpExecutor::createMetaStoreTable.


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/catalog/SystemTable.java@186
PS41, Line 186: FieldSchema
> unnecessary?
Required.


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java
File fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java:

http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@21
PS41, Line 21: import java.util.Map;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@23
PS41, Line 23: import org.slf4j.Logger;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@24
PS41, Line 24: import org.slf4j.LoggerFactory;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@43
PS41, Line 43: import org.apache.impala.thrift.TSystemTableName;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/fe/src/main/java/org/apache/impala/planner/SystemTableScanNode.java@46
PS41, Line 46: import com.google.common.collect.ImmutableMap;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py
File tests/custom_cluster/test_query_live.py:

http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@30
PS41, Line 30: vector
> unused
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@33
PS41, Line 33: fite
> nit: fit?
Done


http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@36
PS41, Line 36:     assert_query('sys.impala_query_live', self.client, 
'test_query_live_1',
> Is there any race where the data about the query can get written before we
It includes currently running queries - which will be true about itself - as 
well as queries that have completed but not been written to impala_query_log in 
a batch.

Batch interval is controlled by query_log_write_interval_s, which defaults to 
300s. It starts when the cluster is started, so we have 5 minutes to start the 
cluster and run this test before results start getting written out.


http://gerrit.cloudera.org:8080/#/c/20762/41/tests/custom_cluster/test_query_live.py@64
PS41, Line 64: vector
> unused
Done



--
To view, visit http://gerrit.cloudera.org:8080/20762
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2f9a449f0e5502078931e7f1c5df6e0b762c743
Gerrit-Change-Number: 20762
Gerrit-PatchSet: 41
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Andrew Sherman <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jason Fehr <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Wed, 20 Mar 2024 23:53:39 +0000
Gerrit-HasComments: Yes

Reply via email to