Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11224 )
Change subject: IMPALA-7448: Invalidate recently unused tables from catalogd ...................................................................... Patch Set 1: (30 comments) http://gerrit.cloudera.org:8080/#/c/11224/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11224/1//COMMIT_MSG@11 PS1, Line 11: HDFS tables why only HDFS tables? http://gerrit.cloudera.org:8080/#/c/11224/1//COMMIT_MSG@13 PS1, Line 13: query/DML finishes its : execution curious why at end of execution instead of at end of planning? http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@165 PS1, Line 165: UpdateUsedTableNames nit: i think "ReportTableUsage" might be a better name? This current name makes it sound like it is replacing a list instead of just marking that the tables got used. And, maybe we might want to report some more details such as the count of queries in a particular interval that used the table. http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@169 PS1, Line 169: if (!status.ok()) LOG(ERROR) << status.GetDetail(); this should probably be WARNING instead of ERROR since it doesn't affect correctness. http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@171 PS1, Line 171: void InvalidateUnusedTables(TInvalidateUnusedTablesResponse& resp, any particular reason to expose this via RPC instead of just automatic in the background? http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@537 PS1, Line 537: const char msg[] = "unused_table_ttl_sec not specified"; I think if you just pass a string constant, it'll end up in static storage, and you wouldn't need to use the "copying" approach right? http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@539 PS1, Line 539: document->AddMember("error", value, document->GetAllocator()); maybe just early-return here so that the rest of the function doesn't need to be nested? http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@542 PS1, Line 542: unused_table_ttl_sec = boost::lexical_cast<int>(unused_table_ttl_sec_it->second); can you tighten the "catch" block to only cover this line, and then early-return from it? That way the majority of the function won't be indented. http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@547 PS1, Line 547: if (status.ok()) { if not OK don't we want to put this in the 'error'? I think if we can refactor this whole function into another one that just returns Status, then we could consolidate all of the various code that fills in the "error" field in the document into one place http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@549 PS1, Line 549: tables nit: can we name this thrift field "removed_tables" or "invalidated_tables" or something so it's clearer? http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/catalog/catalog-server.cc@570 PS1, Line 570: const char msg[] = "invalid unused_table_ttl_sec value"; : Value value(msg, arraysize(msg)); : nit: indentation off http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/common/global-flags.cc File be/src/common/global-flags.cc: http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/common/global-flags.cc@202 PS1, Line 202: catalog invalidates tables older than this" : "threshold nit: I think this doc should be expanded out to explain a bit more about the usage/semantics. http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/exec/catalog-op-executor.h File be/src/exec/catalog-op-executor.h: http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/exec/catalog-op-executor.h@76 PS1, Line 76: TInvalidateUnusedTablesResponse* resp); nit: indentation http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/exec/catalog-op-executor.cc@336 PS1, Line 336: InvalidateUnusedTable nit: this method name is missing an 's' http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/service/client-request-state.cc File be/src/service/client-request-state.cc: http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/service/client-request-state.cc@528 PS1, Line 528: // Update used tables in this query. I think this should be deferred onto some queue rather than done inline, otherwise this can block the user, right? eg if we have a bounded queue of usages, and some background thread which pulls items off the queue once every few seconds and sends them to catalogd, we avoid this critical-path RPC http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/service/client-request-state.cc@534 PS1, Line 534: req.tables.emplace_back(name); nit: std::move(name) http://gerrit.cloudera.org:8080/#/c/11224/1/be/src/service/client-request-state.cc@537 PS1, Line 537: catalog_op_executor_.reset( : new CatalogOpExecutor(exec_env_, frontend_, server_profile_)); hrm, I'm surprised this is a member variable instead of just in local scope. If it's a member variable, is it already initialized here? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@84 PS1, Line 84: import static jodd.util.ThreadUtil.sleep; I think this is coming by some accidental transitive dependency. Why not just use Thread.sleep or Guava's Uninterruptibles.sleep if you don't want to handle interruption? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@252 PS1, Line 252: if (unusedTableTtlSec != 0) { can we do a check that it's not negative? also, mind refactoring this code into a new method like 'startAutoInvalidationThread' or something? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@253 PS1, Line 253: new Thread(new Runnable() { this thread should have a name and also probably be marked as a daemon thread. http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@255 PS1, Line 255: { what's with this scope? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@262 PS1, Line 262: invalidateUnusedTables(req); I think it makes sense to put a try/catch around this so that, even if the call throws some kind of weird runtime exception, we can log it and keep going. Otherwise we'll silently lose this thread and start to "leak" memory http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1916 PS1, Line 1916: getTable(tableName.db_name, tableName.table_name). do we have to worry about null here? seems plausible that a table might have gotten renamed/removed http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1925 PS1, Line 1925: Calendar oldestTableAllowed = null; I think it would be safer to avoid any calendar calculations and instead just use a monotonic timestamp (System.nanotime gives you a monotonic clock) http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1935 PS1, Line 1935: FeFsTable why only evict FeFsTable? I agree they are the largest, but seems unexpected to ignore the other tables. If we are going to do this, I think it makes sense to actually put the 'lastUsedTime' bit in the HdfsTable class instead of in the Table class, since that will also avoid the extra memory usage in 'IncompleteTable'. http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1935 PS1, Line 1935: req.isSetUnused_table_ttl_sec() isn't this handled up above? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1936 PS1, Line 1936: table.lastUsedTime can we add a getter? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/Table.java@111 PS1, Line 111: protected Calendar lastUsedTime; this seems bulky. Per comment elsewhere how about just using a long, and setting it to System.nanotime? http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java: http://gerrit.cloudera.org:8080/#/c/11224/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java@29 PS1, Line 29: import com.google.common.collect.Iterables; unused http://gerrit.cloudera.org:8080/#/c/11224/1/tests/custom_cluster/test_automatic_invalidation.py File tests/custom_cluster/test_automatic_invalidation.py: http://gerrit.cloudera.org:8080/#/c/11224/1/tests/custom_cluster/test_automatic_invalidation.py@24 PS1, Line 24: class TestAutomaticCatalogInvalidation(CustomClusterTestSuite): I think this test would probably be easier to write in Java so you can directly inspect the state of the Catalog instead of having to do it e2e like this. If you do this in the Java side, you could also easily mock the advancement of time. For example, you could add a Guava 'Ticker' as a static member of CatalogServiceCatalog. In normal runtime the ticker would be set to be the systemTicker, and in the test, you can mock it out to make time advance deterministically. -- To view, visit http://gerrit.cloudera.org:8080/11224 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib549717abefcffb14d9a3814ee8cf0de8bd49e89 Gerrit-Change-Number: 11224 Gerrit-PatchSet: 1 Gerrit-Owner: Tianyi Wang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 15 Aug 2018 19:55:51 +0000 Gerrit-HasComments: Yes
