Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/8529 )
Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. ...................................................................... Patch Set 2: (28 comments) Great change, exposes some invaluable information when debugging Catalog issues, which otherwise is a black box. I still need to look deeper into some parts of the code around HdfsTable.toThrift() stats accounting and TopNCache unit tests. Flushing out what I have so far. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@201 PS2, Line 201: used Clarify what is 'used'? Maybe we should also mention that its top-n and not everything (with a reference to the fe call) http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@217 PS2, Line 217: Update nit: May be 'GetTopNTableUsageMetrics()' or something (I know it sounds horrible :D)? My thinking is that we are not fully accounting for the entire usage and the method name could be misleading. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@224 PS2, Line 224: object_name nit: probably just 'name'? Its implicit (from the URL) that its a table. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@359 PS2, Line 359: UpdateCatalogUsage(document); Whats your take on moving top-x stuff to the /metrics page and populating them like other metrics? I know some downstream clients (like CM) read that page and populate nice charts on how those metrics change over time. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@390 PS2, Line 390: has_metrics.SetBool(true); Just curious, whats the use of has_metrics=true/has_large_tables=true? We could just read the json and see if they exist? http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@492 PS2, Line 492: Webserver::ArgumentMap::const_iterator object_name_arg = args.find("object_name"); should we enable json view for this? (?json) http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@90 PS2, Line 90: in nit: into? http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@96 PS2, Line 96: Status GetCatalogUsage(TGetCatalogUsageResponse* response); Same comment as in the other header, may be rename to convey that it is GetTopN....()? http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift@90 PS2, Line 90: // Arguments to getTableMetrics, which returns the metrics of a specific table. : struct TGetTableMetricsParams { : 1: required string db : 2: required string tbl : } : : // Response to a getTableMetrics request. The response contains all the collected metrics : // pretty-printed into a string. : struct TGetTableMetricsResponse { : 1: required string metrics : } Can't we use TTableName and String directly? (or) probably use typedefs typedef TTableName TGetTableMetricsParams typedef string TGetTableMetricsResponse http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@604 PS2, Line 604: Currently, it keeps track of the : // estimated memory usage and the number of metadata operations that were performed on : // that table since it was loaded. Remove here and describe below may be, can likely become stale soon (Same for the next struct too) http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@607 PS2, Line 607: Stats Looks like we are using Stats/Metrics interchangeably. Maybe use Metrics everywhere since "Stats" can coincide with table stats. http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@608 PS2, Line 608: required string table_name TTableName? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154 PS2, Line 154: Table tbl = db.getTable(tableName); : if (tbl != null) { : tbl.incrementMetadataOpsCount(); : CatalogUsageMonitor.INSTANCE.updateFrequentlyAccessedTables(tbl); : } Looks subtle and flaky to me. This seems to work on the premise that we call Catalog.getTable() only for a metadata-op? Future patches could likely break the counts if they don't know this? May be we could move this to CatalogOpEx.getExistingTable() (Its private and so is only called from inside for DDLs) http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1007 PS2, Line 1007: final Timer.Context context Should the scope include tryLockTable()? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372 PS2, Line 1372: public TGetCatalogUsageResponse getCatalogUsage() { getTopN...Metrics? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1391 PS2, Line 1391: public String getTableMetrics(String dbName, String tblName) throws CatalogException { Doc? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392 PS2, Line 1392: Table tbl = getTable(dbName, tblName); This is the problem I was talking about. This call increments the metadatOpsCount() as per this patch. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1398 PS2, Line 1398: No metrics available for table " + dbName + "." + tblName Also may be add "Table not yet loaded..."? Its unclear why metrics are not available yet. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java File fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java@47 PS2, Line 47: true It is unclear why evict should be true (or) whats the usecase for that. Let's say we have 25 tables each accessed 100 times and a 26th table that is accessed for the first time, this makes sure that the last one of 25 is evicted, now we have 24 tables with 100 metadataops count and 1 table with 1 metadata op count. As a user I think I'd be more interested in the former? No? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS2, Line 221: StorageStats Maybe call FileMetadataStats to be ni sync with rest of the class? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1201 PS2, Line 1201: public void load(boolean reuseMetadata, IMetaStoreClient client, Make rest of the load()'s private so that all loads happen via this call and accounted? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@234 PS2, Line 234: kuduTableName_, e); formatting (below too) http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java@135 PS2, Line 135: public void setEstimatedMetadataSize(long estimatedMetadataSize) { : estimatedMetadataSize_.set(estimatedMetadataSize); : } : : public void incrementMetadataOpsCount() { metadataOpsCount_.incrementAndGet(); } Can these make a call to CatalogUsageMonitor.increment*(this) too? We can avoid making a separate call at multiple places. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java File fe/src/main/java/org/apache/impala/common/Metrics.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@143 PS2, Line 143: %le ? http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@147 PS2, Line 147: } Same as my comment in the backend, can we expose json versions of these? toJson() http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@259 PS2, Line 259: Can we add gauge metrics around this switch-case block to count DDLs by type? We could find data points like, there have been too many REFRESHes in the past 1 hour etc. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java File fe/src/main/java/org/apache/impala/util/TopNCache.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java@41 PS2, Line 41: R extends Long> Do we need this? It is almost always Long, no? http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl File www/catalog.tmpl: http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl@23 PS2, Line 23: {{?has_large_tables}} Like I mentioned elsewhere, my feeling is that these belong to metrics and should be exposed in /metrics?json for clients (like CM) to poll and chart. -- To view, visit http://gerrit.cloudera.org:8080/8529 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Gerrit-Change-Number: 8529 Gerrit-PatchSet: 2 Gerrit-Owner: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Wed, 06 Dec 2017 21:24:14 +0000 Gerrit-HasComments: Yes
