Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8958 )
Change subject: IMPALA-5152: Introduce metadata loading phase ...................................................................... Patch Set 1: (39 comments) http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@73 PS1, Line 73: if (!fromClauseOnly) tblRefs.add(new TableRef(tableName_.toPath(), null)); > Maybe replace the if check with a Preconditions.checkState(!fromClauseOnly) Good point. This Preconditions check would be sensible in many places, so I opted to clean up the collectTabeRefs() interface instead because fromClauseonly=true is only needed for QueryStmts. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@66 PS1, Line 66: // Set in analyzeAndAuthorize(). : private ImpaladCatalog catalog_; > If I understand it correctly, that shouldn't be needed, given that analysis Need this for AnalysisContext#checkSystemDbAccess which is annoying but also difficult to clean up (I tried). http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367 PS1, Line 367: > IMO, this method should be documented, given this is the starting point of Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@518 PS1, Line 518: analysisResult_.isAlterTableStmt() || : analysisResult_.isAlterViewStmt()); > Curious, where did this come from? We can add column priv requests for something like this: use functional; alter table allcomplextypes.int_array_col set fileformat sequencefile; The table path resolves to a column and we register an auth request. Actually, I think any alter table could hit this, so generalized this check to any AlterTableStmt. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309 PS1, Line 309: private final Map<TableName, Table> loadedTables; > nit: My feeling is that we can call this referencedTables_ (and backup with Renamed to stmtTableCache. I renamed this because "referencedTables" could be misleading in the sense that this map could contain tables that are needed for resolving a path, but are not actually the target of that path. So in some sense not all tables are really "referenced" by the statement. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2361 PS1, Line 2361: or the db > I don't think the existence of db is checked in this function anymore. Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372 PS1, Line 2372: if (table.isLoaded() && table instanceof IncompleteTable) { > Preconditions.checkState(table.isLoaded())? Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2383 PS1, Line 2383: wit > nit: with Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java@82 PS1, Line 82: tableName_ > other statements assert that tableName is not null on construction. this st Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@85 PS1, Line 85: tableName_ > can tableName_ be null? Added Preonditions check in c'tor. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java File fe/src/main/java/org/apache/impala/analysis/FromClause.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java@68 PS1, Line 68: public > @Override Does not override since it's not a StatementBase. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@108 PS1, Line 108: sq: > nit: other loops in this file add a space after the iter variable. Done. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java File fe/src/main/java/org/apache/impala/analysis/Path.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java@275 PS1, Line 275: list of table names > pls state any restrictions on the path-- I found this unclear to read. Done. Added a few examples. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@105 PS1, Line 105: } > why aren't baseTblResultExprs_ included? the comment there states that thes The baseTblResultExprs_ are the select-list exprs resolved against inline views. They are used for top-down slot materialization during planning (see where they are used). They are not relevant here because we do not allow subqueries in any select list. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1033 PS1, Line 1033: } > I was expecting this class to look at baseTblResultExprs_ See other comment. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java File fe/src/main/java/org/apache/impala/analysis/StatementBase.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@64 PS1, Line 64: public abstract void collectTableRefs(List<TableRef> tblRefs, boolean fromClauseOnly); > If most of the statement classes that extend StatementBase don't return any Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@823 PS1, Line 823: * ignored and not returned or added to 'loadedTables'. > Comment where defaultDb is used. Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@825 PS1, Line 825: private Set<TableName> getMissingTables > nit: can you put the following two functions below requestTblLoadAndWait so Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835 PS1, Line 835: // Database does not exist. > Maybe catch DatabaseNotFoundEx? CatalogEx seems generic (I checked Catalog# Added a Preconditions check. Due to ImpaladTestCatalog#getTable it's not straightforward to switch to DBNotFoundEx everywhere, so we still need to catch CatalogException here. We should probably clean up how tables are loaded in a test env later. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@838 PS1, Line 838: if (!tbl.isLoaded()) { > Should we handle tbl.isLoaded() && tbl instanceof IncompleteTable here and I don't think that's correct because we still need to prefer auth errors over analysis errors. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@859 PS1, Line 859: private Set<TableName> collectTableCandidates(StatementBase stmt, String defaultDb) { > I feel like this method belongs to the AnalysisCtx() and not frontend? I'm open to moving it but don't see how it would work: * collectTableCandidates() is called by requestTableLoadAndWait() and getMissingTables() * getMissingTables() is called by requestTableLoadAndWait() * requestTableLoadAndWait() is called in some places where no AnalysisCtx is available (e.g., MetadataOp.getColumns()) Dimitris had a different suggestion for cleaning up the flow. I'm following that now. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@866 PS1, Line 866: org.apache.impala.analysis.Path.getCandidateTables > nit: you can use import static if you want to avoid the classpath inside th Cashes with org.apache.hadoop.fs.Path and might lead to confusion. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@872 PS1, Line 872: LoadedTables > The name sounds a bit generic. Maybe LoadedTablesCache or something to indi Renamed to StmtTableCache though the "Stmt" part does not make sense for some uses. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@873 PS1, Line 873: public final ImpaladCatalog catalog; > I see that this is used for initializing the Analyzer. But given that the a Needs it to resolve built-in functions and UDFs and seems better to give it the version of the catalog at the time of loading success although that is not strictly necessary. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@885 PS1, Line 885: If non-NULL > You mean the timeline, no? Reworded. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@889 PS1, Line 889: defaultDb > nit: I would call it sessionDb Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@903 PS1, Line 903: public > public? Called by MetadataOp.getColumns() http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@910 PS1, Line 910: if (missingTbls.isEmpty()) return new LoadedTables(catalog, loadedTbls); > Should we update something in the timeline to reflect that all the metadata Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@916 PS1, Line 916: requestedTbls > It seems that this is only used for the log message in L965. Can't you use We could get rid of the message if you prefer. It's not easy to maintain a counter because we may loop several times in the while() loop below waiting for the same set of tables. We also cannot easily distinguish which tables in loadedTables were already loaded and which ones were loaded in this process. The easiest way is to maintain this set, but we can remove it if you prefer. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@919 PS1, Line 919: final long debugLoggingFrequency = 10; : final long retryLoadFrequency = 20; > nit: move them at the beginning of this function? Moved above, but I feel they are more relevant here because in most cases statements should not enter this loading loop, but be served from the cache and never get here. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@924 PS1, Line 924: if (issueLoadRequest) { > Since we know there are missing tbls (condition in while), why do we need t Bharath had a similar question. Responded in Bharath's comment thread below and expanded comments in the code. The reason for this flag is that we may be sitting in this loop for several iterations waiting for the tables to arrive. We don't want to keep hammering the catalogd asking for the same tables over and over again. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@931 PS1, Line 931: currCatalog != catalog > Not fully familiar with != implementation, so I could be wrong on this one. This is checking for equality of the object references which I think is correct. We only create a new ImpaladCatalog when processing a non-delta update. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@942 PS1, Line 942: LOG.warn(String.format("Waiting for table metadata. " + > Should we dump current missingTables() too? It's printing missingTbls. Is that what you mean or something else? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@943 PS1, Line 943: Tables remaining: %s > If views are used, that number may go up and down and I am wondering if thi That's the nature of how our loading process works with views. Do you have an alternative in mind? http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@951 PS1, Line 951: getMissingTables(catalog, missingTbls, defaultDb, loadedTbls); > Make getMissingTables() always use getCatalog() instead of passing it aroun I think it's cleaner that one such invocation uses the same Catalog reference. Is't probably ok to mix catalog references but seems easier to reason about this way. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@958 PS1, Line 958: issueLoadRequest > Not sure I follow the need for this. L960 makes sure missingTbls is up to d There are two steps to loading a set of tables. First, you must issue a loading request RPC to the catalogd, and second you must wait for the loaded metadata to arrive. After the first step, we may be sitting in this loop for a while, waiting for the loaded tables to trickle in from the statestore. So missingTbls is still not empty, but we've already requested them to be loaded. In that phase, we should not repeatadly issue redundant RPCs to the catalogd. Expanded comment to explain this. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1091 PS1, Line 1091: AnalysisContext analysisCtx = new AnalysisContext(queryCtx, authzConfig_, timeline); : AnalysisResult analysisResult = : analysisCtx.analyzeAndAuthorize(queryCtx.client_request.stmt, this); > As is today, we have the following call sequence: Frontend::createExecReque Good idea, I like that flow better. Done. http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java File fe/src/main/java/org/apache/impala/service/MetadataOp.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java@281 PS1, Line 281: ; > nit: extra ";" Done http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java: http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java@104 PS1, Line 104: Table does > I think the previous error was more informative (and it implies that the th Fair point. We are somewhat inconsistent about this today. Here's some background to help us decide what to do: * In the general case of table paths, it is not meaningful to return "database not found". Consider a path like "FROM a.b.c". If neither "defaultDb.a.b.c" nor "a.b.c" can be resolved it does not make much sense to return "database a does not exist". We cannot presume to know what the user meant. * In specific cases where a TableName is given or a path with <= 2 elements, we can provide a better error message. * So providing a clearer error message is only sometimes possible and to give that error message we'd need to extend our query-local metadata cache to also maintain the Db seen during the metadata collection/loading phase. Do you think it's worth extending the cache for sometimes getting better error messages? -- To view, visit http://gerrit.cloudera.org:8080/8958 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55 Gerrit-Change-Number: 8958 Gerrit-PatchSet: 1 Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Mon, 22 Jan 2018 22:20:31 +0000 Gerrit-HasComments: Yes
