Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8958 )

Change subject: IMPALA-5152: Introduce metadata loading phase
......................................................................


Patch Set 1:

(39 comments)

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
File fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java@73
PS1, Line 73:     if (!fromClauseOnly) tblRefs.add(new 
TableRef(tableName_.toPath(), null));
> Maybe replace the if check with a Preconditions.checkState(!fromClauseOnly)
Good point. This Preconditions check would be sensible in many places, so I 
opted to clean up
the collectTabeRefs() interface instead because fromClauseonly=true is only 
needed for QueryStmts.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@66
PS1, Line 66: // Set in analyzeAndAuthorize().
            :   private ImpaladCatalog catalog_;
> If I understand it correctly, that shouldn't be needed, given that analysis
Need this for AnalysisContext#checkSystemDbAccess which is annoying but also 
difficult to clean up (I tried).


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@367
PS1, Line 367:
> IMO, this method should be documented, given this is the starting point of
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@518
PS1, Line 518: analysisResult_.isAlterTableStmt() ||
             :             analysisResult_.isAlterViewStmt());
> Curious, where did this come from?
We can add column priv requests for something like this:

use functional;
alter table allcomplextypes.int_array_col set fileformat sequencefile;

The table path resolves to a column and we register an auth request.

Actually, I think any alter table could hit this, so generalized this check to 
any AlterTableStmt.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@309
PS1, Line 309:     private final Map<TableName, Table> loadedTables;
> nit: My feeling is that we can call this referencedTables_ (and backup with
Renamed to stmtTableCache.

I renamed this because "referencedTables" could be misleading in the sense that 
this map could contain tables that are needed for resolving a path, but are not 
actually the target of that path. So in some sense not all tables are really 
"referenced" by the statement.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2361
PS1, Line 2361: or the db
> I don't think the existence of db is checked in this function anymore.
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2372
PS1, Line 2372:     if (table.isLoaded() && table instanceof IncompleteTable) {
> Preconditions.checkState(table.isLoaded())?
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2383
PS1, Line 2383: wit
> nit: with
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropStatsStmt.java@82
PS1, Line 82: tableName_
> other statements assert that tableName is not null on construction. this st
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@85
PS1, Line 85: tableName_
> can tableName_ be null?
Added Preonditions check in c'tor.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java
File fe/src/main/java/org/apache/impala/analysis/FromClause.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/FromClause.java@68
PS1, Line 68: public
> @Override
Does not override since it's not a StatementBase.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
File fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java@108
PS1, Line 108: sq:
> nit: other loops in this file add a space after the iter variable.
Done.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java
File fe/src/main/java/org/apache/impala/analysis/Path.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/Path.java@275
PS1, Line 275: list of table names
> pls state any restrictions on the path-- I found this unclear to read.
Done. Added a few examples.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@105
PS1, Line 105:     }
> why aren't baseTblResultExprs_ included? the comment there states that thes
The baseTblResultExprs_ are the select-list exprs resolved against inline 
views. They are used for top-down slot materialization during planning (see 
where they are used).

They are not relevant here because we do not allow subqueries in any select 
list.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@1033
PS1, Line 1033:     }
> I was expecting this class to look at baseTblResultExprs_
See other comment.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java
File fe/src/main/java/org/apache/impala/analysis/StatementBase.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/analysis/StatementBase.java@64
PS1, Line 64: public abstract void collectTableRefs(List<TableRef> tblRefs, 
boolean fromClauseOnly);
> If most of the statement classes that extend StatementBase don't return any
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@823
PS1, Line 823:    * ignored and not returned or added to 'loadedTables'.
> Comment where defaultDb is used.
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@825
PS1, Line 825: private Set<TableName> getMissingTables
> nit: can you put the following two functions below requestTblLoadAndWait so
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@835
PS1, Line 835:         // Database does not exist.
> Maybe catch DatabaseNotFoundEx? CatalogEx seems generic (I checked Catalog#
Added a Preconditions check. Due to ImpaladTestCatalog#getTable it's not 
straightforward to switch to DBNotFoundEx everywhere, so we still need to catch 
CatalogException here. We should probably clean up how tables are loaded in a 
test env later.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@838
PS1, Line 838:       if (!tbl.isLoaded()) {
> Should we handle tbl.isLoaded() && tbl instanceof IncompleteTable here and
I don't think that's correct because we still need to prefer auth errors over 
analysis errors.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@859
PS1, Line 859:   private Set<TableName> collectTableCandidates(StatementBase 
stmt, String defaultDb) {
> I feel like this method belongs to the AnalysisCtx() and not frontend?
I'm open to moving it but don't see how it would work:
* collectTableCandidates() is called by requestTableLoadAndWait() and 
getMissingTables()
* getMissingTables() is called by requestTableLoadAndWait()
* requestTableLoadAndWait() is called in some places where no AnalysisCtx is 
available (e.g., MetadataOp.getColumns())

Dimitris had a different suggestion for cleaning up the flow. I'm following 
that now.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@866
PS1, Line 866: org.apache.impala.analysis.Path.getCandidateTables
> nit: you can use import static if you want to avoid the classpath inside th
Cashes with org.apache.hadoop.fs.Path and might lead to confusion.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@872
PS1, Line 872: LoadedTables
> The name sounds a bit generic. Maybe LoadedTablesCache or something to indi
Renamed to StmtTableCache though the "Stmt" part does not make sense for some 
uses.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@873
PS1, Line 873: public final ImpaladCatalog catalog;
> I see that this is used for initializing the Analyzer. But given that the a
Needs it to resolve built-in functions and UDFs and seems better to give it the 
version of the catalog at the time of loading success although that is not 
strictly necessary.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@885
PS1, Line 885: If non-NULL
> You mean the timeline, no?
Reworded.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@889
PS1, Line 889: defaultDb
> nit: I would call it sessionDb
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@903
PS1, Line 903: public
> public?
Called by MetadataOp.getColumns()


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@910
PS1, Line 910:     if (missingTbls.isEmpty()) return new LoadedTables(catalog, 
loadedTbls);
> Should we update something in the timeline to reflect that all the metadata
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@916
PS1, Line 916: requestedTbls
> It seems that this is only used for the log message in L965. Can't you use
We could get rid of the message if you prefer. It's not easy to maintain a 
counter because we may loop several times in the while() loop below waiting for 
the same set of tables.
We also cannot easily distinguish which tables in loadedTables were already 
loaded and which ones were loaded in this process. The easiest way is to 
maintain this set, but we can remove it if you prefer.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@919
PS1, Line 919: final long debugLoggingFrequency = 10;
             :     final long retryLoadFrequency = 20;
> nit: move them at the beginning of this function?
Moved above, but I feel they are more relevant here because in most cases 
statements should not enter this loading loop, but be served from the cache and 
never get here.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@924
PS1, Line 924: if (issueLoadRequest) {
> Since we know there are missing tbls (condition in while), why do we need t
Bharath had a similar question. Responded in Bharath's comment thread below and 
expanded comments in the code.
The reason for this flag is that we may be sitting in this loop for several 
iterations waiting for the tables to arrive. We don't want to keep hammering 
the catalogd asking for the same tables over and over again.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@931
PS1, Line 931: currCatalog != catalog
> Not fully familiar with != implementation, so I could be wrong on this one.
This is checking for equality of the object references which I think is 
correct. We only create a new ImpaladCatalog when processing a non-delta update.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@942
PS1, Line 942:           LOG.warn(String.format("Waiting for table metadata. " +
> Should we dump current missingTables() too?
It's printing missingTbls. Is that what you mean or something else?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@943
PS1, Line 943: Tables remaining: %s
> If views are used, that number may go up and down and I am wondering if thi
That's the nature of how our loading process works with views. Do you have an 
alternative in mind?


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@951
PS1, Line 951:           getMissingTables(catalog, missingTbls, defaultDb, 
loadedTbls);
> Make getMissingTables() always use getCatalog() instead of passing it aroun
I think it's cleaner that one such invocation uses the same Catalog reference. 
Is't probably ok to mix catalog references but seems easier to reason about 
this way.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@958
PS1, Line 958: issueLoadRequest
> Not sure I follow the need for this. L960 makes sure missingTbls is up to d
There are two steps to loading a set of tables.
First, you must issue a loading request RPC to the catalogd, and second you 
must wait for the loaded metadata to arrive.
After the first step, we may be sitting in this loop for a while, waiting for 
the loaded tables to trickle in from
the statestore. So missingTbls is still not empty, but we've already requested 
them to be loaded.
In that phase, we should not repeatadly issue redundant RPCs to the catalogd.

Expanded comment to explain this.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1091
PS1, Line 1091: AnalysisContext analysisCtx = new AnalysisContext(queryCtx, 
authzConfig_, timeline);
              :     AnalysisResult analysisResult =
              :         
analysisCtx.analyzeAndAuthorize(queryCtx.client_request.stmt, this);
> As is today, we have the following call sequence: Frontend::createExecReque
Good idea, I like that flow better. Done.


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java
File fe/src/main/java/org/apache/impala/service/MetadataOp.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/main/java/org/apache/impala/service/MetadataOp.java@281
PS1, Line 281: ;
> nit: extra ";"
Done


http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java:

http://gerrit.cloudera.org:8080/#/c/8958/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java@104
PS1, Line 104: Table does
> I think the previous error was more informative (and it implies that the th
Fair point. We are somewhat inconsistent about this today. Here's some 
background to help us decide what to do:
* In the general case of table paths, it is not meaningful to return "database 
not found". Consider a path like "FROM a.b.c". If neither "defaultDb.a.b.c" nor 
"a.b.c" can be resolved it does not make much sense to return "database a does 
not exist". We cannot presume to know what the user meant.
* In specific cases where a TableName is given or a path with <= 2 elements, we 
can provide a better error message.
* So providing a clearer error message is only sometimes possible and to give 
that error message we'd need to extend our query-local metadata cache to also 
maintain the Db seen during the metadata collection/loading phase.

Do you think it's worth extending the cache for sometimes getting better error 
messages?



--
To view, visit http://gerrit.cloudera.org:8080/8958
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I68d32d5acd4a6f6bc6cedb05e6cc5cf604d24a55
Gerrit-Change-Number: 8958
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Mon, 22 Jan 2018 22:20:31 +0000
Gerrit-HasComments: Yes

Reply via email to