Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10869 )

Change subject: IMPALA-5035 Impala should not load INDEX_TABLEs from HMS
......................................................................


Patch Set 5:

(4 comments)

In addition to the one implementation concern, I'm not sure that I agree with 
the behavior change here. It seems inconsistent in a few ways:

1) for other unsupported table types, we still list them, and fail to load. Why 
should index tables be treated differently than unsupported serdes? Won't this 
produce confusing behavior if someone attempts to create a table with a 
conflicting name?

2) Seems inconsistent with Hive as well. 'SHOW TABLES IN functional;' in hive 
does show hive_index_tbl. Why introduce the inconsistency between engines?

Perhaps we'd be better off just throwing a more useful exception like "Hive 
index tables are not supported" if we think that's better than the 'Unsupported 
table type: INDEX_TABLE' message?

http://gerrit.cloudera.org:8080/#/c/10869/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10869/5//COMMIT_MSG@28
PS5, Line 28: egion; //external table
it seems you have some typos here. Did you not just copy-paste this from the 
shell?


http://gerrit.cloudera.org:8080/#/c/10869/5//COMMIT_MSG@29
PS5, Line 29: > show create talbe hive_index_tbl;
here too


http://gerrit.cloudera.org:8080/#/c/10869/5//COMMIT_MSG@31
PS5, Line 31: ERROR: AnalysisException: Table does not exist: hive_index_tbl
what is the behavior if you try to create a table with the same name that 
conflicts with a Hive index table?


http://gerrit.cloudera.org:8080/#/c/10869/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/10869/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1055
PS5, Line 1055:       List<org.apache.hadoop.hive.metastore.api.Table> 
tableList =
              :               
msClient.getHiveClient().getTableObjectsByName(dbName, tableNameList);
I'm worried about the performance of this in metastores with large numbers of 
tables. Loading the full table object at this stage is a lot more expensive 
than just the names. In a recent case I was working on with a production 
cluster there were many millions of tables and putting this here would be very 
harmful to performance.



--
To view, visit http://gerrit.cloudera.org:8080/10869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6e2ccbeee3e612eab96d89a211d15fc119b08d70
Gerrit-Change-Number: 10869
Gerrit-PatchSet: 5
Gerrit-Owner: Peikai Zheng <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Mon, 09 Jul 2018 17:53:32 +0000
Gerrit-HasComments: Yes

Reply via email to