Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/11182 )
Change subject: IMPALA_7127 (continued): initial fetch-from-catalogd implementation ...................................................................... Patch Set 1: (13 comments) I've spent some time on the patch. I think I understand the interfaces/thrift changes now but I still need to dig deeper. Publishing some high-level comments (and some nits). Will spend more time on this. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/catalog/catalog.h@79 PS1, Line 79: nit: formatting. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/catalog/catalog.cc@95 PS1, Line 95: : nit: can be formatted in 2 lines. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/exec/catalog-op-executor.h File be/src/exec/catalog-op-executor.h: http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/exec/catalog-op-executor.h@55 PS1, Line 55: nit: formatting. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/exec/catalog-op-executor.cc File be/src/exec/catalog-op-executor.cc: http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/exec/catalog-op-executor.cc@289 PS1, Line 289: const TNetworkAddress& address = DCHECK(use_local_catalog) ? Also, generally I think we should have asserts that the new 'partial' API is not called from the clients using the current implementation of the CatalogService. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/exec/catalog-op-executor.cc@295 PS1, Line 295: client.DoRpc(&CatalogServiceClientWrapper::GetPartialCatalogObject, req, resp)); For future: We should probably add some instrumentation around this RPC for supportability. Count by table/object/total, RPC times etc. http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/service/fe-support.cc File be/src/service/fe-support.cc: http://gerrit.cloudera.org:8080/#/c/11182/1/be/src/service/fe-support.cc@539 PS1, Line 539: if (!status.ok()) { Is this redundant? Looks like THROW_IF_ERROR_RET checks the same. or may be just return nullptr? http://gerrit.cloudera.org:8080/#/c/11182/1/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/11182/1/common/thrift/CatalogService.thrift@239 PS1, Line 239: 1: bool want_db_names Would this ever be false? Tracking the callers, it does not appear so to me. http://gerrit.cloudera.org:8080/#/c/11182/1/common/thrift/CatalogService.thrift@246 PS1, Line 246: Selector Just curious if we should use 'request'/'response' terminology to make it more clear. Something like TPartitionTableInfoRequest and TPartialTableInfoResp or something like that. http://gerrit.cloudera.org:8080/#/c/11182/1/common/thrift/CatalogService.thrift@313 PS1, Line 313: optional Status.TStatus status : : 2: optional i64 object_version_number Should these be required always? http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1828 PS1, Line 1828: public TGetPartialCatalogObjectResponse getPartialCatalogObject( Commented elsewhere too, wondering if these should go into a V2 implementation of the CatalogServiceCatalog that supports partial requests/responses? (which only gets instantiated with use_local_catalog=true..) http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1879 PS1, Line 1879: dbCache_ Don't we require a versionLock_.readLock() for this? http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1705 PS1, Line 1705: partId It looks like we are relying on the premise that the partId of partition does not change over its lifetime. I think there are cases when that is not true. Forex: reloadPartition() [1] does a drop() and add(). createPartition() creates the newId for that partition. [1] https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2066 http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11182/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@101 PS1, Line 101: "Catalog object %s changed version from %s to %s while fetching metadata", nit: This should probably also include something that makes sense to the user. Something like the partition (a=1) changed while the query is in progress... -- To view, visit http://gerrit.cloudera.org:8080/11182 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If49207fc592b1cc552fbcc7199568b6833f86901 Gerrit-Change-Number: 11182 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tianyi Wang <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Mon, 13 Aug 2018 02:19:39 +0000 Gerrit-HasComments: Yes
