Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8752 )

Change subject: IMPALA-5058: Improve the concurrency of DDL/DML operations
......................................................................


Patch Set 2: Code-Review+1

(10 comments)

I've spent some time on this patch. The logic and code structure LGTM. I agree 
with Alex that we should get this committed and then keep making improvements.

Given the number of edge cases around SYNC_DDL / invalidate/ statestore 
broadcast failures we should probably run a stress test too with some injected 
failures in those paths.

http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc
File be/src/catalog/catalog-server.cc:

http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/catalog/catalog-server.cc@331
PS2, Line 331:     VLOG(1) << "Publishing " << (topic_deletions ? "deletion " : 
"update ")
This could probably be verbose if we print for every catalog_object?  I agree 
it helps to debug in builds, but just wanted to confirm if that was 
intentional. (Since we have dynamic log levels, we can move it to VLOG(3) as 
well if you think that is ok).


http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/8752/2/be/src/service/impala-server.cc@1474
PS2, Line 1474:   VLOG_QUERY << "Done waiting for catalog version: " << 
catalog_update_version;
nit: We should probably not log this if we detect a catalog_service_id change 
and indicate in the logs that it had been restarted or something? (same in 
other cases too)


http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/8752/2/common/thrift/CatalogService.thrift@57
PS2, Line 57: / True if this is a result of an INVALIDATE METADATA operation.
            :   4: required bool is_invalidate
Just curious, wouldn't we know this information from the ClientRequestState 
calling ProcessCatalogUpdateResult()? Wondering why the Catalog server has to 
send this back to the coordinator.


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
File fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java:

http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java@89
PS2, Line 89:    CatalogObjectVersionQueue.INSTANCE.removeVersion(
            :           existingRole.getCatalogVersion());
Wouldn't removeRole() -> roleCache_.remove(roleName) already do this?

Also curious, why this is being done for a very specific case?


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java
File fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java:

http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogObjectVersionQueue.java@30
PS2, Line 30:  * versions. Not thread-safe.
nit: I think it helps to define the use case of this class here. (like 
described in CatalogDeltaLog)


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@585
PS2, Line 585: =
nit: maybe make >= ? just to be on the safer side.


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@628
PS2, Line 628: if (LOG.isDebugEnabled()) {
             :           LOG.debug(String.format("Error calling toThrift() on 
table %s: %s",
             :               tbl.getFullName(), e.getMessage()), e);
             :         }
             :         return;
I think that is a valid LOG.error(), given we are missing an update of a 
version.


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1304
PS2, Line 1304: versionLock_.writeLock().lock();
qq: I see incrementAndGetCatalogVersion() already takes versionLock_. Whats the 
reason behind wrapping these calls (at multiple places) with this lock again? 
Is that maintain some additional consistency across the duration of the calls?


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1868
PS2, Line 1868:     LOG.info("Operation using SYNC_DDL is waiting for catalog 
topic version: " +
Wondering if we can make it easy someway to map this log to the corresponding 
DDL (supportability)


http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
File fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java:

http://gerrit.cloudera.org:8080/#/c/8752/2/fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java@381
PS2, Line 381: CatalogObjectVersionQueue.INSTANCE.updateVersions(
             :             existingDb.getCatalogVersion(), catalogVersion);
Same as I commented in another place, wouldn't addDb() do that already? addDb() 
-> CatalogObjectCache ?



--
To view, visit http://gerrit.cloudera.org:8080/8752
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If12467a83acaeca6a127491d89291dedba91a35a
Gerrit-Change-Number: 8752
Gerrit-PatchSet: 2
Gerrit-Owner: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang <[email protected]>
Gerrit-Comment-Date: Sun, 17 Dec 2017 00:10:15 +0000
Gerrit-HasComments: Yes

Reply via email to