Alex Behm has posted comments on this change. Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs ......................................................................
Patch Set 1: (11 comments) Some high-level comments before digging in deeper. http://gerrit.cloudera.org:8080/#/c/4914/1//COMMIT_MSG Commit Message: Line 9: This change enables Impala to use BlockLocation#getStorageIds, nit: we usually use () after names to denote they are e function/method i.e. getStorageIds -> getStorageIds() getFileBlockLocations -> getFileBlockLocations() Line 12: getFileBlockLocations call which will be deprecated in Hadoop-3. extra space after "be" http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 287: clazz = null; remove Line 290: if ( clazz != null ) { style: extra spaces, we do if (clazz != null) { ... } Line 295: m = null; remove Line 311: LOG.info("SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? "true" : "false")); no need for this ternary logic, just: LOG.info("SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID); Line 352: private static int getSequentialDiskId(String storageId) While this solution is minimally invasive, I'm not a big fan of the extra locking and memory consumption of this map. An overall better design is to convert the UUID to a 128-bit int and use that everywhere. Also see my comments below. Let me know what you think. Line 353: { move to previous line Line 379: private static int getDiskId(String storageId) { As much as I'd like to use this simpler alternative, I think it could lead to serious performance problems that could be very hard to debug. I don't think we can use this. Perhaps we should consider switching to a 128-bit integer to represent the UUID everywhere (including all thrift structures and the BE). The BE only needs a number that it can mod against the number of local data dirs (see DiskIoMgr::AssignQueue()). Using a 128-bit int would also work with the old volume id API. We have an existing TUniqueId which may be suitable. Line 437: BlockLocation[] locations = fs.getFileBlockLocations(file, 0, file.getLen()); still using the old API? My understanding is that switching to a new API for these calls here will make the extra loadDiskIds() unnecessary. Line 526: if (SUPPORTS_STORAGE_ID) { This seems fine as a first cut, but we should consider taking this a step further in a follow-on change. This is still grabbing the storage ids partition-by-partition, but ideally we could get all the blocks and their locations for an entire table. We'd need to handle partitions with non-standard locations specially. -- To view, visit http://gerrit.cloudera.org:8080/4914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Laszlo Gaal <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Laszlo Gaal <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
