[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

Alex Behm (Code Review) Fri, 04 Nov 2016 14:49:22 -0700

Alex Behm has posted comments on this change.

Change subject: IMPALA-4172: Switch to BlockLocation methods for disk IDs
......................................................................



Patch Set 1:

(11 comments)

Some high-level comments before digging in deeper.

http://gerrit.cloudera.org:8080/#/c/4914/1//COMMIT_MSG
Commit Message:

Line 9: This change enables Impala to use BlockLocation#getStorageIds,
nit: we usually use () after names to denote they are e function/method

i.e. getStorageIds -> getStorageIds()

getFileBlockLocations -> getFileBlockLocations()


Line 12: getFileBlockLocations call which will be  deprecated in Hadoop-3.
extra space after "be"


http://gerrit.cloudera.org:8080/#/c/4914/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 287:       clazz = null;
remove


Line 290:     if ( clazz != null ) {
style: extra spaces, we do

if (clazz != null) {
...
}


Line 295:         m = null;
remove


Line 311:     LOG.info("SUPPORTS_STORAGE_ID is: " + (SUPPORTS_STORAGE_ID ? 
"true" : "false"));
no need for this ternary logic, just:

LOG.info("SUPPORTS_STORAGE_ID is: " + SUPPORTS_STORAGE_ID);


Line 352:   private static int getSequentialDiskId(String storageId)
While this solution is minimally invasive, I'm not a big fan of the extra 
locking and memory consumption of this map.

An overall better design is to convert the UUID to a 128-bit int and use that 
everywhere. Also see my comments below.

Let me know what you think.


Line 353:   {
move to previous line


Line 379:   private static int getDiskId(String storageId) {
As much as I'd like to use this simpler alternative, I think it could lead to 
serious performance problems that could be very hard to debug. I don't think we 
can use this.

Perhaps we should consider switching to a 128-bit integer to represent the UUID 
everywhere (including all thrift structures and the BE). The BE only needs a 
number that it can mod against the number of local data dirs (see 
DiskIoMgr::AssignQueue()).

Using a 128-bit int would also work with the old volume id API.

We have an existing TUniqueId which may be suitable.


Line 437:       BlockLocation[] locations = fs.getFileBlockLocations(file, 0, 
file.getLen());
still using the old API?

My understanding is that switching to a new API for these calls here will make 
the extra loadDiskIds() unnecessary.


Line 526:       if (SUPPORTS_STORAGE_ID) {
This seems fine as a first cut, but we should consider taking this a step 
further in a follow-on change.

This is still grabbing the storage ids partition-by-partition, but ideally we 
could get all the blocks and their locations for an entire table. We'd need to 
handle partitions with non-standard locations specially.


-- 
To view, visit http://gerrit.cloudera.org:8080/4914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibbff94cef9a9db7b3945f8e7b0286866d2cc3b61
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Laszlo Gaal <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Laszlo Gaal <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4172: Switch to BlockLocation methods for disk IDs

Reply via email to