Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 13:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS13, Line 41: private DiskIdMapper() {
             :     }
nit: single line


PS13, Line 44: "storage ID UUID string"
nit: why quoted?


Line 55:      * Returns a disk id (0-based) index for storageId on host 'host'.
This function doesn't only return a disk id (implying this is a getter), it 
also generates and adds a disk id for a non-existent <host, storageUuid> pair. 
That needs to be documented.


PS13, Line 68: .intValue()
why do you need this? Won't this be unboxed automatically?


PS13, Line 69: synchronized (storageIdGenerator) {
The common path now includes two calls to storageUuidToDiskId.get(), one 
synchronized block. Won't it be better if you make this function synchronized 
and remove the test and test and set logic? In this case you only need one call 
to storageUuidToDiskId.get(), this doesn't need to be a concurrent hash map and 
you still have 1 synchronized block with the same number of operations in it.


PS13, Line 74: stoprageUuid
typo: storageUuid


http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS13, Line 315: if (!FileSystemUtil.isChildPath(partDir, dirPath)) continue;
Is this check really needed? Haven't you done this check while populating 
partsByPath?


PS13, Line 316: perPartitionFileDescMap_
If I recall correctly, I added this to speedup incremental loading of file 
descriptors for files that already existed in the metadata and for which their 
blocks hadn't changed. My understanding is that this code is no longer here 
(i.e. we already reload the file/block metadata from scratch). So, is this map 
still needed?


PS13, Line 431:  
nit: extra space


PS13, Line 441: // Synthesize the block metadata for the file descriptor.
              :         long start = 0;
              :         long remaining = fd.getFileLength();
              :         // Workaround HADOOP-11584 by using the filesystem 
default block size rather than
              :         // the block size from the FileStatus.
              :         // TODO: after HADOOP-11584 is resolved, get the block 
size from the FileStatus.
              :         long blockSize = fs.getDefaultBlockSize();
              :         if (blockSize < MIN_SYNTHETIC_BLOCK_SIZE) blockSize = 
MIN_SYNTHETIC_BLOCK_SIZE;
              :         Preconditions.checkState(partitions.size() > 0);
              :         // For the purpose of synthesizing block metadata, we 
assume that all partitions
              :         // with the same location have the same file format.
              :         HdfsFileFormat fileFormat = 
partitions.get(0).getFileFormat();
              :         if 
(!fileFormat.isSplittable(HdfsCompression.fromFileName(fd.getFileName()))) {
              :           blockSize = remaining;
              :         }
              :         while (remaining > 0) {
              :           long len = Math.min(remaining, blockSize);
              :           List<BlockReplica> replicas = Lists.newArrayList(
              :               new 
BlockReplica(hostIndex_.getIndex(REMOTE_NETWORK_ADDRESS), false));
              :           fd.addFileBlock(new FileBlock(start, len, replicas));
              :           remaining -= len;
              :           start += len;
              :         }
I would put this in a separate function.


PS13, Line 474: numHdfsFiles_++;
              :           totalHdfsBytes_ += partition.getSize();
Is this counting correct here?


http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

PS13, Line 289: BlockLocation
Is this the name of the API call? If not, can you use the exact function name?


PS13, Line 418: child
child or descendant?


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to