Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 13:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java:

PS13, Line 41: private DiskIdMapper() {
             :     }
> nit: single line
Done


PS13, Line 44: "storage ID UUID string"
> nit: why quoted?
Unquoted


Line 55:      * Returns a disk id (0-based) index for storageId on host 'host'.
> This function doesn't only return a disk id (implying this is a getter), it
Made the comment little detailed.


PS13, Line 68: .intValue()
> why do you need this? Won't this be unboxed automatically?
Done


PS13, Line 69: synchronized (storageIdGenerator) {
> The common path now includes two calls to storageUuidToDiskId.get(), one sy
Discussed in person. It was put outside the synchronized block so that callers 
with already mapped storage ids needn't enter the synchronized block.


PS13, Line 74: stoprageUuid
> typo: storageUuid
Done


http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS13, Line 315: if (!FileSystemUtil.isChildPath(partDir, dirPath)) continue;
> Is this check really needed? Haven't you done this check while populating p
Currently we don't do it at the callers. So I added the check here. partsByPath 
includes all the partitions that need to be updated. However this method does 
only for descendants of dirPath.


PS13, Line 316: perPartitionFileDescMap_
> If I recall correctly, I added this to speedup incremental loading of file 
Looking closely, yes this map doesn't make sense anymore. We are unnecessarily 
doing puts and gets into it. Removed.Nice catch btw.


PS13, Line 431:  
> nit: extra space
Done


PS13, Line 441: // Synthesize the block metadata for the file descriptor.
              :         long start = 0;
              :         long remaining = fd.getFileLength();
              :         // Workaround HADOOP-11584 by using the filesystem 
default block size rather than
              :         // the block size from the FileStatus.
              :         // TODO: after HADOOP-11584 is resolved, get the block 
size from the FileStatus.
              :         long blockSize = fs.getDefaultBlockSize();
              :         if (blockSize < MIN_SYNTHETIC_BLOCK_SIZE) blockSize = 
MIN_SYNTHETIC_BLOCK_SIZE;
              :         Preconditions.checkState(partitions.size() > 0);
              :         // For the purpose of synthesizing block metadata, we 
assume that all partitions
              :         // with the same location have the same file format.
              :         HdfsFileFormat fileFormat = 
partitions.get(0).getFileFormat();
              :         if 
(!fileFormat.isSplittable(HdfsCompression.fromFileName(fd.getFileName()))) {
              :           blockSize = remaining;
              :         }
              :         while (remaining > 0) {
              :           long len = Math.min(remaining, blockSize);
              :           List<BlockReplica> replicas = Lists.newArrayList(
              :               new 
BlockReplica(hostIndex_.getIndex(REMOTE_NETWORK_ADDRESS), false));
              :           fd.addFileBlock(new FileBlock(start, len, replicas));
              :           remaining -= len;
              :           start += len;
              :         }
> I would put this in a separate function.
Done


PS13, Line 474: numHdfsFiles_++;
              :           totalHdfsBytes_ += partition.getSize();
> Is this counting correct here?
We increment them for every fd (and its corresponding partition). We seem to be 
implementing reload() as drop() + add(). dropPartition() is subtracting them 
when required. Also loadPartitionFileMetadata() subtracts when we reload a 
subset of partitions. Do you think anything could be done better here? I agree 
its confusing since its done in multiple places.


http://gerrit.cloudera.org:8080/#/c/5148/13/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

PS13, Line 289: BlockLocation
> Is this the name of the API call? If not, can you use the exact function na
Rephrased it a little. Its no specific API call. Its just that only 
'DistributedFileSystem's actually set storageIds in the BlockLocations and 
others don't.


PS13, Line 418: child
> child or descendant?
Descendant. Updated the method name and callers.


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes

Reply via email to