Alex Behm has posted comments on this change.

Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading
......................................................................


Patch Set 4:

(6 comments)

Responding to comments. I'll wait for some of the larger changes before doing 
another round.

http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 351:       Map<String, Map<String, FileDescriptor>> newFileDescMap = 
Maps.newHashMap();
> I did that initially but the problem happens when one of the files is delet
Before, we used to want to reuse as much of the existing block metadata as 
possible since that was an expensive operation. However, with this new design, 
we are already loading all the metadata from HDFS, so there is no need to 
optimize this case by not reloading the metadata (we are loading it anyway). So 
I think this can be simplified to just discarding the entries from the global 
map and updating the global map in place. We also don't need any of the "should 
we reload the metadata" logic.


Line 386:             String[] blockHostPorts = loc.getNames();
> We are using them in preconditions check in L396 and loop condition in L403
The code and comments around these is very sparse on content, so I think we 
should just condense it. This method looks really long but really it is just 
very "fluffy", so I was hoping to make it more content dense.


Line 431:           numHdfsFiles_ += fileDescs.size();
> I think its due to the current design. The reload is implemented as dropPar
Got it, makes sense.


Line 483:         for (HdfsPartition partition: partitions) {
> Although it looks like O(N^2), it isn't. 'partitions' correspond to each pa
Ahh, of course. You are right. I missed the local partitions var.


Line 934:     boolean isMarkedCached = isMarkedCached_;
> Yep, wanted to discuss about this. I'll ping you offline.
Looks like we used to force reloading the block locations of cached partitions. 
As discussed in a comment above, the optimization of only reloading the 
metadata for some files doesn't seem to make sense anymore, so forcing the 
reload based on caching doesn't make anymore, either.


http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

Line 423:     if (p.isRoot()) return false;
> That returns true for isChildPath(root, root)?
Good point. You are right.


-- 
To view, visit http://gerrit.cloudera.org:8080/5148
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes

Reply via email to