Alex Behm has posted comments on this change. Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading ......................................................................
Patch Set 4: (6 comments) Responding to comments. I'll wait for some of the larger changes before doing another round. http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 351: Map<String, Map<String, FileDescriptor>> newFileDescMap = Maps.newHashMap(); > I did that initially but the problem happens when one of the files is delet Before, we used to want to reuse as much of the existing block metadata as possible since that was an expensive operation. However, with this new design, we are already loading all the metadata from HDFS, so there is no need to optimize this case by not reloading the metadata (we are loading it anyway). So I think this can be simplified to just discarding the entries from the global map and updating the global map in place. We also don't need any of the "should we reload the metadata" logic. Line 386: String[] blockHostPorts = loc.getNames(); > We are using them in preconditions check in L396 and loop condition in L403 The code and comments around these is very sparse on content, so I think we should just condense it. This method looks really long but really it is just very "fluffy", so I was hoping to make it more content dense. Line 431: numHdfsFiles_ += fileDescs.size(); > I think its due to the current design. The reload is implemented as dropPar Got it, makes sense. Line 483: for (HdfsPartition partition: partitions) { > Although it looks like O(N^2), it isn't. 'partitions' correspond to each pa Ahh, of course. You are right. I missed the local partitions var. Line 934: boolean isMarkedCached = isMarkedCached_; > Yep, wanted to discuss about this. I'll ping you offline. Looks like we used to force reloading the block locations of cached partitions. As discussed in a comment above, the optimization of only reloading the metadata for some files doesn't seem to make sense anymore, so forcing the reload based on caching doesn't make anymore, either. http://gerrit.cloudera.org:8080/#/c/5148/4/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 423: if (p.isRoot()) return false; > That returns true for isChildPath(root, root)? Good point. You are right. -- To view, visit http://gerrit.cloudera.org:8080/5148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-HasComments: Yes
