Bharath Vissapragada has posted comments on this change. Change subject: IMPALA-4172/IMPALA-3653: Improvements to block metadata loading ......................................................................
Patch Set 2: (16 comments) http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java File fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java: Line 27: * This class encapsulates the logic required for mapping HDFS > Much fluff, simplify to: Maps HDFS storage-UUIDs to per-host 0-based, seque Done Line 29: * by the backend. > required by Impala backends Done Line 31: public class DiskIdMapper { > Make this a singleton to clarify the intended use. Done Line 36: // This is intentionally implemented as a static object to share the storageIDs across > move this part to the class comment Done Line 38: // consistent across all the tables, without which, we can potentially confuse the > also mention that the static solution requires the least amount of memory Done Line 48: private static ConcurrentHashMap<String, AtomicInteger> > Don't think we need a ConcurrentHashMap or AtomicInteger here, because when Used a stricter locking scheme, but as we discussed offline, this can still have contention in the initial metadata load. Added a TODO for that. http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: Line 294: // Keeps track of all the parition paths that are filtered to add block metadata. > this comment style for classes and functions Oops, done. Line 368: if (!pathMd.filteredParitionPaths.contains(fileStatus.getPath())) continue; > Looks like this contains() and the indexOf() below can become very expensiv The new PS design revamps the whole thing. We don't do indexOf() searches anymore. Line 423: for (int i=0; i < fileMd.pathFds.size(); ++i) { > i = 0 (add spaces) Done Line 458: // Only build the storage IDs fs instances that support the BlockLocation API. > ... for fs instances ... Done Line 720: private void loadAllPartitions( > After reading through this a few times I'm still confused about the high-le The new PS implements a similar logic. We can sync-up on that if required. Line 777: if (dirsToLoad.contains(partDir) || > positive is more readable to me Done http://gerrit.cloudera.org:8080/#/c/5148/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java: Line 275: * Returns true if the filesystem supports BlockLocation API. > What API is it specifically? LocatedFileStatus.getBlockLocations()? Yes, all methods returning 'BlockLocation's. Line 277: public static boolean supportsBlockLocationAPI(FileSystem fs) { > supportsBlockLocationApi Done Line 406: public static boolean isPathChild(Path p1, Path p2) { > isChildPath(Path p, Path parent) Done Line 408: Path parentDir = p1.getParent(); > I think this can be improved by using the Path.getDepth(). You can do getPa good idea. Done. -- To view, visit http://gerrit.cloudera.org:8080/5148 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie127658172e6e70dae441374530674a4ac9d5d26 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-HasComments: Yes
