Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location
......................................................................

Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485
PS2, Line 1485: if (partitions == null) {
I'm not an expert on the Catalog code, so feel free to push back. I'm just wondering if it would make more sense not to keep many extra 'locationToPartMap_' maps (one per HdfsTable) in memory, and instead restructure the 'getPartitionsByPath()' function to loop through 'partitionMap_' once and pick out all the partitions that match 'partitionNames'.

These are the questions that motivated the above approach:
What's the largest we expect a 'partitionMap_' to be?
And do we expect that such an approach would significantly slow down the hot path in the catalog?

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1487
PS2, Line 1487: partitionPath.toString()
Use partition.getLocation() here to avoid the unnecessary indirection?

--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 31 May 2018 17:01:47 +0000
Gerrit-HasComments: Yes
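
For reference, the single-pass alternative raised in the Line 1485 comment might look roughly like the sketch below. It is not the patch's code and not Impala's actual API: 'PartitionLookupSketch', its nested 'Partition' class, 'addPartition()' and 'getPartitionsByLocation()' are hypothetical stand-ins, and matching on the location string is an assumption taken from the change subject (several partitions sharing one location). The trade-off is the one the comment asks about: no extra per-table map in memory, at the cost of an O(n) scan of the partition map on every lookup.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a table's partition bookkeeping.
final class PartitionLookupSketch {

  // Stand-in for a partition; only the fields needed for the lookup.
  static final class Partition {
    final long id;
    final String location;

    Partition(long id, String location) {
      this.id = id;
      this.location = location;
    }

    String getLocation() {
      return location;
    }
  }

  // Stand-in for 'partitionMap_': partition id -> partition.
  private final Map<Long, Partition> partitionMap = new LinkedHashMap<>();

  void addPartition(Partition p) {
    partitionMap.put(p.id, p);
  }

  // Scans the map once and collects every partition stored at 'location'.
  // No secondary location index is kept, so each call costs O(n).
  List<Partition> getPartitionsByLocation(String location) {
    List<Partition> matches = new ArrayList<>();
    for (Partition p : partitionMap.values()) {
      if (p.getLocation().equals(location)) {
        matches.add(p);
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    PartitionLookupSketch table = new PartitionLookupSketch();
    // Two partitions deliberately share one location (paths are made up).
    table.addPartition(new Partition(1L, "/warehouse/t/shared"));
    table.addPartition(new Partition(2L, "/warehouse/t/shared"));
    table.addPartition(new Partition(3L, "/warehouse/t/other"));
    System.out.println(table.getPartitionsByLocation("/warehouse/t/shared").size());
  }
}
```

Whether the scan is acceptable depends on the answers to the two questions in the comment: how large a partition map can get, and how often this lookup sits on the catalog's hot path.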
