Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485 PS2, Line 1485: if (partitions == null) { > This is an interesting topic and I spent some time to decide which one is b Thanks for breaking it down. I feel like we have it bad either way, which kinda sucks. Skimming over the code a little more, I feel like there are some improvements we can definitely make, but they would be potentially risky changes. For eg: Do we need to keep 'paritionIds_' around when we already have 'partitionMap_' ? Could we make a decision to change 'partitionMap_' to look like: HashMap<String, HashSet<HdfsPartition>> partitionMap_; and always be reference it by location, and pick out the right partitionID from the resulting hash set? That would get rid of the need to hold a few 100 KB - 1.5 MB extra per table in memory. But that may not be realistic, since I'm not sure what information the consumers of HdfsTable APIs have. The truth here is that the general case wouldn't have most partitions pointing to the same location, but we're using a non-negligible amount of memory to get this edge case to work correctly. I'm okay with this if we *have* to fix this soon, but I feel like we should come back to thinking about restructuring some of the in memory structures in the catalog, or at least in HdfsTable. http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1167 PS3, Line 1167: removePartitionFromLocationMapping(partition); Shouldn't the same thing be done here as you're doing in dropPartitions() in CatalogServiceCatalog.java? i.e. shouldn't we drop all partitions that have the same location as this partition? Which is what seems like is happening in CatalogServiceCatalog::dropPartitions(). -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 3 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 01 Jun 2018 17:48:50 +0000 Gerrit-HasComments: Yes
