Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485 PS2, Line 1485: locationToPartMap_.get(partition.getLocation()); > Thanks for breaking it down. I feel like we have it bad either way, which k I checked how the partitionIds_ member is used: The only way this set is accessed is through the getPartitionIds() function that returns the whole set. (On a side note, this is used only from HdfsPartitionPruner) With your suggestion every call of getPartitionIds() should be replaced with a loop in a loop that goes through this partitionMap_ and gets all the Ids. I'm not really comfortable of doing this, to be honest. On the other hand I checked the current implementation of partitionMap_ and it says that it's a mapping of partition Ids vs HdfsPartitions. I wonder if the getPartitionIds() function can be changed to return partitionMap_.keySet() instead of partitionIds. If the answer is yes then we can get rid of the partitionIds_ member (after making sure that no-one modifies the keyset of the HashMap explicitly). This tidies up some memory in HdfsTable. If we want to be more drastical then I think we should open a separate Jira to investigate the opportunities. What do you think? http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1167 PS3, Line 1167: for (int i = 0; i < partition.getPartitionValu > Shouldn't the same thing be done here as you're doing in dropPartitions() i Nothing extra has to be done here for the following reason: CatalogServiceCatalog.dropPartitions() gathers all the partitions that should be dropped including the ones that shares it's location with the ones received as param. Then this list is passed to HdfsTable.dropPartitions() that loops through this list and calls HdfsTable.getPartition() for each of the items/partitions. So no need to gather the partitions that share location here as well. -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 5 Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com> Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Mon, 04 Jun 2018 15:07:13 +0000 Gerrit-HasComments: Yes