Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485
PS2, Line 1485:             locationToPartMap_.get(partition.getLocation());
> Thanks for breaking it down. I feel like we have it bad either way, which k
I checked how the partitionIds_ member is used: The only way this set is 
accessed is through the getPartitionIds() function that returns the whole set. 
(On a side note, this is used only from HdfsPartitionPruner)
With your suggestion every call of getPartitionIds() should be replaced with a 
loop in a loop that goes through this partitionMap_ and gets all the Ids. I'm 
not really comfortable of doing this, to be honest.

On the other hand I checked the current implementation of partitionMap_ and it 
says that it's a mapping of partition Ids vs HdfsPartitions. I wonder if the 
getPartitionIds() function can be changed to return partitionMap_.keySet() 
instead of partitionIds. If the answer is yes then we can get rid of the 
partitionIds_ member (after making sure that no-one modifies the keyset of the 
HashMap explicitly).
This tidies up some memory in HdfsTable. If we want to be more drastical then I 
think we should open a separate Jira to investigate the opportunities.

What do you think?


http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1167
PS3, Line 1167: for (int i = 0; i < partition.getPartitionValu
> Shouldn't the same thing be done here as you're doing in dropPartitions() i
Nothing extra has to be done here for the following reason:
CatalogServiceCatalog.dropPartitions() gathers all the partitions that should 
be dropped including the ones that shares it's location with the ones received 
as param.
Then this list is passed to HdfsTable.dropPartitions() that loops through this 
list and calls HdfsTable.getPartition() for each of the items/partitions. So no 
need to gather the partitions that share location here as well.



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Jun 2018 15:07:13 +0000
Gerrit-HasComments: Yes

Reply via email to