Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1485
PS2, Line 1485:       if (partitions == null) {
> This is an interesting topic and I spent some time to decide which one is b
Thanks for breaking it down. I feel like we have it bad either way, which kinda 
sucks. Skimming over the code a little more, I feel like there are some 
improvements we can definitely make, but they would be potentially risky 
changes.

For eg:
Do we need to keep 'paritionIds_' around when we already have 'partitionMap_' ?

Could we make a decision to change 'partitionMap_' to look like:
HashMap<String, HashSet<HdfsPartition>> partitionMap_;

and always be reference it by location, and pick out the right partitionID from 
the resulting hash set? That would get rid of the need to hold a few 100 KB - 
1.5 MB extra per table in memory. But that may not be realistic, since I'm not 
sure what information the consumers of HdfsTable APIs have.

The truth here is that the general case wouldn't have most partitions pointing 
to the same location, but we're using a non-negligible amount of memory to get 
this edge case to work correctly.

I'm okay with this if we *have* to fix this soon, but I feel like we should 
come back to thinking about restructuring some of the in memory structures in 
the catalog, or at least in HdfsTable.


http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/3/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1167
PS3, Line 1167: removePartitionFromLocationMapping(partition);
Shouldn't the same thing be done here as you're doing in dropPartitions() in 
CatalogServiceCatalog.java?

i.e. shouldn't we drop all partitions that have the same location as this 
partition? Which is what seems like is happening in 
CatalogServiceCatalog::dropPartitions().



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 3
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 01 Jun 2018 17:48:50 +0000
Gerrit-HasComments: Yes

Reply via email to