Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1220
PS10, Line 1220: locationToPartMap_.put(location, Sets.newHashSet(partition));
               :     } else {
               :       partitionSet.add(partition);
> locationToPartMap_.put(location, Sets.newHashSet(partition));
Done


http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159
PS1, Line 159:       assert data.split('\t')[1] == '6'
> Sorry, I misunderstood your comment. I was actually trying this out myself
Hmm. This is certainly a feasible solution, and I think it's also easy to 
implement. In addition, it would bring Impala's and Hive's behaviour in line 
with each other.
If we have consensus to go for this approach then I'm fine with it.

However, I feel that the way Hive handles dropping these special partitions is 
incorrect, and we should go a step further and ask the Hive team to fix that 
as well. What I mean is that when Hive drops a partition it's fine to drop the 
folder, but then it should also drop all the other partitions sharing that 
location. A 'show partitions' shouldn't display partitions that have 
effectively been dropped (or at least whose location has been dropped).
I opened a Hive Jira to start a conversation about this topic, because the 
Hive team's position is that this whole scenario is not supported in Hive and 
violates the concept of managed tables. So the way it works now in Hive is not 
the result of conscious planning but rather a coincidence.
https://issues.apache.org/jira/browse/HIVE-19830
Feel free to join that conversation on the Jira (it has barely started).
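
To make the drop semantics suggested above concrete, here is a minimal sketch 
in Java. It is not code from Hive or Impala: the class SharedLocationIndex and 
its methods are hypothetical names made up for illustration, and partitions 
and locations are reduced to plain strings. It only shows the idea that 
dropping one partition also drops every partition sharing its location.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical index from a storage location to the partitions using it. */
class SharedLocationIndex {
  // location -> names of all partitions stored at that location
  private final Map<String, Set<String>> locationToParts = new HashMap<>();
  // partition name -> its location (reverse lookup)
  private final Map<String, String> partToLocation = new HashMap<>();

  /**
   * Registers a partition. Several partitions may share one location.
   * Assumes each partition name is added at most once.
   */
  public void addPartition(String partition, String location) {
    locationToParts.computeIfAbsent(location, k -> new HashSet<>()).add(partition);
    partToLocation.put(partition, location);
  }

  /**
   * Drops a partition together with every partition sharing its location,
   * so that no listed partition points at a removed folder.
   * Returns the names of all partitions that were dropped.
   */
  public Set<String> dropPartition(String partition) {
    String location = partToLocation.get(partition);
    if (location == null) return Collections.emptySet();
    Set<String> dropped = locationToParts.remove(location);
    for (String p : dropped) partToLocation.remove(p);
    return dropped;
  }

  /** Partitions that a 'show partitions' would still list. */
  public Set<String> listPartitions() {
    return new HashSet<>(partToLocation.keySet());
  }
}
```

With partitions p=1 and p=2 both registered at the same location, dropping 
p=1 in this sketch removes p=2 as well, so neither would be listed afterwards.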

It's a good question what to do with this patch until that conversation comes 
to a conclusion. I feel that this patch makes this special case of partitions 
safe to use and leaves one case incorrect (when someone drops partitions from 
Hive and the rest of the partitions in that particular folder are still 
present in Impala). So I'd vote to submit this patch as it is and make further 
adjustments in a separate patch, depending on the outcome of the conversation 
on the Hive Jira.

What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 09:35:46 +0000
Gerrit-HasComments: Yes
