Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location
......................................................................

Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1220
PS10, Line 1220: locationToPartMap_.put(location, Sets.newHashSet(partition));
               : } else {
               : partitionSet.add(partition);
> locationToPartMap_.put(location, Sets.newHashSet(partition));
Done

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159
PS1, Line 159: assert data.split('\t')[1] == '6'
> Sorry, I misunderstood your comment. I was actually trying this out myself
Hmm. This is certainly one feasible solution, and I think it's also easy to implement. In addition, it would bring Impala's and Hive's behaviour in line with each other. If we have consensus to go for this approach, then I'm fine with it.

However, I feel that how Hive handles dropping these special partitions is incorrect, and we should go one step further and discuss fixing that with the Hive team as well. What I mean is that once Hive drops a partition it's fine to drop the folder, but then it should also drop all the partitions sharing that location. A 'show partitions' shouldn't display partitions that are actually dropped (or at least whose location is dropped).

I opened a Hive Jira to start a conversation about this topic, because their position is that this whole scenario is not supported in Hive and violates the concept of managed tables. So how it works now in Hive is not the result of conscious design but rather a coincidence.
https://issues.apache.org/jira/browse/HIVE-19830
Feel free to join that conversation on the Jira (it has barely started).

It's a good question what to do with this patch until that conversation comes to a conclusion. I feel that this patch makes this special case of partitions safe to use, leaving only one incorrect case: when someone drops partitions from Hive and the rest of the partitions in that particular folder are still present in Impala. So I'd vote to submit this patch as it is and make further adjustments in a separate patch, depending on the outcome of the conversation on the Hive Jira. What do you think?

--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 11
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jun 2018 09:35:46 +0000
Gerrit-HasComments: Yes
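
For readers following the HdfsTable.java comment above, here is a minimal sketch of the location-to-partitions bookkeeping under discussion. It is an illustration only, not the code in the patch: the class name LocationIndex and its methods are hypothetical, partitions are plain strings, and java.util collections stand in for the Guava Sets.newHashSet call quoted in the review.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the bookkeeping discussed in the review;
// this is not Impala's HdfsTable and partitions are simplified to names.
class LocationIndex {
  // Maps a storage location to every partition that points at it.
  private final Map<String, Set<String>> locationToPartMap = new HashMap<>();

  // Registers a partition under its location, creating the set on first use.
  void addPartition(String location, String partition) {
    locationToPartMap.computeIfAbsent(location, k -> new HashSet<>()).add(partition);
  }

  // Removes one partition and drops the map entry once the location is unused.
  void dropPartition(String location, String partition) {
    Set<String> parts = locationToPartMap.get(location);
    if (parts == null) return;
    parts.remove(partition);
    if (parts.isEmpty()) locationToPartMap.remove(location);
  }

  // Returns a read-only view of the partitions sharing a location (empty if none).
  Set<String> partitionsAt(String location) {
    Set<String> parts = locationToPartMap.get(location);
    return parts == null ? Collections.emptySet() : Collections.unmodifiableSet(parts);
  }
}
```

With a structure like this, dropping one of several partitions that share a folder leaves the others registered, which is the case the comment argues should stay safe on the Impala side.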