Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 10: (3 comments) http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1220 PS10, Line 1220: partitionSet = Sets.newHashSet(); : partitionSet.add(partition); : locationToPartMap_.put(location, partitionSet); locationToPartMap_.put(location, Sets.newHashSet(partition)); http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159 PS1, Line 159: assert data.split('\t')[1] == '6' > I managed to check this in Hive: It also drops the folder of the partition Sorry, I misunderstood your comment. I was actually trying this out myself in Hive and I think I understand Hive's behavior now. Like you mentioned Hive just drops the directory irrespective of whether anything else points to it *but* still retains the other partitions and they remain in a dangling state. Querying them, we get no results back. I think Impala should also do the same thing. This can be done by calling for each (partition: locationToPartMap_(partitionTobedropped.location)) partition.setFileDescriptors(emptyList) IMO that behavior makes more sense and makes the code simpler too while keeping us consistent with Hive. Thoughts? http://gerrit.cloudera.org:8080/#/c/10543/10/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10543/10/tests/metadata/test_partition_metadata.py@175 PS10, Line 175: Then Hive : drops one of these partitions and as a result both are expected to be dropped. Like I mentioned in the other comment, this is not expected from a Hive POV AFAICT. Also, reading the comments from Sergey in https://issues.apache.org/jira/browse/HIVE-19830 gives the same impression. So, we should probably just drop the partition we were asked to drop and invalidate the filedescriptors of all the partitions that point to the location that just dropped/deleted. -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 10 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 14 Jun 2018 05:48:37 +0000 Gerrit-HasComments: Yes
