Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1220
PS10, Line 1220: partitionSet = Sets.newHashSet();
               :       partitionSet.add(partition);
               :       locationToPartMap_.put(location, partitionSet);
locationToPartMap_.put(location, Sets.newHashSet(partition));


http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/1/tests/metadata/test_partition_metadata.py@159
PS1, Line 159:       assert data.split('\t')[1] == '6'
> I managed to check this in Hive: It also drops the folder of the partition
Sorry, I misunderstood your comment. I was actually trying this out myself in 
Hive and I think I understand Hive's behavior now.  Like you mentioned Hive 
just drops the directory irrespective of whether anything else points to it 
*but* still retains the other partitions and they remain in a dangling state. 
Querying them, we get no results back.
I think Impala should also do the same thing. This can be done by calling

for each (partition: locationToPartMap_(partitionTobedropped.location))
  partition.setFileDescriptors(emptyList)

IMO that behavior makes more sense and makes the code simpler too while keeping 
us consistent with Hive. Thoughts?


http://gerrit.cloudera.org:8080/#/c/10543/10/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/10/tests/metadata/test_partition_metadata.py@175
PS10, Line 175: Then Hive
              :     drops one of these partitions and as a result both are 
expected to be dropped.
Like I mentioned in the other comment, this is not expected from a Hive  POV 
AFAICT. Also, reading the comments from Sergey in 
https://issues.apache.org/jira/browse/HIVE-19830 gives the same impression.

So, we should probably just drop the partition we were asked to drop and 
invalidate the filedescriptors of all the partitions that point to the location 
that just dropped/deleted.



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 14 Jun 2018 05:48:37 +0000
Gerrit-HasComments: Yes

Reply via email to