Hello Bharath Vissapragada, Zoltan Borok-Nagy, Sailesh Mukil,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/10543

to look at the new patch set (#10).

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location
......................................................................

IMPALA-6119: Fix issue with multiple partitions sharing same location

When multiple partitions point to the same location and a new data
file is added to any of them, the expected behaviour is that the new
file also becomes visible in every other partition pointing to that
location. Currently this is not the case: right after the insertion
the new file is only visible in the partition it was inserted into,
and an INVALIDATE METADATA is needed to resolve the inconsistency.
This fix addresses the issue by keeping track of a mapping between
locations and the HdfsPartitions pointing to them. When new files are
inserted into a partition, the metadata of every other partition that
points to the same location is reloaded as well.
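A minimal sketch of the location-to-partitions mapping described above,
assuming partitions are identified by name and locations are compared as
plain path strings. The class and method names are hypothetical and are
not the actual HdfsTable API; the sketch only shows how the set of
partitions to reload after an insert can be derived from such a map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real HdfsTable code: partitions are plain
// names and locations are plain path strings.
final class PartitionLocationIndex {
  // Location -> names of all partitions whose directory is that location.
  private final Map<String, Set<String>> partitionsByLocation = new HashMap<>();

  void add(String partitionName, String location) {
    partitionsByLocation.computeIfAbsent(location, k -> new HashSet<>())
        .add(partitionName);
  }

  void remove(String partitionName, String location) {
    Set<String> names = partitionsByLocation.get(location);
    if (names == null) return;
    names.remove(partitionName);
    // Drop empty entries so stale locations do not accumulate.
    if (names.isEmpty()) partitionsByLocation.remove(location);
  }

  // Partitions other than 'insertedInto' that share its location and so
  // would need their file metadata reloaded after an insert.
  Set<String> partitionsToReload(String insertedInto, String location) {
    Set<String> result = new HashSet<>(
        partitionsByLocation.getOrDefault(location, Collections.emptySet()));
    result.remove(insertedInto);
    return result;
  }
}
```

Under this sketch, altering a partition's location would be a `remove` on
the old location followed by an `add` on the new one, so the index stays
consistent with the partitions' current directories.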
The same issue is present when a partition is dropped and one or more
other partitions share its location. In this case the partition's
directory is erased, but the Catalog did not remove the other
partitions on that location and still showed them as existing. Again,
running INVALIDATE METADATA made them disappear. This issue is also
fixed.
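A minimal sketch of one way to determine which catalog entries become
stale when a partition sharing its directory is dropped, assuming each
partition's location is known as a plain path string. The class and
method names are hypothetical and do not come from the patch; the sketch
only illustrates that every partition on the erased directory, not just
the dropped one, needs to go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: given partition name -> location, list the partitions
// whose directory disappears when 'droppedPartition' is dropped.
final class SharedLocationDrop {
  static List<String> partitionsToRemove(Map<String, String> locationByPartition,
                                         String droppedPartition) {
    List<String> result = new ArrayList<>();
    String droppedLocation = locationByPartition.get(droppedPartition);
    if (droppedLocation == null) return result;  // unknown partition
    for (Map.Entry<String, String> e : locationByPartition.entrySet()) {
      // The dropped partition itself plus every partition sharing its directory.
      if (droppedLocation.equals(e.getValue())) result.add(e.getKey());
    }
    return result;
  }
}
```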

Testing:
There was an existing test that covered partitions pointing to the
same location. However, after each insert it executed a REFRESH to
reload the metadata for the entire table. This REFRESH was removed so
that the test exercises the changes of this fix.
Another test is introduced to cover the cases when the location of a
partition is altered or a partition is dropped.
One more test is created to cover the case when Impala reloads some of
its partitions after Hive has dropped a partition that shares its
location with other partitions.

Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/metadata/test_partition_metadata.py
4 files changed, 177 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/10543/10
--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 10
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sail...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
