Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 18:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@187
PS17, Line 187: The second parameter is a set of
              :   // the partitions pointing to this location.
> we're paying memory for speed. what operations will be slowed down to addre
This has been discussed somewhere back in a previous comment. Basically after 
each insert we have to check that are there any partitions on this same 
location where we have to add internally the new file descriptors.
The same goes for dropping a partition: since the whole directory is dropped we 
have to get rid of the rest partitions on this location.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189
PS17, Line 189: locationToPartMap_
> can you quantify the additional memory needed per partition? perhaps measur
This has also been covered on this review. See my comment on Patch Set 3 around 
1st June.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235
PS17, Line 1235: Mapping
> use consistent naming. so far, there's: LocationMapping and PartMap
Picked LocationMapping
Done


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1492
PS17, Line 1492: received
> what does "received" mean?
received : got as parameter

The original comment doesn't reflect what the function does after the change. I 
wanted to emphasise that it's not just simply 'convert' the partition Names to 
HdfsPartitions and group them by location. But additionally it searches for the 
ones sharing their locations and include those to the result as well.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1494
PS17, Line 1494:   private HashMap<Path, List<HdfsPartition>> 
getPartitionsByPath(
> Is the modification to this method the one that fixes the issue with refres
We can populate such a map in that loop for sure but I feel that this already 
existing function getPartitionsByPath() is a more natural fit to do this 
mapping for us.
Anyway, we still need to keep the locationToPartMap_ as an HdfsTable member as 
we need that for altering and dropping partitions.
And we add no extra time complexity to getPartitionsByPath() with this change.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255
PS17, Line 2255: bl_hdfs.isL
> lift the cast. perhaps put it right after the precondition on L2239
Done


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2258
PS17, Line 2258:           Preconditions.checkState(partitionNameList.size() > 
1);
> at this point, size of partitionNameList must be > 1. perhaps add a precond
Done


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2262
PS17, Line 2262: ionNameList.subList(0,
> is there a solution in mind? for example, should we suggest that duplicate 
The solution depends on what the original intent of the user was: He he just 
wanted to drop the partition with keeping the data, then it should set this 
partition's location to something unique (don't have to be valid) and then drop 
this partition.
If the user wants to drop the data and all the partitions using that location, 
that it has to guarantee that all the partitions point to some not-necessarily 
valid location, and one points to the valid location meant to be dropped. Then 
it can delete the partitions one by one.
I'm not sure how to include this info to the error text briefly, though :)


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2464
PS17, Line 2464: fsPartiti
> make this a precondition for this method?
Won't work for the whole method, but can add that to the else branch.
Done


http://gerrit.cloudera.org:8080/#/c/10543/15/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/15/tests/metadata/test_partition_metadata.py@177
PS15, Line 177: pytest.skip()
> The problem with skipping though is that no one will remember to un-skip th
Still I feel that a Hive upgrade shouldn't break our builds.

Another thing is that we can't just assert on j=2 being present, as instead the 
last query would result in a "No such file or directory" error. The reason is 
that Impala thinks j=2 there, and it also sees some file descriptors cached so 
it tries to access them but they are not there on the disk. To be honest, I 
don't really want to assert on this error.



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 18
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 22 Jun 2018 14:33:46 +0000
Gerrit-HasComments: Yes

Reply via email to