Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 18: (10 comments) http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@187 PS17, Line 187: The second parameter is a set of : // the partitions pointing to this location. > we're paying memory for speed. what operations will be slowed down to addre This has been discussed somewhere back in a previous comment. Basically after each insert we have to check that are there any partitions on this same location where we have to add internally the new file descriptors. The same goes for dropping a partition: since the whole directory is dropped we have to get rid of the rest partitions on this location. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189 PS17, Line 189: locationToPartMap_ > can you quantify the additional memory needed per partition? perhaps measur This has also been covered on this review. See my comment on Patch Set 3 around 1st June. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235 PS17, Line 1235: Mapping > use consistent naming. so far, there's: LocationMapping and PartMap Picked LocationMapping Done http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1492 PS17, Line 1492: received > what does "received" mean? received : got as parameter The original comment doesn't reflect what the function does after the change. I wanted to emphasise that it's not just simply 'convert' the partition Names to HdfsPartitions and group them by location. But additionally it searches for the ones sharing their locations and include those to the result as well. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1494 PS17, Line 1494: private HashMap<Path, List<HdfsPartition>> getPartitionsByPath( > Is the modification to this method the one that fixes the issue with refres We can populate such a map in that loop for sure but I feel that this already existing function getPartitionsByPath() is a more natural fit to do this mapping for us. Anyway, we still need to keep the locationToPartMap_ as an HdfsTable member as we need that for altering and dropping partitions. And we add no extra time complexity to getPartitionsByPath() with this change. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255 PS17, Line 2255: bl_hdfs.isL > lift the cast. perhaps put it right after the precondition on L2239 Done http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2258 PS17, Line 2258: Preconditions.checkState(partitionNameList.size() > 1); > at this point, size of partitionNameList must be > 1. perhaps add a precond Done http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2262 PS17, Line 2262: ionNameList.subList(0, > is there a solution in mind? for example, should we suggest that duplicate The solution depends on what the original intent of the user was: He he just wanted to drop the partition with keeping the data, then it should set this partition's location to something unique (don't have to be valid) and then drop this partition. If the user wants to drop the data and all the partitions using that location, that it has to guarantee that all the partitions point to some not-necessarily valid location, and one points to the valid location meant to be dropped. Then it can delete the partitions one by one. I'm not sure how to include this info to the error text briefly, though :) http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2464 PS17, Line 2464: fsPartiti > make this a precondition for this method? Won't work for the whole method, but can add that to the else branch. Done http://gerrit.cloudera.org:8080/#/c/10543/15/tests/metadata/test_partition_metadata.py File tests/metadata/test_partition_metadata.py: http://gerrit.cloudera.org:8080/#/c/10543/15/tests/metadata/test_partition_metadata.py@177 PS15, Line 177: pytest.skip() > The problem with skipping though is that no one will remember to un-skip th Still I feel that a Hive upgrade shouldn't break our builds. Another thing is that we can't just assert on j=2 being present, as instead the last query would result in a "No such file or directory" error. The reason is that Impala thinks j=2 there, and it also sees some file descriptors cached so it tries to access them but they are not there on the disk. To be honest, I don't really want to assert on this error. -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 18 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 22 Jun 2018 14:33:46 +0000 Gerrit-HasComments: Yes
