Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 17: (9 comments) http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@187 PS17, Line 187: The second parameter is a set of : // the partitions pointing to this location. we're paying memory for speed. what operations will be slowed down to address this "synonym" capability if this map is not used? http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189 PS17, Line 189: locationToPartMap_ can you quantify the additional memory needed per partition? perhaps measure the difference for a table with a large number of partitions (~100k). http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235 PS17, Line 1235: PartMap use consistent naming. so far, there's: LocationMapping and PartMap http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1492 PS17, Line 1492: received what does "received" mean? "grouped by their base dir..." implies same location, so I don't think the change here improves the comment. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1494 PS17, Line 1494: private HashMap<Path, List<HdfsPartition>> getPartitionsByPath( Is the modification to this method the one that fixes the issue with refreshes not applying to all synonyms of a location? If so, I see that its used right above on L1484. That method includes a loop over all partitions, so it seems that if we make a small map given the partitions to update, we can collect those additional partitions that share the same location as the ones that we're updating. That would seem to be a small, additional cost in both time and memory for this operation. http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255 PS17, Line 2255: (HdfsTable) lift the cast. perhaps put it right after the precondition on L2239 http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2258 PS17, Line 2258: // Display only a subset of the partition names not to flood the logs in at this point, size of partitionNameList must be > 1. perhaps add a precondition for this? http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2262 PS17, Line 2262: ImpalaRuntimeException is there a solution in mind? for example, should we suggest that duplicate references be updated to unique locations (do the locations need to be valid?), then the operation can proceed? http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2464 PS17, Line 2464: HdfsTable make this a precondition for this method? -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 17 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Wed, 20 Jun 2018 23:43:45 +0000 Gerrit-HasComments: Yes
