Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10543 )
Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same location ...................................................................... Patch Set 18: (2 comments) http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1435 PS18, Line 1435: HdfsPartition partition: partitionMap_.values() This loop constructs partitionsToUpdateFileMdByPath for the case where partitions are not explicitly specified (among other things). That map is a subset of locationToPartMap_ (loosely speaking). So we're already looping over all partitions here. If the block on L1446 can be modified to also build up such a map for explicitly specified partitions (e.g., when partitionsToUpdate is != null), that map can be passed into getPartitionsByPath (L1484) to be used instead of the table-wide locationToPartMap_. Unless I've missed something, this looks like it adds little to current complexity for this code path and uses a small amount of additional memory when needed. I see no other use of getPartitionsByPath. http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2252 PS18, Line 2252: for (HdfsPartition part : parts) { if the path -> partition map is dropped, an additional loop would need to consider all partitions and check vs. the specified partitions for other partitions that share the same location. this can be sped up via small indexes (index the specified partitions by location). so we'd increase the cost here from scanning the specified partitions to scanning all partitions-- how much of a performance hit is this for 100k partitions and how performance critical is drop partitions? -- To view, visit http://gerrit.cloudera.org:8080/10543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a Gerrit-Change-Number: 10543 Gerrit-PatchSet: 18 Gerrit-Owner: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 22 Jun 2018 16:30:43 +0000 Gerrit-HasComments: Yes
