Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 17:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@187
PS17, Line 187: The second parameter is a set of
              :   // the partitions pointing to this location.
we're paying memory for speed. what operations will be slowed down to address 
this "synonym" capability if this map is not used?


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@189
PS17, Line 189: locationToPartMap_
can you quantify the additional memory needed per partition? perhaps measure 
the difference for a table with a large number of partitions (~100k).


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1235
PS17, Line 1235: PartMap
use consistent naming. so far, there's: LocationMapping and PartMap


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1492
PS17, Line 1492: received
what does "received" mean?

"grouped by their base dir..." implies same location, so I don't think the 
change here improves the comment.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1494
PS17, Line 1494:   private HashMap<Path, List<HdfsPartition>> 
getPartitionsByPath(
Is the modification to this method the one that fixes the issue with refreshes 
not applying to all synonyms of a location?
If so, I see that its used right above on L1484. That method includes a loop 
over all partitions, so it seems that if we make a small map given the 
partitions to update, we can collect those additional partitions that share the 
same location as the ones that we're updating. That would seem to be a small, 
additional cost in both time and memory for this operation.


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2255
PS17, Line 2255: (HdfsTable)
lift the cast. perhaps put it right after the precondition on L2239


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2258
PS17, Line 2258:           // Display only a subset of the partition names not 
to flood the logs in
at this point, size of partitionNameList must be > 1. perhaps add a 
precondition for this?


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2262
PS17, Line 2262: ImpalaRuntimeException
is there a solution in mind? for example, should we suggest that duplicate 
references be updated to unique locations (do the locations need to be valid?), 
then the operation can proceed?


http://gerrit.cloudera.org:8080/#/c/10543/17/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2464
PS17, Line 2464: HdfsTable
make this a precondition for this method?



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 17
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 20 Jun 2018 23:43:45 +0000
Gerrit-HasComments: Yes

Reply via email to