Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10543 )

Change subject: IMPALA-6119: Fix issue with multiple partitions sharing same 
location
......................................................................


Patch Set 20:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1435
PS18, Line 1435: / Only reload file metadata of partitions speci
> This loop constructs partitionsToUpdateFileMdByPath for the case where part
In my opinion the insert path is the performance critical rather than the 
delete partition path. So, it might make sense to get rid of locationToPartMap_ 
then and use this loop to get all the partitions that share its location with 
the input partitions.

In turn the drop partition path would have some increased time complexity: <num 
of partitions to delete> * <num of total partitions>. Or I could so something 
like here with my latest change: have a set of locations related to the input 
partitions and then the complexity reduces to <num of total partitions>.


http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10543/18/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2252
PS18, Line 2252: opOptions.purgeData(purge);
> if the path -> partition map is dropped, an additional loop would need to c
see my comments on your other question.


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py
File tests/metadata/test_partition_metadata.py:

http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@124
PS18, Line 124:     # Create the table
              :     self.execute_query_expect_success(
              :         self.client,
> This looks wrong, because the directories made here will only be "same_loc_
Good point. No need to create these directories.


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@127
PS18, Line 127: table %
> Here and elsewhere, there's a execute_query_expect_success() method you sho
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@163
PS18, Line 163: chec
> Are we (Impala) in control of this message? It should be "its".
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@168
PS18, Line 168:     # Drop partition j=2 and expect this operation to fail as 
the partition's location is
> If we are here, then we never raised an exception. It would be polite to sh
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@177
PS18, Line 177:   assert expe
> You can insert a reason here. It's mildly helpful to someone that happens t
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@184
PS18, Line 184:     drops one of these partitions and as a result both are 
expected to be dropped. This
              :     test checks that both of the mentioned partitions are no 
longer
> As before, I can't tell that this is needed.
Apparently, this is not needed either.


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@187
PS18, Line 187: A-7152
> As before, execute_query_expect_success() in places where you expect succes
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@191
PS18, Line 191:     TBL_NAME = "same_loc_test"
              :     FQ_TBL_NAME = unique_database + "." + TBL_NAME
> Same comment for L185 applies here.
Done


http://gerrit.cloudera.org:8080/#/c/10543/18/tests/metadata/test_partition_metadata.py@202
PS18, Line 202:     self.execute_query_expect_success(
              :         self.client, "alter table
> You can use self.hive_client() instead.
Done



--
To view, visit http://gerrit.cloudera.org:8080/10543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a54bc8224bcefe65b83de2df58bb84629f2aa4a
Gerrit-Change-Number: 10543
Gerrit-PatchSet: 20
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Michael Brown <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 26 Jun 2018 18:02:46 +0000
Gerrit-HasComments: Yes

Reply via email to