Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16273 )
Change subject: IMPALA-9990: Support SET OWNER for Kudu tables ...................................................................... Patch Set 1: (3 comments) Hi Andrew and Attila, I have replied your comments provided in the previous iteration and think it may be good to have a discussion about how Impala should implement SET OWNER for a Kudu table under different scenarios (Kudu-HMS integration enabled v.s. disabled). I also added Vihang as an additional reviewer since Vihang also worked on the functionality of creating Kudu tables in Impala and thus may be able to offer some insight into it. http://gerrit.cloudera.org:8080/#/c/16273/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/16273/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@551 PS1, Line 551: alterKuduTable(tbl, alterTableOptions, errMsg); > This may be more a question for Kudu, but in what versions of Kudu is nativ Thanks Andrew! I do not have concrete answers to both of your questions, i.e., 1) what the behavior should be when HMS synchronization is disabled v.s. enabled, and 2) what the behavior should be when the Kudu table is external v.s. internal from Impala's perspective. But I did carry out an investigation of how a DDL statement of a Kudu table is implemented in Impala. According to CatalogOpExecutor#alterTable(), if this DDL statement is for a Kudu table, it looks like Impala would not contact HMS to apply the metadata change, which can be seen at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L680. To be specific, after finishing CatalogOpExecutor#alterKuduTable(), CatalogOpExecutor#alterTable() returns immediately without going into the following switch block (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L682-L845). Recall that for a DDL statement of a Kudu table, CatalogOpExecutor#alterKuduTable() would call the corresponding method in KuduCatalogOpExecutor.java (https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L682-L845), which in turn would call KuduCatalogOpExecutor#alterKuduTable() at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L550-L561. Therefore, it is possible that the owner of a Kudu table from Kudu's perspective would be different than the owner from HMS's perspective if Kudu does not update HMS accordingly after the DDL statement. To verify this, I tried adding the following two statements before the return statement at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L680 and found that the owner of a Kudu table could indeed be different if Kudu does not update the HMS on the ownership change. org.apache.hadoop.hive.metastore.api.Table msTbl = tbl.getMetaStoreTable().deepCopy(); String oldOwner = msTbl.getOwner(); To address this issue, we need a way to determine whether Kudu and HMS are synchronized. I found there is a method CatalogOpExecutor#isKuduHmsIntegrationEnabled() in CatalogOpExecutor.java at https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1755-L1764, which could/may be used to determine whether the ownership information is synchronized between Kudu and HMS. Hence, based on this method, we could decide whether or not to update HMS on the ownership change of the Kudu table when Kudu and HMS is not synchronized. On the other hand, I have checked that using the current patch, alterSetOwner() could effectively alter the owner of the specified Kudu table no matter the Kudu table is external or internal. This could be manually verified by a revised Python program provided at https://github.com/apache/kudu/blob/master/examples/python/basic-python-example/basic_example.py. http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test: http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@51 PS1, Line 51: alter table simple set owner user non_owner > is it possible to check if the owner of "simple" is actually "non_owner"? I Thanks Attila! It looks currently there is no command in Impala that allows a user to check the owner of a given table. A possible way to verify the owner of a Kudu table is through the Kudu-Python client currently adopted in Impala's end-to-end tests. For example, kudu.client at https://github.com/apache/impala/blob/master/tests/query_test/test_kudu.py#L30. However, it seems that it is not straightforward to verify the owner using the Kudu-Python client within Impala's end-to-end tests due to the current Kudu-Python client adopted in Impala's end-to-end tests not allowing a user to access the field of 'owner' in a 'kudu.client.Table'. We would see the following error if we try to do so. E AttributeError: 'kudu.client.Table' object has no attribute 'owner' As a workaround, I compiled the Kudu-Python client at https://github.com/apache/kudu/tree/master/examples/python/basic-python-example myself and manually verified that the owner of a table could indeed be altered correctly by 'table.owner', where 'table' is computed as 'client.table('impala::db.kudu_tbl')' and "db" is the name of the database under which the Kudu table of "kudu_tbl" is stored. http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test File testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test: http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test@53 PS1, Line 53: 'Owner has been altere.' > nit: typo in altered. Is it normal that this test didn't fail? Thanks Attila! According to https://github.com/apache/impala/blob/master/tests/custom_cluster/test_kudu.py#L310-L312 and https://github.com/apache/impala/commit/245f2375e28ff7f5b0e89341d974341fa20444d1, the test of test_kudu_alter_table() in tests/custom_cluster/test_kudu.py is currently disabled and thus we won't see a failure due to this typo. I will remove this test case in the next iteration and will add it back when the issue reported at IMPALA-8751 is resolved. -- To view, visit http://gerrit.cloudera.org:8080/16273 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I29d641efc8db314964bc5ee9828a86d4a44ae95c Gerrit-Change-Number: 16273 Gerrit-PatchSet: 1 Gerrit-Owner: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Fang-Yu Rao <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]> Gerrit-Comment-Date: Tue, 18 Aug 2020 18:04:41 +0000 Gerrit-HasComments: Yes
