Fang-Yu Rao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16273 )

Change subject: IMPALA-9990: Support SET OWNER for Kudu tables
......................................................................


Patch Set 1:

(3 comments)

Hi Andrew and Attila, I have replied your comments provided in the previous 
iteration and think it may be good to have a discussion about how Impala should 
implement SET OWNER for a Kudu table under different scenarios (Kudu-HMS 
integration enabled v.s. disabled).

I also added Vihang as an additional reviewer since Vihang also worked on the 
functionality of creating Kudu tables in Impala and thus may be able to offer 
some insight into it.

http://gerrit.cloudera.org:8080/#/c/16273/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/16273/1/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java@551
PS1, Line 551:     alterKuduTable(tbl, alterTableOptions, errMsg);
> This may be more a question for Kudu, but in what versions of Kudu is nativ
Thanks Andrew!

I do not have concrete answers to both of your questions, i.e., 1) what the 
behavior should be when HMS synchronization is disabled v.s. enabled, and 2) 
what the behavior should be when the Kudu table is external v.s. internal from 
Impala's perspective. But I did carry out an investigation of how a DDL 
statement of a Kudu table is implemented in Impala.

According to CatalogOpExecutor#alterTable(), if this DDL statement is for a 
Kudu table, it looks like Impala would not contact HMS to apply the metadata 
change, which can be seen at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L680.
 To be specific, after finishing CatalogOpExecutor#alterKuduTable(), 
CatalogOpExecutor#alterTable() returns immediately without going into the 
following switch block 
(https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L682-L845).

Recall that for a DDL statement of a Kudu table,  
CatalogOpExecutor#alterKuduTable() would call the corresponding method in 
KuduCatalogOpExecutor.java 
(https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L682-L845),
 which in turn would call KuduCatalogOpExecutor#alterKuduTable() at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java#L550-L561.

Therefore, it is possible that the owner of a Kudu table from Kudu's 
perspective would be different than the owner from HMS's perspective if Kudu 
does not update HMS accordingly after the DDL statement. To verify this, I 
tried adding the following two statements before the return statement at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L680
 and found that the owner of a Kudu table could indeed be different if Kudu 
does not update the HMS on the ownership change.

org.apache.hadoop.hive.metastore.api.Table msTbl = 
tbl.getMetaStoreTable().deepCopy();
String oldOwner = msTbl.getOwner();

To address this issue, we need a way to determine whether Kudu and HMS are 
synchronized. I found there is a method 
CatalogOpExecutor#isKuduHmsIntegrationEnabled() in CatalogOpExecutor.java at 
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L1755-L1764,
 which could/may be used to determine whether the ownership information is 
synchronized between Kudu and HMS. Hence, based on this method, we could decide 
whether or not to update HMS on the ownership change of the Kudu table when 
Kudu and HMS is not synchronized.

On the other hand, I have checked that using the current patch, alterSetOwner() 
could effectively alter the owner of the specified Kudu table no matter the 
Kudu table is external or internal. This could be manually verified by a 
revised Python program provided at 
https://github.com/apache/kudu/blob/master/examples/python/basic-python-example/basic_example.py.


http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test:

http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_alter.test@51
PS1, Line 51: alter table simple set owner user non_owner
> is it possible to check if the owner of "simple" is actually "non_owner"? I
Thanks Attila!

It looks currently there is no command in Impala that allows a user to check 
the owner of a given table.

A possible way to verify the owner of a Kudu table is through the Kudu-Python 
client currently adopted in Impala's end-to-end tests. For example, kudu.client 
at 
https://github.com/apache/impala/blob/master/tests/query_test/test_kudu.py#L30.

However, it seems that it is not straightforward to verify the owner using the 
Kudu-Python client within Impala's end-to-end tests  due to the current 
Kudu-Python client adopted in Impala's end-to-end tests not allowing a user to 
access the field of 'owner' in a 'kudu.client.Table'. We would see the 
following error if we try to do so.

E   AttributeError: 'kudu.client.Table' object has no attribute 'owner'

As a workaround, I compiled the Kudu-Python client at 
https://github.com/apache/kudu/tree/master/examples/python/basic-python-example 
myself and manually verified that the owner of a table could indeed be altered 
correctly by 'table.owner', where 'table' is computed as 
'client.table('impala::db.kudu_tbl')' and "db" is the name of the database 
under which the Kudu table of "kudu_tbl" is stored.


http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test:

http://gerrit.cloudera.org:8080/#/c/16273/1/testdata/workloads/functional-query/queries/QueryTest/kudu_hms_alter.test@53
PS1, Line 53: 'Owner has been altere.'
> nit: typo in altered. Is it normal that this test didn't fail?
Thanks Attila!

According to
https://github.com/apache/impala/blob/master/tests/custom_cluster/test_kudu.py#L310-L312
 and 
https://github.com/apache/impala/commit/245f2375e28ff7f5b0e89341d974341fa20444d1,
 the test of test_kudu_alter_table() in tests/custom_cluster/test_kudu.py is 
currently disabled and thus we won't see a failure due to this typo. I will 
remove this test case in the next iteration and will add it back when the issue 
reported at IMPALA-8751 is resolved.



--
To view, visit http://gerrit.cloudera.org:8080/16273
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I29d641efc8db314964bc5ee9828a86d4a44ae95c
Gerrit-Change-Number: 16273
Gerrit-PatchSet: 1
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Vihang Karajgaonkar <[email protected]>
Gerrit-Comment-Date: Tue, 18 Aug 2020 18:04:41 +0000
Gerrit-HasComments: Yes

Reply via email to