Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15026 )

Change subject: IMPALA-9068: Use different directories for external vs managed 
warehouse
......................................................................


Patch Set 3:

(5 comments)

I have a couple of jobs in progress, and I will upload a new patch set once
they come back.

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/avro_schema_resolution/create_table.sql
File testdata/avro_schema_resolution/create_table.sql:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/avro_schema_resolution/create_table.sql@47
PS3, Line 47: LOCATION '/test-warehouse/avro_schema_resolution_test/';
> can we add something to hiveconf like 'warehouse.dir' that is either set to
My thinking is that this file is one of the few places that tries to be
generic by using hive.metastore.warehouse.dir (and even here, most of the
LOCATION statements don't use it). Almost everything else hard-codes
/test-warehouse, so I was planning to just remove the generic aspect.

I can't find anything that relies on the generic location, and I'm pretty sure 
it doesn't work.


http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/bin/load_nested.py
File testdata/bin/load_nested.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/bin/load_nested.py@92
PS3, Line 92: external.table.purge'='TRUE'
> whats this for?
Hive has a CTAS bug (HIVE-22371) where it puts the data in the wrong directory
(the managed directory rather than the external directory). Adding EXTERNAL to
the statement forces it to use the right directory.

Traditionally, dropping an external table does not remove its data. Setting
external.table.purge=TRUE makes the drop remove the data as well. This is only
a Hive 3 thing, and it results in the same behavior as a managed
non-transactional table.

Added a comment here about HIVE-22371.
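
As a rough sketch of the workaround described above (the helper name
build_ctas and its parameters are hypothetical and only for illustration; this
is not the actual code in testdata/bin/load_nested.py), the statement could be
assembled along these lines:

```python
def build_ctas(table, select_sql, hive_major_version, file_format="parquet"):
    """Build a Hive CREATE TABLE ... AS SELECT statement (illustrative only).

    All names here are assumptions made for this sketch.
    """
    external = ""
    tblproperties = ""
    if hive_major_version >= 3:
        # Workaround for HIVE-22371: without EXTERNAL, the CTAS data lands in
        # the managed warehouse directory instead of the external one.
        external = "EXTERNAL "
        # Dropping an external table normally keeps its data; this property
        # makes the drop remove the data too, like a managed
        # non-transactional table.
        tblproperties = " TBLPROPERTIES('external.table.purge'='TRUE')"
    return "CREATE {0}TABLE {1} STORED AS {2}{3} AS {4}".format(
        external, table, file_format, tblproperties, select_sql)
```

With hive_major_version=2 the helper leaves the statement as a plain CREATE
TABLE ... AS SELECT, so only Hive 3 picks up the EXTERNAL keyword and the purge
property.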


http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/cluster/node_templates/cdh7/etc/init.d/kms
File testdata/cluster/node_templates/cdh7/etc/init.d/kms:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/cluster/node_templates/cdh7/etc/init.d/kms@1
PS3, Line 1: #!/bin/bash
> is this the same as testdata/cluster/node_templates/cdh6/etc/init.d/kms ? c
This is identical to the cdh6 version. I converted it to a symlink (and
changed testdata/cluster/admin to handle symlinks).


http://gerrit.cloudera.org:8080/#/c/15026/3/tests/comparison/cluster.py
File tests/comparison/cluster.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/tests/comparison/cluster.py@519
PS3, Line 519: managed_warehouse_dir
> is this used anywhere?
No. I thought I'd need it, but it turned out I didn't. I can remove it.


http://gerrit.cloudera.org:8080/#/c/15026/3/tests/custom_cluster/test_hive_parquet_codec_interop.py
File tests/custom_cluster/test_hive_parquet_codec_interop.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/tests/custom_cluster/test_hive_parquet_codec_interop.py@87
PS3, Line 87: external
> does it matter this change is being made for Hive 2 as well?
This external is due to the Hive CTAS bug that I mentioned above (HIVE-22371).

I added a comment here and changed it so that only Hive 3 uses external +
external.table.purge=true. This matches what I do in
testdata/bin/load_nested.py.



--
To view, visit http://gerrit.cloudera.org:8080/15026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Gerrit-Change-Number: 15026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 23 Jan 2020 01:37:32 +0000
Gerrit-HasComments: Yes
