Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15026 )
Change subject: IMPALA-9068: Use different directories for external vs managed warehouse
......................................................................


Patch Set 3:

(5 comments)

I have a couple of jobs in progress, and I will post a new upload once those come back.

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/avro_schema_resolution/create_table.sql
File testdata/avro_schema_resolution/create_table.sql:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/avro_schema_resolution/create_table.sql@47
PS3, Line 47: LOCATION '/test-warehouse/avro_schema_resolution_test/';
> can we add something to hiveconf like 'warehouse.dir' that is either set to
My thought is that this is one of the few places that tries to be generic by using hive.metastore.warehouse.dir (and even here, most of the LOCATION statements don't use it). Almost everything else hard-codes /test-warehouse. So I was thinking I would just remove the generic aspect. I can't find anything that relies on the generic location, and I'm pretty sure it doesn't work.


http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/bin/load_nested.py
File testdata/bin/load_nested.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/bin/load_nested.py@92
PS3, Line 92: external.table.purge'='TRUE'
> whats this for?
Hive has a CTAS bug (HIVE-22371) where it puts data in the wrong directory (the managed directory rather than the external directory). Adding "EXTERNAL" to the statement forces it to use the right directory.

Traditionally, dropping an external table does not remove its data. Setting external.table.purge=TRUE forces the drop to remove the data as well. This is only a Hive 3 thing, and it results in the same behavior as a managed non-transactional table.

Added a comment here about HIVE-22371.
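As a rough illustration of the workaround described above, the sketch below builds a CTAS statement that adds EXTERNAL and external.table.purge only when targeting Hive 3. The function name `build_ctas`, the `hive_major_version` parameter, and the table names are hypothetical; this is not the code in testdata/bin/load_nested.py.

```python
def build_ctas(table_name, select_sql, hive_major_version):
    """Return a Hive CTAS statement as a string.

    Hypothetical helper for illustration only; the names and the
    PARQUET storage format are assumptions, not taken from the patch.
    """
    if hive_major_version >= 3:
        # Workaround for HIVE-22371: mark the table EXTERNAL so the data
        # lands in the external warehouse directory, and set
        # external.table.purge so DROP TABLE also removes the data.
        external = "EXTERNAL "
        tblproperties = " TBLPROPERTIES('external.table.purge'='TRUE')"
    else:
        # Hive 2: plain CTAS, no workaround applied.
        external = ""
        tblproperties = ""
    return "CREATE {0}TABLE {1} STORED AS PARQUET{2} AS {3}".format(
        external, table_name, tblproperties, select_sql)


if __name__ == "__main__":
    print(build_ctas("tmp_region", "SELECT * FROM region", 3))
    print(build_ctas("tmp_region", "SELECT * FROM region", 2))
```

Keeping the version check in one place means every caller gets the same statement shape, which is the consistency the later comment on test_hive_parquet_codec_interop.py is after.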


http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/cluster/node_templates/cdh7/etc/init.d/kms
File testdata/cluster/node_templates/cdh7/etc/init.d/kms:

http://gerrit.cloudera.org:8080/#/c/15026/3/testdata/cluster/node_templates/cdh7/etc/init.d/kms@1
PS3, Line 1: #!/bin/bash
> is this the same as testdata/cluster/node_templates/cdh6/etc/init.d/kms ? c
This is identical to the cdh6 version. Converted it to a symlink (and changed testdata/cluster/admin to handle symlinks).


http://gerrit.cloudera.org:8080/#/c/15026/3/tests/comparison/cluster.py
File tests/comparison/cluster.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/tests/comparison/cluster.py@519
PS3, Line 519: managed_warehouse_dir
> is this used anywhere?
No. I thought I'd need it, but it turned out I didn't. I can remove it.


http://gerrit.cloudera.org:8080/#/c/15026/3/tests/custom_cluster/test_hive_parquet_codec_interop.py
File tests/custom_cluster/test_hive_parquet_codec_interop.py:

http://gerrit.cloudera.org:8080/#/c/15026/3/tests/custom_cluster/test_hive_parquet_codec_interop.py@87
PS3, Line 87: external
> does it matter this change is being made for Hive 2 as well?
This "external" is due to the Hive CTAS bug that I mentioned before (HIVE-22371). I added a comment here and changed it so that only Hive 3 uses external + external.table.purge=true. This matches what I do in testdata/bin/load_nested.py.



-- 
To view, visit http://gerrit.cloudera.org:8080/15026
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3db69f1b8ca07ae98670429954f5f7a1a359eaec
Gerrit-Change-Number: 15026
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Sahil Takiar <[email protected]>
Gerrit-Comment-Date: Thu, 23 Jan 2020 01:37:32 +0000
Gerrit-HasComments: Yes
