Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13251

to look at the new patch set (#3).

Change subject: IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading
......................................................................

IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading

This fixes three issues for functional dataset loading:

- Works around HIVE-21675, a bug in which 'CREATE VIEW IF NOT EXISTS'
  does not function correctly in our current Hive build. The bug has
  already been fixed, but the workaround is simple, and the 'drop and
  recreate' pattern is used more widely for data loading than the
  'create if not exists' one anyway.

- Adds the ability to specify version restrictions for tables to load.
  The restrictions use the Python "requirements.txt" syntax. This new
  functionality is used to skip creating a Hive "INDEX" table on Hive 3,
  where index support has been removed.

- Moving from MR to Tez execution changed the behavior of data loading
  by disabling the auto-merging of small files. With Hive-on-MR, this
  behavior defaulted to true, but with Hive-on-Tez it defaults to false.
  The change is likely motivated by the fact that Tez automatically
  groups small splits on the _input_ side and thus is less likely to
  produce lots of small files. However, that grouping functionality
  doesn't work properly in localhost clusters (TEZ-3310), so we aren't
  seeing the benefit. This patch therefore enables the post-process
  merging of small files.

  Prior to this change, the 'alltypesaggmultifilesnopart' test table was
  getting 40+ files inside it, which broke various planner tests. With
  the change, it gets the expected 4 files.
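
To illustrate the version restrictions described in the second item
above, here is a minimal sketch of evaluating a requirements.txt-style
restriction such as "hive<3" against a set of component versions. The
function name `satisfies`, the supported operators, and the "hive" key
are assumptions made for illustration; the actual parsing in
generate-schema-statements.py is not shown in this message and may
differ.

```python
import re

# Hypothetical helper, NOT the code from this patch: evaluates a
# requirements.txt-style restriction such as "hive>=2,<3".
_OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}
_NAME_RE = re.compile(r"^\s*([A-Za-z][A-Za-z0-9_.-]*)(.*)$")
_CLAUSE_RE = re.compile(r"^(==|!=|>=|<=|>|<)\s*(\d+(?:\.\d+)*)$")


def _parse_version(text):
    """Turn a dotted version such as '3.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Zero-pad two version tuples to the same length, so 3 == 3.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(restriction, versions):
    """Return True if `versions` (name -> version string) meets `restriction`."""
    match = _NAME_RE.match(restriction)
    if match is None:
        raise ValueError("bad restriction: %r" % restriction)
    name, spec = match.group(1), match.group(2)
    actual = _parse_version(versions[name])
    # Every comma-separated clause must hold; no clauses means no restriction.
    for clause in (c.strip() for c in spec.split(",")):
        if not clause:
            continue
        m = _CLAUSE_RE.match(clause)
        if m is None:
            raise ValueError("bad clause: %r" % clause)
        left, right = _pad(actual, _parse_version(m.group(2)))
        if not _OPS[m.group(1)](left, right):
            return False
    return True
```

With a restriction of "hive<3", a table would be loaded when the Hive
version is 2.1.1 and skipped when it is 3.1.0.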
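
As a rough illustration of the small-file merging described in the
third item above, the sketch below renders a property override as
Hadoop-style configuration XML. This message does not name the property
the patch sets; `hive.merge.tezfiles` (Hive's setting for merging small
files at the end of a Tez DAG) is assumed here, and `MERGE_OVERRIDES`
and `to_hive_site_xml` are hypothetical names, not the contents of
hive-site.xml.py.

```python
import xml.etree.ElementTree as ET

# Assumption: hive.merge.tezfiles is the relevant switch. The property
# actually changed by this patch is not stated in the message.
MERGE_OVERRIDES = {
    "hive.merge.tezfiles": "true",
}


def to_hive_site_xml(props):
    """Render a dict of properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    # Sort names so the generated file is stable across runs.
    for name in sorted(props):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = props[name]
    return ET.tostring(root, encoding="unicode")
```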

Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd
---
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/resources/hive-site.xml.py
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-dependent-tables.sql
M testdata/datasets/README
M testdata/datasets/functional/functional_schema_template.sql
6 files changed, 144 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/51/13251/3
--
To view, visit http://gerrit.cloudera.org:8080/13251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd
Gerrit-Change-Number: 13251
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vihang Karajgaonkar <vih...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
