Hello Thomas Tauber-Marshall, Tim Armstrong, Joe McDonnell, Dan Hecht, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9716 to look at the new patch set (#7). Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem ...................................................................... IMPALA-4277: Support multiple versions of Hadoop ecosystem Adds support for building against two sets of Hadoop ecosystem components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE, which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for Hadoop 3, Hive 2, and so on). We intend (in a trivial follow-on change soon) to make 3 the new default and to explicitly deprecate 2, but this change only does not switch the default yet. We support both to facilitate a smoother transition, but support will be removed soon in the Impala 3.x line. The switch is done at build time, following the pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). Switching back and forth requires running 'cmake' again. Doing this at build-time avoids complicating the Java code with classloader configuration. There are relatively few incompatible APIs. This implementation encapsulates that by extracting some Java code into fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the pattern established by IMPALA-5184, but, to avoid a proliferation of directories, I've moved the Hive files into the same tree.) pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I consolidated the Hive changes into the same directory structure. For Maven, I introduced Maven "profiles" to handle the two cases where the dependencies (and exclusions) differ. These are driven by the $IMPALA_MINICLUSTER_PROFILE environment variable. For Sentry, exception class names changed. We work around this by adding "isSentry...(Exception)" methods with two different implementations. Sentry is also doing some odd shading, whereby some exceptions are "sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism to create a SentryAuthProvider is slightly different. The easiest way to see the differences is to run: diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java diff -u fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java The Sentry work is based on a change by Zach Amsden. For Parquet, the difference is even more mechanical. The package names gone from "parquet" to "org.apache.parquet". The affected code was extracted into ParquetHelper, but only one copy exists. The second copy is generated at build-time using sed. In the rare cases where we need to behave differently at runtime, MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates what version we were built aginst. One of the cases is the results expected by various frontend tests. I avoided the issue by translating one error string into another, which handled the diversion in one place, rather than complicating the several locations which look for "No FileSystem for scheme..." errors. The HBase APIs we use for splitting regions at test time changed. This patch includes a re-write of that code for the new APIs. This piece was contributed by Zach Amsden. To work with newer versions of dependencies, I updated the version of httpcomponents.core we use to 4.4.9. We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase binaries to s3://native-toolchain, and amended the shell scripts to launch the right things. There are minor mechanical differences. Some of this was based on earlier work by Joe McDonnell and Zach Amsden. Hive's logging is changed in Hive 2, necessitating creating a log4j2.properties template and using it appropriately. Furthermore, Hadoop3's new shell script re-writes do a certain amount of classpath de-duplication, causing some issues with locating the relevant logging configurations. Accomodations exist in the code to deal with that. parquet-filtering.test was updated to turn off stats filtering. Older Hive didn't write Parquet statistics, but newer Hive does. By turning off stats filtering, we test what the test had intended to test. For views-compatibility.test, it seems that Hive 2 has fixed certain bugs that we were testing for in Hive. I've added a HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that. For AuthorizationTest, different hive versions show slightly different things for extended output. To facilitate easier reviewing, the following files are 100% renames as identified by git; nothing to see here. rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java (100%) rename fe/src/{compat-hive-1 => compat-minicluster-profile-2}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename fe/src/{compat-hive-2 => compat-minicluster-profile-3}/java/org/apache/impala/compat/MetastoreShim.java (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-acls.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/kms-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/hadoop/conf/yarn-site.xml.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-common (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-master (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/init.d/kudu-tserver (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/master.conf.tmpl (100%) rename testdata/cluster/node_templates/{cdh5 => common}/etc/kudu/tserver.conf.tmpl (100%) CreateTableLikeFileStmt had a chunk of code moved to ParquetHelper.java. This was done manually, but without changing anything except what Java required in terms of accessibility and boilerplate. rewrite fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java (80%) copy fe/src/{main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java => compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java} (77%) Testing: Ran core & exhaustive tests with both profiles. Cherry-picks: not for 2.x. Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7 --- M bin/create-test-configuration.sh M bin/impala-config.sh M bin/rat_exclude_files.txt M fe/pom.xml R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetSchemasReq.java R fe/src/compat-minicluster-profile-2/java/org/apache/hive/service/rpc/thrift/TGetTablesReq.java A fe/src/compat-minicluster-profile-2/java/org/apache/impala/authorization/SentryAuthProvider.java R fe/src/compat-minicluster-profile-2/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-minicluster-profile-2/java/org/apache/impala/compat/MiniclusterProfile.java A fe/src/compat-minicluster-profile-2/java/org/apache/impala/util/SentryUtil.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/analysis/ParquetHelper.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaPrivilegeModel.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java R fe/src/compat-minicluster-profile-3/java/org/apache/impala/compat/MetastoreShim.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/compat/MiniclusterProfile.java A fe/src/compat-minicluster-profile-3/java/org/apache/impala/util/SentryUtil.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java M fe/src/main/java/org/apache/impala/util/SentryPolicyService.java M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java A fe/src/test/resources/hive-log4j2.properties.template M impala-parent/pom.xml M testdata/bin/run-hbase.sh M testdata/bin/run-hive-server.sh D testdata/cluster/node_templates/cdh5/etc/hadoop/conf/yarn-site.xml.tmpl A testdata/cluster/node_templates/cdh6/etc/init.d/kms R testdata/cluster/node_templates/common/etc/hadoop/conf/kms-acls.xml.tmpl R testdata/cluster/node_templates/common/etc/hadoop/conf/kms-site.xml.tmpl M testdata/cluster/node_templates/common/etc/hadoop/conf/yarn-site.xml.tmpl M testdata/cluster/node_templates/common/etc/init.d/common.tmpl R testdata/cluster/node_templates/common/etc/init.d/kudu-common R testdata/cluster/node_templates/common/etc/init.d/kudu-master R testdata/cluster/node_templates/common/etc/init.d/kudu-tserver M testdata/cluster/node_templates/common/etc/init.d/yarn-common R testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl R testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl M testdata/pom.xml R testdata/src/compat-minicluster-profile-2/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java C testdata/src/compat-minicluster-profile-3/java/org/apache/impala/datagenerator/HBaseTestDataRegionAssigment.java M testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test M testdata/workloads/functional-query/queries/QueryTest/views-compatibility.test M tests/common/environ.py M tests/common/impala_test_suite.py M tests/metadata/test_views_compatibility.py M tests/query_test/test_mt_dop.py M tests/query_test/test_partitioning.py 54 files changed, 1,620 insertions(+), 753 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/16/9716/7 -- To view, visit http://gerrit.cloudera.org:8080/9716 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7 Gerrit-Change-Number: 9716 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>