[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 23 Mar 2018 20:55:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2166/


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 23 Mar 2018 17:13:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java:

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@32
PS8, Line 32: SELECT(AccessConstants.SELECT, 1),
> Please ignore this. We should work on it in a separate JIRA.
Seems fine to separate, but we must absolutely get rid of all this redundancy.



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Mar 2018 22:35:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 10:

(3 comments)

You can safely ignore my comments. Let's try to get this out first and I can 
work on cleaning up the privilege model in a different patch.

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java:

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@32
PS8, Line 32: SELECT(AccessConstants.SELECT, 1),
> Fredy and I talked about this, and I've not done this right now. He's plann
Please ignore this. We should work on it in a separate JIRA.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@37
PS8, Line 37: INDEX(AccessConstants.INDEX, 32),
> Similarly, I'm not doing this quite yet.
Please ignore this. We should work on it in a separate JIRA.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@67
PS8, Line 67: return null;
> Again: I've not done this yet, to avoid doing too much in this change.
Please ignore this. We should work on it in a separate JIRA.



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 10
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Mar 2018 22:12:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Philip Zeyliger (Code Review)
Hello Thomas Tauber-Marshall, Fredy Wijaya, Tim Armstrong, Joe McDonnell, Alex 
Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9716

to look at the new patch set (#9).

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java:

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@32
PS8, Line 32: SELECT(AccessConstants.SELECT, 1),
> We probably should have use our own constants instead of relying on Sentry'
Fredy and I talked about this, and I've not done this right now. He's planning 
to re-visit this soon.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@37
PS8, Line 37: INDEX(AccessConstants.INDEX, 32),
> I prefer to only have Impala specific action types here. In other words, ev
Similarly, I'm not doing this quite yet.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@49
PS8, Line 49: private String name;
> nit: add final
Done


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@50
PS8, Line 50: private int code;
> nit: add final
Done


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@67
PS8, Line 67: return null;
> It's a bit weird to return null here although that's pretty much how it's i
Again: I've not done this yet, to avoid doing too much in this change.



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Mar 2018 22:07:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java:

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@32
PS8, Line 32: SELECT(AccessConstants.SELECT, 1),
We probably should have use our own constants instead of relying on Sentry's 
AccessConstants.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@37
PS8, Line 37: INDEX(AccessConstants.INDEX, 32),
I prefer to only have Impala specific action types here. In other words, 
everything else here minus INDEX and LOCK.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@49
PS8, Line 49: private String name;
nit: add final


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@50
PS8, Line 50: private int code;
nit: add final


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaActionFactory.java@67
PS8, Line 67: return null;
It's a bit weird to return null here although that's pretty much how it's 
implemented in HiveActionFactory. IMO, we should appropriately return the list 
of BitFieldAction.


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaPrivilegeModel.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaPrivilegeModel.java:

http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaPrivilegeModel.java@37
PS8, Line 37: return HivePrivilegeModel.getInstance().getImplyMethodMap();
I'm tempted that we should build our own getImplyMethod method instead of 
delegating it to HivePrivilegeModel. Thoughts?


http://gerrit.cloudera.org:8080/#/c/9716/8/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/ImpalaPrivilegeModel.java@44
PS8, Line 44:
nit: remove new line



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Mar 2018 21:30:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Philip Zeyliger (Code Review)
Hello Thomas Tauber-Marshall, Fredy Wijaya, Tim Armstrong, Joe McDonnell, Alex 
Behm, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9716

to look at the new patch set (#8).

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

In addition, we recently added an explicit "refresh" permission.  In
Sentry 2, this required creating an ImpalaPrivilegeModel to capture
that. It's a slight customization of Hive's equivalent class.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 7:

Alex & Fredy: could you take a look at the diff between PS 6 and PS 7? The 
introduction of "refresh" recently caused this change to break 
AuthorizationTest, and this is what I came up with to fix it. I'm not confident 
it matches your medium-term thinking, though it wasn't too painful.


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 22 Mar 2018 21:04:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-22 Thread Philip Zeyliger (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9716

to look at the new patch set (#7).

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 6: Code-Review+2

The difference between PS5 and PS6 is in bin/rat_exclude_files.txt. I moved 
Kudu's configuration into a common directory, and neglected to move the 
appropriate rat exclusion. It's genuinely trivial, so I'm carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Mar 2018 23:27:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Philip Zeyliger (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9716

to look at the new patch set (#6).

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 5: Code-Review+2

Seems good to me. I assume Thomas has also looked over it and is ok with it so 
it seems like we've had enough eyes on it.


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Mar 2018 22:43:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 5: Code-Review+1

This all makes sense to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Mar 2018 22:02:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 4:

PS4 is a small change to impala-config.sh. My testing so far suggests that you 
don't need to configure HADOOP_CLASSPATH explicitly as much as we've been 
doing, so I was able to simplify that away. (It also caused data loading 
problems, again, with the wrong log4j.properties being picked up.)


--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Mar 2018 17:38:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-21 Thread Philip Zeyliger (Code Review)
Hello Thomas Tauber-Marshall, Tim Armstrong, Joe McDonnell, Dan Hecht,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9716

to look at the new patch set (#4).

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

For AuthorizationTest, different hive versions show slightly different
things for extended output.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 

[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-20 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 2:

(5 comments)

Thanks for your review Tim!

I've taken your suggestions. I also found another test that needed amending: 
AuthorizationTest. I missed it because it was introduced recently 
("IMPALA-6479: Update DESCRIBE to respect column privileges") and I'd started 
this change before it was in the code base.

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java:

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java@73
PS2, Line 73:   new Object[] {policyFile, engine,  
HivePrivilegeModel.getInstance()});
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/main/java/org/apache/impala/util/SentryPolicyService.java
File fe/src/main/java/org/apache/impala/util/SentryPolicyService.java:

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/main/java/org/apache/impala/util/SentryPolicyService.java@398
PS2, Line 398:   return 
Lists.newArrayList(SentryUtil.listRoles(client.get(), 
requestingUser.getShortName()));
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/9716/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/9716/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@368
PS2, Line 368: set parquet_read_statistics=0;
> This isn't needed, right, since it's already set in Python.
Done


http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_mt_dop.py@103
PS2, Line 103: vector.get_value('exec_option')['parquet_read_statistics'] = 
'0'
> Comment here to explain why, e.g. "Disable min-max stats to prevent interfe
Done


http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_partitioning.py
File tests/query_test/test_partitioning.py:

http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_partitioning.py@68
PS2, Line 68: is_hive_2 = os.environ.get("IMPALA_MINICLUSTER_PROFILE", 
None) == "3"
> Can we move this to a helper function in tests/common/environ.py. It's some
Done



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Mar 2018 03:41:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9716 )

Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..


Patch Set 2: Code-Review+1

(5 comments)

I did a pass over it. I think most of this code has already has eyes on it so 
it all looks pretty reasonable (considering that it is a bunch of slightly 
gross hacks by necessity).

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java
File 
fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java:

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/compat-minicluster-profile-3/java/org/apache/impala/authorization/SentryAuthProvider.java@73
PS2, Line 73:   new Object[] {policyFile, engine,  
HivePrivilegeModel.getInstance()});
nit: extra space


http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/main/java/org/apache/impala/util/SentryPolicyService.java
File fe/src/main/java/org/apache/impala/util/SentryPolicyService.java:

http://gerrit.cloudera.org:8080/#/c/9716/2/fe/src/main/java/org/apache/impala/util/SentryPolicyService.java@398
PS2, Line 398:   return 
Lists.newArrayList(SentryUtil.listRoles(client.get(), 
requestingUser.getShortName()));
nit: long line


http://gerrit.cloudera.org:8080/#/c/9716/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test:

http://gerrit.cloudera.org:8080/#/c/9716/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@368
PS2, Line 368: set parquet_read_statistics=0;
This isn't needed, right, since it's already set in Python.


http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_mt_dop.py
File tests/query_test/test_mt_dop.py:

http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_mt_dop.py@103
PS2, Line 103: vector.get_value('exec_option')['parquet_read_statistics'] = 
'0'
Comment here to explain why, e.g. "Disable min-max stats to prevent 
interference with dictionary filtering."


http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_partitioning.py
File tests/query_test/test_partitioning.py:

http://gerrit.cloudera.org:8080/#/c/9716/2/tests/query_test/test_partitioning.py@68
PS2, Line 68: is_hive_2 = os.environ.get("IMPALA_MINICLUSTER_PROFILE", 
None) == "3"
Can we move this to a helper function in tests/common/environ.py. It's somewhat 
overkill when there are only two occurrences but it would be good to avoid this 
pattern spreading any further.



--
To view, visit http://gerrit.cloudera.org:8080/9716
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2ab50331986c7394c2bbfd6c865232bca975f7
Gerrit-Change-Number: 9716
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 20 Mar 2018 19:47:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4277: Support multiple versions of Hadoop ecosystem

2018-03-19 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9716


Change subject: IMPALA-4277: Support multiple versions of Hadoop ecosystem
..

IMPALA-4277: Support multiple versions of Hadoop ecosystem

Adds support for building against two sets of Hadoop ecosystem
components. The control variable is IMPALA_MINICLUSTER_PROFILE_OVERRIDE,
which can either be set to 2 (for Hadoop 2, Hive 1, and so on) or 3 (for
Hadoop 3, Hive 2, and so on).

We intend (in a trivial follow-on change soon) to make 3 the new default
and to explicitly deprecate 2, but this change only does not switch the
default yet. We support both to facilitate a smoother transition, but
support will be removed soon in the Impala 3.x line.

The switch is done at build time, following the pattern from IMPALA-5184
(build fe against both Hive 1 & 2 APIs). Switching back and forth
requires running 'cmake' again. Doing this at build-time avoids
complicating the Java code with classloader configuration.

There are relatively few incompatible APIs. This implementation
encapsulates that by extracting some Java code into
fe/src/compat-minicluminicluster-profile-{2,3}. (This follows the
pattern established by IMPALA-5184, but, to avoid a proliferation
of directories, I've moved the Hive files into the same tree.)
pattern from IMPALA-5184 (build fe against both Hive 1 & 2 APIs). I
consolidated the Hive changes into the same directory structure.

For Maven, I introduced Maven "profiles" to handle the two cases where
the dependencies (and exclusions) differ. These are driven by the
$IMPALA_MINICLUSTER_PROFILE environment variable.

For Sentry, exception class names changed. We work around this by adding
"isSentry...(Exception)" methods with two different implementations.
Sentry is also doing some odd shading, whereby some exceptions are
"sentry.org.apache.sentry..."; we handle both. Similarly, the mechanism
to create a SentryAuthProvider is slightly different. The easiest way to
see the differences is to run:

  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/util/SentryUtil.java
  diff -u 
fe/src/compat-minicluster-profile-{2,3}/java/org/apache/impala/authorization/SentryAuthProvider.java

The Sentry work is based on a change by Zach Amsden.

For Parquet, the difference is even more mechanical. The package names
gone from "parquet" to "org.apache.parquet". The affected code
was extracted into ParquetHelper, but only one copy exists. The second
copy is generated at build-time using sed.

In the rare cases where we need to behave differently at runtime,
MiniclusterProfile.MINICLUSTER_PROFILE is a class which encapsulates
what version we were built aginst. One of the cases is the results
expected by various frontend tests. I avoided the issue by translating
one error string into another, which handled the diversion in one place,
rather than complicating the several locations which look for "No
FileSystem for scheme..." errors.

The HBase APIs we use for splitting regions at test time changed.
This patch includes a re-write of that code for the new APIs. This
piece was contributed by Zach Amsden.

To work with newer versions of dependencies, I updated the version of
httpcomponents.core we use to 4.4.9.

We (Thomas Tauber-Marshall and I) uploaded new Hadoop/Hive/Sentry/HBase
binaries to s3://native-toolchain, and amended the shell scripts to
launch the right things. There are minor mechanical differences.  Some
of this was based on earlier work by Joe McDonnell and Zach Amsden.
Hive's logging is changed in Hive 2, necessitating creating a
log4j2.properties template and using it appropriately. Furthermore,
Hadoop3's new shell script re-writes do a certain amount of classpath
de-duplication, causing some issues with locating the relevant logging
configurations. Accomodations exist in the code to deal with that.

parquet-filtering.test was updated to turn off stats filtering. Older
Hive didn't write Parquet statistics, but newer Hive does. By turning
off stats filtering, we test what the test had intended to test.

For views-compatibility.test, it seems that Hive 2 has fixed certain
bugs that we were testing for in Hive. I've added a
HIVE=SUCCESS_PROFILE_3_ONLY mechanism to capture that.

To facilitate easier reviewing, the following files are 100% renames as 
identified by git; nothing
to see here.

 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetCatalogsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetColumnsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetFunctionsReq.java
 (100%)
 rename fe/src/{compat-hive-1 => 
compat-minicluster-profile-2}/java/org/apache/hive/service/rpc/thrift/TGetInfoReq.java
 (100%)
 rename