[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671

2019-05-01 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13213


Change subject: Bump CDP_BUILD_NUMBER to 1056671
..

Bump CDP_BUILD_NUMBER to 1056671

This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline for the testing the tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 7 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/2
--
To view, visit http://gerrit.cloudera.org:8080/13213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Gerrit-Change-Number: 13213
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#11). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. This will be taken up
in subsequent change.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586
3. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
4. The Sentry object ownership tests are flaky. There are
race-conditions in that code which get exposed for some by this patch.
For example, when a database is created, the object privileges are
immediately updated in the catalog cache. If Sentry has not been updated
and there is a refresh authorization call in between it can clear the
privilege from the cache causing the subsequent statement using that
privilege to fail. The patch adds sleep statements in the test to
work-around these issues.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShims.java
R fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShim.java
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveShim.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
34 files changed, 1,870 insertions(+), 449 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/12
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 12:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1073
PS11, Line 1073: .getAddPartitionMessage(event.getMessage());
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/13005/11/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1169
PS11, Line 1169:   try {
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/13005/11/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/13005/11/testdata/bin/run-hive-server.sh@66
PS11, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-Xdebug 
-Xrunjdwp:transport=dt_socket,server=y,\
> line too long (121 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:03:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7973: Add support for fine grained events processing for partition level HMS events.

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13111 )

Change subject: IMPALA-7973: Add support for fine grained events processing for 
partition level HMS events.
..


Patch Set 3:

(5 comments)

Patch looks good to me. some nits below and its good to go from my side.

http://gerrit.cloudera.org:8080/#/c/13111/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13111/3//COMMIT_MSG@17
PS3, Line 17: affetced
spell-check


http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1081
PS3, Line 1081: org.apache.hadoop.hive.common.FileUtils
Can we import this instead of using fully qualified classname?


http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1153
PS3, Line 1153: infoLog
no need to have add_partitions in the log message since infoLog provides that 
information


http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1307
PS3, Line 1307: {} partitions dropped from table {} in alter_partition event
This statement doesn't make sense to me. Why would there be dropped partitions 
in alter_partition event? Also, the log doesn't need to have a event type in it 
since infoLog method takes care of it.


http://gerrit.cloudera.org:8080/#/c/13111/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1315
PS3, Line 1315: debugLog
no need to say "after a drop_partition event" here since the debugLog prints 
both the event id and event type



--
To view, visit http://gerrit.cloudera.org:8080/13111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5
Gerrit-Change-Number: 13111
Gerrit-PatchSet: 3
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 29 Apr 2019 17:31:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8419 : Validate event processing related configurations

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13019 )

Change subject: IMPALA-8419 : Validate event processing related configurations
..


Patch Set 13: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/13019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606
Gerrit-Change-Number: 13019
Gerrit-PatchSet: 13
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:49:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7973: Add support for fine grained events processing for partition level HMS events.

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13111 )

Change subject: IMPALA-7973: Add support for fine grained events processing for 
partition level HMS events.
..


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13111/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13111/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@56
PS4, Line 56: ClassUtil
remove if unused



--
To view, visit http://gerrit.cloudera.org:8080/13111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5
Gerrit-Change-Number: 13111
Gerrit-PatchSet: 4
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:52:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 12:

(2 comments)

Updated the patch with Joe's review comments.

http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh@a1
PS12, Line 1:
> Restore this line (RAT checks will fail)
Done


http://gerrit.cloudera.org:8080/#/c/13005/12/bin/impala-config.sh@513
PS12, Line 513: # The directory in which all the thirdparty CDP components live.
  : export 
CDP_COMPONENTS_HOME="$IMPALA_TOOLCHAIN/cdp_components-$CDP_BUILD_NUMBER"
> You move this up, so you can remove this.
Done



--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 29 Apr 2019 20:30:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#14). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/HiveShim.java
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/HiveShim.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
34 files changed, 1,885 insertions(+), 465 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/14
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 3:

(15 comments)

> (1 comment)
 >
 > Nice work.
 >
 > When I did the transitional work between Hive 1 and Hive 2, I
 > introduced a variable that switched at build time between the two
 > worlds. (See a203733fac3e1e37df8abeee39a88d187153a8c5 for the
 > revert and "git log --grep IMPALA_MINICLUSTER_PROFILE")
 >
 > If I'm understanding right, the approach here is to produce a
 > single "binary" that works for both worlds? Or at run time do the
 > "original" Hive jars get run? I think both approaches are
 > plausible; just want to make sure we're clear about it.
 >
 > (Is the shading slow? I've seen maven-shade-plugin be very slow...)

The original approach was to always build against hive-3, but we switched the 
approach to use a compatibility shim based approach as you suggested above.

Shading is not super slow. It task about 4 sec to build the jar if I remember 
correctly

http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@199
PS2, Line 199: # When USE_CDP_HIVE is set we use the latest hive version 
available to deply in minicluster
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@203
PS2, Line 203:   # TODO(Vihang) we should repackage the tarballs so that the 
src and binaries are extracted
> line too long (92 > 90)
This line was removed later in the patch


http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@212
PS2, Line 212:   export 
HIVE_HOME="$IMPALA_TOOLCHAIN/cdh_components-${CDH_BUILD_NUMBER}/hive-${MINICLUSTER_HIVE_VERSION}"
> line too long (106 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@230
PS2, Line 230: . "$IMPALA_HOME/bin/impala-config-branch.sh"
 : if [ -f "$IMPALA_HOME/bin/impala-config-local.sh" ]; then
 :   . "$IMPALA_HOME/bin/impala-config-local.sh"
 : fi
> We need to be careful about which variables are assigned before this and wh
Good point. Moved all the new variable assignment logic post this block.


http://gerrit.cloudera.org:8080/#/c/13005/2/bin/impala-config.sh@546
PS2, Line 546: export 
HIVE_METASTORE_THRIFT_DIR=$CDP_COMPONENTS_HOME/apache-hive-${IMPALA_HIVE_VERSION}-bin/src/standalone-metastore/src/main/thrift
> line too long (133 > 90)
This line was moved later in the patch and the line length is under the 90 now.


http://gerrit.cloudera.org:8080/#/c/13005/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/13005/2/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@105
PS2, Line 105:   public static String unescapeSQLString(String b) {
> Do we have any plans to create a public classes in Hive through which we ca
Yes, I think we may be able to do that add that. Will do it as a followup item.


http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
File fe/src/main/java/org/apache/impala/analysis/StringLiteral.java:

http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/analysis/StringLiteral.java@96
PS3, Line 96:   /**
:* Copied from Apache Hive's BaseSemanticAnalyzer. This method 
has not changed
:* since last several years so hoping that it is fairly stable 
by now. Sourcing it from
:* the Hive's code without copying brings along with it a lot 
of other unnecessary
:* dependencies
:* @param b
:* @return
:*/
> Can you move the parts copied from Hive to a separate file/directory? It wi
The copied code was moved into HiveShims class in the compat-3 directory


http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/13005/3/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@122
PS3, Line 122: //TODO (Vihang) this pulls in hive-hbase-handler dependency 
which brings all the
 : // other stuff we don't need. Perhaps we just need to copy 
the constants and mark
 : // them public API in Hive source code
 : private static final String HBASE_COLUMNS_MAPPING = 
"hbase.columns.mapping";
 : private static final String HBASE_TABLE_DEFAULT_STORAGE_TYPE 
= "hbase.table.default"
 : + ".storage.type";
 : private static final String HBASE_KEY_COL = ":key";
 : private static final String 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 12:

(11 comments)

Addressed review comments

http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG@58
PS9, Line 58:
> Is this flakiness specific to the hive 3 config? Or the races were already
I checked and confirmed that the notification listener is working is generating 
the events as expected but there may be more to it. I will dig more. Even if 
there is a some problem with the event data, these races are still present can 
show up when Sentry is slow for any reason to update its entries.


http://gerrit.cloudera.org:8080/#/c/13005/9//COMMIT_MSG@65
PS9, Line 65:
> maybe we shoudl disable these tests when running with hive 3 since we don't
So far, I would expect these tests work without any modifications on hive-2 
builds (results pending for the last job I triggered). When we turn on cdp for 
jobs we should re-investigate how to fix these tests.


http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@167
PS3, Line 167: export IMPALA_RANGER_VERSION=1.2.0.6.0.99.0-45
> This number looks like a CDH_BUILD_NUMBER, and is probably from the same na
I was not aware of this.. Most of the pending patches are merged into official 
builds so we may not need this anymore.


http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@199
PS3, Line 199:   # TODO(todd) switch to an official build.
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@203
PS3, Line 203:   # CDH hive version is used to build and deploy in minicluster 
when USE_CDP_HIVE is
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@212
PS3, Line 212: fi
> line too long (106 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/13005/9/bin/set-classpath.sh
File bin/set-classpath.sh:

http://gerrit.cloudera.org:8080/#/c/13005/9/bin/set-classpath.sh@30
PS9, Line 30: 
#"$IMPALA_HOME"/shaded-deps/target/impala-shaded-deps-0.1-SNAPSHOT.jar:\
> why is this necessary? shouldn't the shaded-deps dependency also end up in
You are right, this is not necessary. removed it.


http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
File fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java:

http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java@68
PS9, Line 68: throws MetaException {
:   return null;
: }
:   };
:
:   /**
> Are these changes actually used right now? I think this stuff ended up bein
Yeah. reverted the changes to this file in the latest patch


http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/13005/9/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@173
PS9, Line 173:   private static MetastoreEventsProcessor instance;
> Can you explain what's going on with this part of the change? Did Sentry mo
Sentry did not move to the JSONMessageFactory (and I believe this may be the 
reason why the test_ownership.py fails). However, in the shims approach we use 
the ExtendedJsonFactory for hive-2 builds and in hive-3 build we don't expect 
Sentry to be there. I will keep investigating while I debug cdp jobs.


http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh
File testdata/bin/run-hive-server.sh:

http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh@66
PS9, Line 66: export HIVE_METASTORE_HADOOP_OPTS="-Xdebug 
-Xrunjdwp:transport=dt_socket,server=y,\
> probably remove this
Done


http://gerrit.cloudera.org:8080/#/c/13005/9/testdata/bin/run-hive-server.sh@93
PS9, Line 93: if [ ${ONLY_METASTORE} -eq 0 ]; then
> this is in another patch- guess we can rebase this on top of that one to pi
Rebased my patch.



--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#16). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
31 files changed, 1,775 insertions(+), 455 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/16
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 16
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#18). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
31 files changed, 1,777 insertions(+), 457 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/18
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 18
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#19). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
31 files changed, 1,777 insertions(+), 457 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/19
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 19
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#20). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
31 files changed, 1,791 insertions(+), 456 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/20
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 20
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-30 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#21). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change adds a compatibility shim in fe so that Impala can
interoperate with Hive 3.1.0. It moves the existing Metastoreshim class
to a compat-hive-2 directory and adds a new Metastoreshim class under
compat-hive-3 directory. These shim classes implement method which are
different in hive-2 v/s hive-3 and are used by front end code. At the
build time, based on the environment variable
IMPALA_HIVE_MAJOR_VERSION one of the two shims is added to as source
using the fe/pom.xml build plugin.

Additionally, in order to reduce the dependencies footprint of Hive in
the front end code, this patch also introduces a new module called
shaded-deps. This module using shade plugin to include only the source
files from hive-exec which are need by the fe code. For hive-2 build
path, no changes are done with respect to hive dependencies to minimize
the risk of destabilizing the master branch on the default build option
of using Hive-2.

The different set of dependencies are activated using maven profiles.
The activation of each profile is automatic based on the
IMPALA_HIVE_MAJOR_VERSION.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. THis will be on-going
effort and test failures on Hive-3 will be fixed in additional
sub-tasks.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596. This hack will be removed when the
patches are available in official CDP Hive builds.
2. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
3. Sentry dependencies bring a lot of transitive hive dependencies. The
patch excludes such dependencies since they create problems while
building against Hive-3. Since these hive-2 dependencies are
already included when building against hive-2 this should not be a problem.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
A fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
A 
fe/src/compat-hive-3/java/org/apache/impala/compat/HiveMetadataFormatUtils.java
A fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
D fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
31 files changed, 1,813 insertions(+), 464 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/21
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 21
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-8369 (part 4): Hive 3: fixes for functional dataset loading

2019-05-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13251 )

Change subject: IMPALA-8369 (part 4): Hive 3: fixes for functional dataset 
loading
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13251/1/testdata/bin/load-dependent-tables.sql
File testdata/bin/load-dependent-tables.sql:

http://gerrit.cloudera.org:8080/#/c/13251/1/testdata/bin/load-dependent-tables.sql@a115
PS1, Line 115:
Some of the test rely on the fact that this table exists. Perhaps we should 
also ignore/modify such tests if we are running against hive-3.

Running git grep "hive_index_tbl" shows that this is used in
CatalogObjectToFromThriftTest, CatalogTest and FrontendTest



--
To view, visit http://gerrit.cloudera.org:8080/13251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic34930dc064da3136dde4e01a011d14db6a74ecd
Gerrit-Change-Number: 13251
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 07 May 2019 00:34:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes

2019-05-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13254


Change subject: IMPALA-8369 : Fix for tests failing with incompatible column 
changes
..

IMPALA-8369 : Fix for tests failing with incompatible column changes

In Hive-3 the configuration for allowing users to make incompatible
column type changes was disabled by default. In Hive-2 this was allowed.
Some of the tests like data_errors/test_data_errors.py and
metadata/test_compute_stats.py make changes to column types which are
disallowed by HMS-3 by default. This change adds a configuration option
in hive-site.xml to allow making incompatible changes to column types so
that we can run the existing tests with HMS-3.

Also, in HMS-3 there are certain new event types (OPEN_TXN, COMMIT_TXN,
etc) which may not have dbname set. This breaks the assumption in the
code in EventProcessor which expects dbName_ to be not null at all
times. This patch also makes changes in the EventProcessor so that such
Ignored events do not fail precondition checks during event processing.

Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/test/resources/hive-site.xml.py
2 files changed, 8 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/13254/2
--
To view, visit http://gerrit.cloudera.org:8080/13254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6
Gerrit-Change-Number: 13254
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] Hive 3: fix test permanent udfs.py for Hive 3 support

2019-05-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13236 )

Change subject: Hive 3: fix test_permanent_udfs.py for Hive 3 support
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/13236
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f50845c7d4769d8843cad87988498e165902169
Gerrit-Change-Number: 13236
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yongzhi Chen 
Gerrit-Comment-Date: Mon, 06 May 2019 19:55:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes

2019-05-07 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13254 )

Change subject: IMPALA-8369 : Fix for tests failing with incompatible column 
changes
..

IMPALA-8369 : Fix for tests failing with incompatible column changes

In Hive-3 the configuration for allowing users to make incompatible
column type changes was disabled by default. In Hive-2 this was allowed.
Some of the tests like data_errors/test_data_errors.py and
metadata/test_compute_stats.py make changes to column types which are
disallowed by HMS-3 by default. This change adds a configuration option
in hive-site.xml to allow making incompatible changes to column types so
that we can run the existing tests with HMS-3.

Also, in HMS-3 there are certain new event types (OPEN_TXN, COMMIT_TXN,
etc) which may not have dbname set. This breaks the assumption in the
code in EventProcessor which expects dbName_ to be not null at all
times. This patch also makes changes in the EventProcessor so that such
Ignored events do not fail precondition checks during event processing.

Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/test/resources/hive-site.xml.py
2 files changed, 10 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/13254/3
--
To view, visit http://gerrit.cloudera.org:8080/13254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6
Gerrit-Change-Number: 13254
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8369 : Fix for tests failing with incompatible column changes

2019-05-07 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13254 )

Change subject: IMPALA-8369 : Fix for tests failing with incompatible column 
changes
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13254/2/fe/src/test/resources/hive-site.xml.py
File fe/src/test/resources/hive-site.xml.py:

http://gerrit.cloudera.org:8080/#/c/13254/2/fe/src/test/resources/hive-site.xml.py@85
PS2, Line 85:
> flake8: E501 line too long (92 > 90 characters)
Done



--
To view, visit http://gerrit.cloudera.org:8080/13254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I488121f21d9b35d33dd003b2670bc0bbe1fee4b6
Gerrit-Change-Number: 13254
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 07 May 2019 18:06:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 [Test fixes] More test fixes when running against Hive-3

2019-05-07 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13276


Change subject: IMPALA-8369 [Test fixes] More test fixes when running against 
Hive-3
..

IMPALA-8369 [Test fixes] More test fixes when running against Hive-3

This test fixes CatalogTest, FrontendTest, CatalogObjectTofromThriftTest
by breaking some tests into the ones which are not expected to work on
Hive-3 and then skipping it. It does this by adding a util method in
TestUtils which returns if the environment variable
IMPALA_HIVE_MAJOR_VERSION is >= 3. If this condition is true, it skips
certain tests which use hive_idx_tbl (not supported in data-load against
Hive-3). If it is less than 3 the tests are not skipped so we keep the
test coverage on Hive-2 setups.

Also, fixes the TestCaseLoaderTest which instantiates a embedded HMS
instance. This requires some configuration changes for the embedded
standalone mode as well as adding datanucleus JDO as a test dependency.
Additionally, this patch also fixes test_show_create_table which was
failing on Hive-3 setups due to the additional parameter
bucketing_version available from Hive-3.

Testing Done:
1. Ran the tests when mini-cluster is deployed with USE_CDP_HIVE=true
and made sure that the tests work (or are skipped as expected)
2. Ran the same tests with USE_CDP_HIVE=false to make sure they still
work against HMS-2

Change-Id: If05f74efc481e2b0d26a9c4f6e58cef38605d72c
---
M fe/pom.xml
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/service/FrontendTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M tests/metadata/test_show_create_table.py
7 files changed, 69 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/13276/1
--
To view, visit http://gerrit.cloudera.org:8080/13276
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If05f74efc481e2b0d26a9c4f6e58cef38605d72c
Gerrit-Change-Number: 13276
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8419 : Fetch metastore configuration values to detect misconfigured setups

2019-04-18 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13019 )

Change subject: IMPALA-8419 : Fetch metastore configuration values to   
detect misconfigured setups
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@222
PS4, Line 222: FIRE_EVENTS_FOR_DML
We probably also need to make sure that the parameters filter config does not 
filter out keys like impala.events.catalogVersion, 
impala.events.catalogServiceId and impala.disableHmsSync


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@315
PS4, Line 315: try {
> Do you mean to use try-with-resource in the above method which calls using
I see. May be you can then create another method called 
getMetastoreConfig(String key, String defaultval) here. This method can 
implement it using try-with-resource as I suggested above. In the test, you can 
then use Mockito.spy to return the dummy values when this method is called. 
Something like

MetastoreStoreEventsProcessor spyEventsProcessor = 
Mockito.spy(eventsProcessor_);
doReturn("testValue").when(spyProcessor.getMetastoreConfig());


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@123
PS4, Line 123:   public static String getMetastoreConfigValue(
> I feel this is useful too when users can just compare the return value with
A more common pattern (and generic too) is to let the consumers decide what 
should be the default value if the configuration is not present. For example, 
for a boolean configuration users can choose to do getConfig(key, "false) 
whereas for a String configuration they can do getConfig(key, ""); If you avoid 
hard-coding the default value assumption it make it more usable for all the 
consumers.


http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/13019/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@245
PS4, Line 245: toggleBooleanValueString
code assumes that config values will always be booleans. Its probably easier to 
just pass some dummy. Also since there are only a few configurations, it 
probably easier to just have multiple statements of

when(getConfig(key1)).thenReturn(val1);
when(getConfig(key1)).thenReturn(val1);
when(getConfig(key1)).thenReturn(val1);

and get rid of the loop this way there is no need to have the cleanup logic 
too. All this can be done by creating a simple util method which takes in 
validateConfig(key, mockValue, shouldSucceed). You can test both the positive 
and negative cases using this util method.



--
To view, visit http://gerrit.cloudera.org:8080/13019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606
Gerrit-Change-Number: 13019
Gerrit-PatchSet: 4
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 18 Apr 2019 21:48:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
hive source component already exists and it downloads again unnecessarily.
This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 16 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/6
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
hive source component already exists and it downloads again unnecessarily.
This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 16 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/5
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@121
PS4, Line 121: component package directory"
> You can do something like this
Ah, I see. Done. Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 03 May 2019 20:58:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671

2019-05-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/13213 )

Change subject: Bump CDP_BUILD_NUMBER to 1056671
..

Bump CDP_BUILD_NUMBER to 1056671

This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Notes:
1. The tarball names and the directory to which they extract to changed
from the previous CDP_BUILD_NUMBER. Due to this we need to change the
bootstrap_toolchain and impala-config.sh so that the Hive environment
variables are set correctly.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline for the testing the tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 7 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/6
--
To view, visit http://gerrit.cloudera.org:8080/13213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Gerrit-Change-Number: 13213
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
components already contains a extracted tar ball of hive source. Due to
this every time the script executes it downloads the hive source tar
again unnecessarily. This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 10 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/2
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync

2019-05-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13199 )

Change subject: IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata 
sync
..


Patch Set 3: Code-Review+1

Thanks for the changes. Looks good to me.


--
To view, visit http://gerrit.cloudera.org:8080/13199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1016c27d3f12cef71a09a895ab42fd15a54aeee1
Gerrit-Change-Number: 13199
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 02 May 2019 19:29:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13219


Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
components already contains a extracted tar ball of hive source. Due to
this every time the script is executes it downloads the hive source tar
again unnecessarily. This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 10 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/1
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
components already contains a extracted tar ball of hive source. Due to
this every time the script executes it downloads the hive source tar
again unnecessarily. This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 11 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/3
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] Hive 3: switch to Tez-on-YARN execution

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13224 )

Change subject: Hive 3: switch to Tez-on-YARN execution
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13224/1/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/13224/1/bin/bootstrap_toolchain.py@571
PS1, Line 571: use_cdp_hive
looks like we can get rid of this line since it is already initialized above.


http://gerrit.cloudera.org:8080/#/c/13224/1/testdata/cluster/admin
File testdata/cluster/admin:

http://gerrit.cloudera.org:8080/#/c/13224/1/testdata/cluster/admin@311
PS1, Line 311: for
would be great if you could add a comment here explain why this is being done.



--
To view, visit http://gerrit.cloudera.org:8080/13224
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Gerrit-Change-Number: 13224
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yongzhi Chen 
Gerrit-Comment-Date: Fri, 03 May 2019 18:01:15 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..

Fix redundant downloads of hive source tarball

Since CDP_BUILD_NUMBER was bumped to 1056671 the name of the hive source
tarball changed. Not only the tar ball name was changed, the file it
gets extracted to is also different from the tar file itself. Due to
this the bootstrap_toolchain.py fails to check if the downloaded
components already contains a extracted tar ball of hive source. Due to
this every time the script executes it downloads the hive source tar
again unnecessarily. This patch improves bootstrap_toolchain.py to take
non-standard tarfiles which extracts to a different directory name
compared to the tar file.

Testing done:
1. Removed the local toolchain and ran the script couple of times to
make sure that it downloads the hive tar ball only once.

Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
---
M bin/bootstrap_toolchain.py
1 file changed, 18 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/13219/4
--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] Hive 3: switch to Tez-on-YARN execution

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13224 )

Change subject: Hive 3: switch to Tez-on-YARN execution
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/13224
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If96064f271582b2790a3cfb3d135f3834d46c41d
Gerrit-Change-Number: 13224
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yongzhi Chen 
Gerrit-Comment-Date: Fri, 03 May 2019 18:29:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata sync

2019-05-01 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13199 )

Change subject: IMPALA-8448: [DOCS] ALTER DATABASE event supported in metadata 
sync
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml
File docs/topics/impala_metadata.xml:

http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml@169
PS2, Line 169: 
 :   Change the description of the database
 : 
 :
 : 
 :   Change the comment on the database
 : 
comment and description are one and the same. I think you are missing changing 
the owner of the database here


http://gerrit.cloudera.org:8080/#/c/13199/2/docs/topics/impala_metadata.xml@178
PS2, Line 178: Change
I think it is worth mentioning here that changing the default location of the 
database does not move the tables of that database to the new location. Only 
the new tables which are created subsequently use the default location of the 
database in case it is not provided in the create table statement.



--
To view, visit http://gerrit.cloudera.org:8080/13199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1016c27d3f12cef71a09a895ab42fd15a54aeee1
Gerrit-Change-Number: 13199
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 02 May 2019 01:52:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671

2019-05-01 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/13213 )

Change subject: Bump CDP_BUILD_NUMBER to 1056671
..

Bump CDP_BUILD_NUMBER to 1056671

This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Notes:
1. The tarball names and the directory to which they extract to changed
from the previous CDP_BUILD_NUMBER. Due to this we need to change the
bootstrap_toolchain and impala-config.sh so that the Hive environment
variables are set correctly.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline for the testing the tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 7 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/4
--
To view, visit http://gerrit.cloudera.org:8080/13213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Gerrit-Change-Number: 13213
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1056671

2019-05-01 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13213 )

Change subject: Bump CDP_BUILD_NUMBER to 1056671
..

Bump CDP_BUILD_NUMBER to 1056671

This change bumps the CDP_BUILD_NUMBER to 1056671 which includes all the
Hive and Tez patches required for building against Hive 3. With this
change we get rid of the custom builds for Hive and Tez introduced in
IMPALA-8369 and switch to more official sources of builds for the
minicluster.

Testing Done:
1. Built against Hive-3 and Hive-2 using the flag USE_CDP_HIVE
2. Did basic testing from Impala and Beeline for the testing the tez
patch
3. Currently running the full-suite of tests to make sure there are no
regressions

Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
---
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
2 files changed, 7 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/13213/3
--
To view, visit http://gerrit.cloudera.org:8080/13213
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic758a15b33e89b6804c12356aac8e3f230e07ae0
Gerrit-Change-Number: 13213
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#10). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. This will be taken up
in subsequent change.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586
3. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
4. The Sentry object ownership tests are flaky. There are
race-conditions in that code which get exposed for some by this patch.
For example, when a database is created, the object privileges are
immediately updated in the catalog cache. If Sentry has not been updated
and there is a refresh authorization call in between it can clear the
privilege from the cache causing the subsequent statement using that
privilege to fail. The patch adds sleep statements in the test to
work-around these issues.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A 

[Impala-ASF-CR] Fix redundant downloads of hive source tarball

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13219 )

Change subject: Fix redundant downloads of hive source tarball
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13219/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13219/4//COMMIT_MSG@13
PS4, Line 13: Due to
: this
> nit: repeated use of "due to this", we can just use "and"
Done


http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@121
PS4, Line 121: component package directory"
> nit: fix indentation
Not sure how to fix this without adding more spaces in the exception message.


http://gerrit.cloudera.org:8080/#/c/13219/4/bin/bootstrap_toolchain.py@122
PS4, Line 122: if pkg_directory is not None:
 :   self.pkg_directory = "{0}/{1}".format(cdp_components_home, 
pkg_directory)
 : else:
 :   self.pkg_directory = "{0}/{1}".format(cdp_components_home, 
basename)
> It's more Pythonic to do this:
Done



--
To view, visit http://gerrit.cloudera.org:8080/13219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd04a1a367a0cc4aa0a2b490a45fbc93a862c83a
Gerrit-Change-Number: 13219
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 03 May 2019 20:37:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8438: Store WriteId and valid writeId list for table and partitio

2019-05-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13215 )

Change subject: IMPALA-8438: Store WriteId and valid writeId list for table and 
partitio
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@7
PS3, Line 7: partitio
nit, spell-check


http://gerrit.cloudera.org:8080/#/c/13215/3//COMMIT_MSG@10
PS3, Line 10: fucntions
nit, spell-check


http://gerrit.cloudera.org:8080/#/c/13215/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/13215/3/common/thrift/CatalogObjects.thrift@483
PS3, Line 483: 4: optional string valid_write_ids
Instead of storing this in string format, do you think it would be better to 
parse the string and store these as individual fields? If we do that, can we 
get rid of storing the table_name since it is already available in THdfsTable?


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@237
PS3, Line 237: getHighestWriteId
Do you think its better to throw UnsupportedOperationException in these methods?


http://gerrit.cloudera.org:8080/#/c/13215/3/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@258
PS3, Line 258:   public static long getMajorVersion() {
> can we rename this to getMajorVersion or something since it's not the full
+1



--
To view, visit http://gerrit.cloudera.org:8080/13215
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6edbd64424edf0ba88af110ab8b958a1966b8b54
Gerrit-Change-Number: 13215
Gerrit-PatchSet: 5
Gerrit-Owner: Yongzhi Chen 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yongzhi Chen 
Gerrit-Comment-Date: Fri, 03 May 2019 20:43:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369: Fixing some core tests in Hive environment

2019-05-08 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13283 )

Change subject: IMPALA-8369: Fixing some core tests in Hive environment
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/impala_test_suite.py@738
PS3, Line 738: get_partition_names
is this a change of behavior in hive client in version 3? If yes, is it 
breaking backwards compatibility?


http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/skip.py
File tests/common/skip.py:

http://gerrit.cloudera.org:8080/#/c/13283/3/tests/common/skip.py@216
PS3, Line 216: ==
should this be >= 3



--
To view, visit http://gerrit.cloudera.org:8080/13283
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45d9b9312c6c77f436ab020ae68c15f3c7c737de
Gerrit-Change-Number: 13283
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 08 May 2019 19:25:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

2019-07-09 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
..

IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

The FileMetadataLoader is used to load the file information in when the
table is loaded. By default, it lists all the files in the
table/partition directory. Currently, it only skips the filenames which
are invalid (hidden files and ones starting with "_" etc). However, it
does not skip the directories which are temporary or hidden. In case of
Hive when data is inserted into a table, it creates a temporary staging
directory which is a hidden directory under the table location. When the
insert in hive is completed, such staging directories are removed. But
if there is a refresh called during that time, FileMetadataLoader will
add the files in the staging directory as well. Not only this could
cause temporary invalid results but it causes table to go in a bad state
when these temporary directories are removed. The only work-around in
such a case to issue a refresh on the table again.

This patch adds logic in the filemetadataloader to ignore such temporary
staging directories. Unfortunately, hadoop does not provide a API which
can recursively list files in a directory and skip certain directories.
This patch adds a new FilterIterator which wraps around existing
listFiles, listStatus and RecursingIterator to skip the hidden
directories from the listing result.

Also, the existing code to recover partitions implements its own
recursion logic which includes path validation. This already skips such
hidden directories since they do not conform to the partition spec. The
patch does a minor modification to this method by directly calling the
listStatusIterator instead of going through FileSystemUtil#listStatus
whiche uses the filtering remote iterator now.

Testing:
1. Added a new tests as well as modified existing ones which were
related to cover interesting cases.
2. Ran concurrent inserts from Hive while issuing refresh in a loop on
Impala side. Earlier this would cause the table to go into a bad state.
Now, it works fine for the staging directories. It still runs into a
FileNotFoundException from the impalad when there are insert overwrite
statements in Hive

Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Reviewed-on: http://gerrit.cloudera.org:8080/13665
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M tests/metadata/test_recursive_listing.py
6 files changed, 236 insertions(+), 9 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 14
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

2019-07-09 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
..


Patch Set 13: Code-Review+2

Carrying forward Todd's +2 from earlier


--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 13
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 09 Jul 2019 15:31:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] Bump CDP BUILD NUMBER to 1235229

2019-07-09 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13799 )

Change subject: Bump CDP_BUILD_NUMBER to 1235229
..

Bump CDP_BUILD_NUMBER to 1235229

The newer CDP build includes fix for HIVE-21932 which is required to fix
test failures on CDP builds.

Also, this patch fixes two test failures on CDP.

1 Fixes the MetastoreEventsProcessorTest on CDP caused due to the
serialization difference of the event messages between CDH and CDP
builds.
2. Fixes the Ranger audit test failure caused due to the missing cluster
name paramter from the Audit events from Ranger. This is needed a change
in the ranger-hive-audit.xml due to RANGER-2458

Testing:
1. Build against CDP and CDH builds
2. Run the previously failing tests on CDP which were caused by
HIVE-21932
3. Run full tests with CDP and CDH builds

Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
---
M bin/impala-config.sh
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/test/java/org/apache/impala/authorization/AuthorizationTestBase.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/resources/ranger-hive-audit.xml
6 files changed, 37 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/3
--
To view, visit http://gerrit.cloudera.org:8080/13799
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
Gerrit-Change-Number: 13799
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL

2019-07-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13777 )

Change subject: IMPALA-8519: [DOCS] Doc the limitation in insert events from 
SparkSQL
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/13777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36cfc7e592ed2588a8c1f8375033d60492b27a4f
Gerrit-Change-Number: 13777
Gerrit-PatchSet: 6
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 03 Jul 2019 18:39:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8519: [DOCS] Doc the limitation in insert events from SparkSQL

2019-07-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13777 )

Change subject: IMPALA-8519: [DOCS] Doc the limitation in insert events from 
SparkSQL
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml
File docs/topics/impala_metadata.xml:

http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@137
PS5, Line 137: table
In this case, it actually refreshes the partition not the table


http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@216
PS5, Line 216: operation
Using APIs instead of operation is better in my opinion.


http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@218
PS5, Line 218: /tmp/spark_unpart
I think it would be easier to understand if the example provides a realistic 
table location like /user/hive/warehouse/spark_etl.db/customers/date=01012019


http://gerrit.cloudera.org:8080/#/c/13777/5/docs/topics/impala_metadata.xml@258
PS5, Line 258: hive.metastore.dml.events
We should modify this text to say that this configuration should also be set to 
true in the hive-site.xml used by the spark applications (typically 
/etc/hive/conf/hive-site.xml) so that INSERT events are generated when the 
spark application inserts data into existing tables and partitions.



--
To view, visit http://gerrit.cloudera.org:8080/13777
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36cfc7e592ed2588a8c1f8375033d60492b27a4f
Gerrit-Change-Number: 13777
Gerrit-PatchSet: 5
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 03 Jul 2019 17:36:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] Bump CDP BUILD NUMBER to 1235229

2019-07-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/13799 )

Change subject: [WIP] Bump CDP_BUILD_NUMBER to 1235229
..

[WIP] Bump CDP_BUILD_NUMBER to 1235229

The newer CDP build includes fix for HIVE-21932 which is required to fix
test failures on CDP builds.

Also, fixed the MetastoreEventsProcessorTest on CDP caused due to the
serialization difference of the event messages between CDH and CDP
builds.

[WIP] Testing:
1. Build against CDP and CDH builds
2. Run the previously failing tests on CDP which were caused by
HIVE-21932
3. Run full tests with CDP and CDH builds

Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
---
M bin/impala-config.sh
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
4 files changed, 34 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/2
--
To view, visit http://gerrit.cloudera.org:8080/13799
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
Gerrit-Change-Number: 13799
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

2019-07-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

http://gerrit.cloudera.org:8080/#/c/13665/8/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java@545
PS8, Line 545: all underlying files (except which are
 :* in the ignored directories)
> hrm, were the semantics already broken then? See the unit test AcidUtilsTes
yeah, it looks like its broken currently for this case. I will file a JIRA. The 
AcidUtils test works because it works off the List not the ones 
which are actually returned by the FileMetadataLoader.

Specifically, if the table is transactional we cannot rely on using listFiles 
API since it will skip empty directories (Or we should implement a version of 
it for own purposes). Not sure how to handle the S3 recursive listFiles case 
though.

Created https://issues.apache.org/jira/browse/IMPALA-8739 as suggested



--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 03 Jul 2019 20:45:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [WIP] Bump CDP BUILD NUMBER to 1235229

2019-07-03 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13799


Change subject: [WIP] Bump CDP_BUILD_NUMBER to 1235229
..

[WIP] Bump CDP_BUILD_NUMBER to 1235229

The newer CDP build includes fix for HIVE-21932 which is required to fix
test failures on CDP builds.

Also, fixed the MetastoreEventsProcessorTest on CDP caused due to the
serialization difference of the event messages between CDH and CDP
builds.

[WIP] Testing:
1. Build against CDP and CDH builds
2. Run the previously failing tests on CDP which were caused by
HIVE-21932
3. Run full tests with CDP and CDH builds

Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
---
M bin/impala-config.sh
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
4 files changed, 33 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/13799/1
--
To view, visit http://gerrit.cloudera.org:8080/13799
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6d0fa856fa6c7cb1f4c9ee0e4250cfa1e8a14595
Gerrit-Change-Number: 13799
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8734: Reload table schema on TBLPROPERTIES change

2019-07-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13785 )

Change subject: IMPALA-8734: Reload table schema on TBLPROPERTIES change
..


Patch Set 4: Code-Review+2

patch looks good to me.


--
To view, visit http://gerrit.cloudera.org:8080/13785
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a43a962c2a456f3ddc078b2924f551fccb5c2ad
Gerrit-Change-Number: 13785
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 02 Jul 2019 20:11:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

2019-07-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/13665 )

Change subject: IMPALA-8663 : FileMetadataLoader should skip hidden and tmp 
directories
..

IMPALA-8663 : FileMetadataLoader should skip hidden and tmp directories

The FileMetadataLoader is used to load the file information in when the
table is loaded. By default, it lists all the files in the
table/partition directory. Currently, it only skips the filenames which
are invalid (hidden files and ones starting with "_" etc). However, it
does not skip the directories which are temporary or hidden. In case of
Hive when data is inserted into a table, it creates a temporary staging
directory which is a hidden directory under the table location. When the
insert in hive is completed, such staging directories are removed. But
if there is a refresh called during that time, FileMetadataLoader will
add the files in the staging directory as well. Not only this could
cause temporary invalid results but it causes table to go in a bad state
when these temporary directories are removed. The only work-around in
such a case to issue a refresh on the table again.

This patch adds logic in the filemetadataloader to ignore such temporary
staging directories. Unfortunately, hadoop does not provide a API which
can recursively list files in a directory and skip certain directories.
This patch adds a new FilterIterator which wraps around existing
listFiles, listStatus and RecursingIterator to skip the hidden
directories from the listing result.

Also, the existing code to recover partitions implements its own
recursion logic which includes path validation. This already skips such
hidden directories since they do not conform to the partition spec. The
patch does a minor modification to this method by directly calling the
listStatusIterator instead of going through FileSystemUtil#listStatus
whiche uses the filtering remote iterator now.

Testing:
1. Added a new tests as well as modified existing ones which were
related to cover interesting cases.
2. Ran concurrent inserts from Hive while issuing refresh in a loop on
Impala side. Earlier this would cause the table to go into a bad state.
Now, it works fine for the staging directories. It still runs into a
FileNotFoundException from the impalad when there are insert overwrite
statements in Hive

Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
---
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
A fe/src/test/java/org/apache/impala/common/FileSystemUtilTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M tests/metadata/test_recursive_listing.py
6 files changed, 236 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/65/13665/12
--
To view, visit http://gerrit.cloudera.org:8080/13665
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2c4a22908304fe9e377d77d6c18d401c3f3294aa
Gerrit-Change-Number: 13665
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-7973: Add support for fine grained updates at partition level.

2019-04-24 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13111 )

Change subject: IMPALA-7973: Add support for fine grained updates at partition 
level.
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1106
PS2, Line 1106: addedPartitions_
Would be good if you can log a useful information about the event like number 
of partitions, isReplace, tablename etc.


http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1107
PS2, Line 1107: Map partSpec = new HashMap<>();
  :   
List fsList =
  :   msTbl_.getPartitionKeys();
  :   List partVals = partition.getValues();
  :   Preconditions.checkNotNull(partVals);
  :   Preconditions.checkState(fsList.size() == 
partVals.size());
  :   for (int i = 0; i < fsList.size(); i++) {
  : partSpec.put(fsList.get(i).getName(), 
partVals.get(i));
  :   }
We can avoid one unnecessary conversion from Partition -> partSpecMap -> 
TPartitionKeyValue if we change the signature of the refreshPartitionIfExists 
to take List here.


http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1194
PS2, Line 1194:   Preconditions.checkNotNull(partitionAfter_);
  :   Map partSpec = new HashMap<>();
  :   List 
fsList =
  :   msTbl_.getPartitionKeys();
  :   List partVals = partitionAfter_.getValues();
  :   Preconditions.checkNotNull(partVals);
  :   Preconditions.checkState(fsList.size() == 
partVals.size());
  :   for (int i = 0; i < fsList.size(); i++) {
  : partSpec.put(fsList.get(i).getName(), partVals.get(i));
  :   }
same suggestion as above. Also, instead of duplicating this logic, create a 
util method in the base class (or a static method)


http://gerrit.cloudera.org:8080/#/c/13111/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1266
PS2, Line 1266: droppedPartitions_
add a null check as well



--
To view, visit http://gerrit.cloudera.org:8080/13111
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I213401329f3965dd81055197792ccf8a05368af5
Gerrit-Change-Number: 13111
Gerrit-PatchSet: 2
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 25 Apr 2019 02:14:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/authorization/test_owner_privileges.py
M tests/common/sentry_cache_test_suite.py
M tests/custom_cluster/test_permanent_udfs.py
34 files changed, 1,208 insertions(+), 294 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/7
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan 

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. This will be taken up
in subsequent change.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586
3. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
4. The Sentry object ownership tests are flaky. There are
race-conditions in that code which get exposed for some by this patch.
For example, when a database is created, the object privileges are
immediately updated in the catalog cache. If Sentry has not been updated
and there is a refresh authorization call in between it can clear the
privilege from the cache causing the subsequent statement using that
privilege to fail. The patch adds sleep statements in the test to
work-around these issues.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A 

[Impala-ASF-CR] IMPALA-8419 : Validate event processing related configurations

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13019 )

Change subject: IMPALA-8419 : Validate event processing related configurations
..


Patch Set 7:

(12 comments)

Overall looks good. Few comments below and then good to go from my side.

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java
File 
fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java:

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@42
PS7, Line 42: EventProcessorValidation
a better name could be EventProcessorConfigValidator


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@42
PS7, Line 42: interface
general convention in Impala is not to use package-private classes (although I 
think its a good idea to have them) For consistency, please change these to 
public. Same with the members of this class. Try to keep the members private 
unless its not possible.


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@59
PS7, Line 59: getExpectedValue
method doc


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@74
PS7, Line 74: String
private static final


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@115
PS7, Line 115: MetastoreEventParameters
nit, formatting is off


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@120
PS7, Line 120: filtered
s/filtered/filtered out/


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/EventProcessorValidation.java@157
PS7, Line 157: Found
would be good to provided the expected value here as well


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@68
PS7, Line 68: MetastoreEventParameters
using parameters and properties interchangebly is confusing. may be rename this 
to MetastoreEventPropertyKey


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@70
PS7, Line 70: CATALOG_VERSION_PROP_KEY
_PROP_KEY is redundant if you name the enum as MetastoreEventPropertyKey. SAme 
for others.


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
File 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java@253
PS7, Line 253: Runtime error..
redundant to have this text here. can be removed.


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@88
PS7, Line 88: DEFAULT_METASTORE_CONFIG_VALUE
I am not convinced of the value of having this. Why can't we just use empty 
string which could be private to the config validator?


http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/13019/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@336
PS7, Line 336: impala
add a test case for ^impala as well since it matches the disableHmsSync paramter



--
To view, visit http://gerrit.cloudera.org:8080/13019
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94c2783e36287a65122003aa55d8075a806bc606
Gerrit-Change-Number: 13019
Gerrit-PatchSet: 7
Gerrit-Owner: Bharath Krishna 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 25 Apr 2019 20:23:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 8:

> (3 comments)
 >
 > Before more work goes into this, this is changing the Hive major
 > version number and there should be a discussion on dev@ about this
 > approach and how it fits into Impala releases.

Just sent a note the dev list. Thanks for the suggestion.


--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 25 Apr 2019 22:02:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/dependency-reduced-pom.xml
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/authorization/test_owner_privileges.py
M tests/common/sentry_cache_test_suite.py
M tests/custom_cluster/test_permanent_udfs.py
35 files changed, 1,275 insertions(+), 292 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/6
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang 

[Impala-ASF-CR] Configure Hive 3's HS2 to execute queries using Tez local mode

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12931 )

Change subject: Configure Hive 3's HS2 to execute queries using Tez local mode
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12931
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I76e47fbd1d6ff5103d81a8de430d5465dba284cd
Gerrit-Change-Number: 12931
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 25 Apr 2019 17:34:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-23 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A shaded-deps/CMakeLists.txt
A shaded-deps/dependency-reduced-pom.xml
A shaded-deps/pom.xml
M testdata/bin/run-hive-server.sh
M tests/custom_cluster/test_permanent_udfs.py
33 files changed, 1,224 insertions(+), 258 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/05/13005/5
--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] [acid] Disallow any operation on full acid table.

2019-04-23 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13047 )

Change subject: [acid] Disallow any operation on full acid table.
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
File fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java:

http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@80
PS2, Line 80: transactional
Can we change these strings to constants?
transactional, transactional_properties, insert_only


http://gerrit.cloudera.org:8080/#/c/13047/2/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java@84
PS2, Line 84: tableIsTransactional != null && 
tableIsTransactional.equalsIgnoreCase("true")
nit, probably cleaner if we replace this by 
Boolean.parseBoolean(tableIsTransactional)



--
To view, visit http://gerrit.cloudera.org:8080/13047
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb92e5b691bf192980f2cc7d68c491bb1e8455ac
Gerrit-Change-Number: 13047
Gerrit-PatchSet: 2
Gerrit-Owner: Sudhanshu Arora 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 23 Apr 2019 19:42:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-25 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..

IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

This change upgrades the hive dependencies of Impala to use Hive 3.1.0
based binaries. Most of the changes in this patch are based off patches
provided by Todd (links available in JIRA).

Upgrading the dependencies allows us to work with both Hive 3.1.0 and
Hive 2.1.0 in the same code line. In order to do this, the patch
trims down a lot of unnecessary hive dependencies of the front end code
by creating a shaded-deps module. The pom.xml of shaded-deps includes
only the files from Hive source which Impala depends for compilation.

Additionally, it also uses a custom build of Hive which is based of
Hive 3.1.0. This custom build includes patches for HIVE-21596 and
HIVE-21586 which are needed by Impala so that it can compile against
Hive-3 libraries and be able to talk to HMS-2.x metastore. Once these
patches are merged we can get rid of this custom build and rely on more
official sources of Hive builds.

The patch also changes impala-config.sh so that it always downloads the
Hive-3 libraries from the toolchain. The code is always built using
Hive-3 jars. However, based on the value of USE_CDP_HIVE, the
minicluster is deployed using Hive-3 or Hive-2 binaries. Since Impala
implements HiveServer2's TCLIService.thrift interface, it requires us to
use the existing mechanism of copying the hive-2/api TCLIService.thrift
to
hive-3/api. It also adds a few environment variables which point to the
metastore's thrift file and the CDH Hive version.

Testing:
1. Code compiles and runs against both HMS-3 and HMS-2
2. Ran full-suite of tests using the private jenkins job against HMS-2
3. Running full-tests against HMS-3 will need more work like supporting
Tez in the mini-cluster (for dataloading) and HMS transaction support
since HMS3 create transactional tables by default. This will be taken up
in subsequent change.

Notes:
1. Patch uses a custom build of Hive to be deployed in mini-cluster. This
build has the fixes for HIVE-21596
2. The Hive 3.1 does not include some of the patches like
HIVE-21077. Some of the recent code in Impala depends on that patch to
exist in hive. We will need to port HIVE-21077 to Hive 3.1.0 in order to
compile against Hive 3.1.0 jars. I will continue working on getting
this merged along with HIVE-21586
3. Some of the existing tests rely on the fact the UDFs implement the
UDF interface in Hive (UDFLength, UDFHour, UDFYear). These built-in hive
functions have been moved to use GenericUDF interface in Hive 3. Impala
currently only supports UDFExecutor. In order to have a full
compatibility with all the functions in Hive 2.x we should support
GenericUDFs too. That would be taken up as a separate patch.
4. The Sentry object ownership tests are flaky. There are
race-conditions in that code which get exposed for some by this patch.
For example, when a database is created, the object privileges are
immediately updated in the catalog cache. If Sentry has not been updated
and there is a refresh authorization call in between it can clear the
privilege from the cache causing the subsequent statement using that
privilege to fail. The patch adds sleep statements in the test to
work-around these issues.

Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
---
M CMakeLists.txt
M README.md
M bin/bootstrap_toolchain.py
M bin/impala-config.sh
M bin/set-classpath.sh
M common/thrift/.gitignore
M common/thrift/CMakeLists.txt
M fe/CMakeLists.txt
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/StringLiteral.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/MetaStoreClientPool.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/hive/executor/UdfExecutor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/DescribeResultFactory.java
M fe/src/main/java/org/apache/impala/service/MetadataOp.java
A fe/src/main/java/org/apache/impala/util/MetadataFormatUtils.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/hive/executor/UdfExecutorTest.java
M fe/src/test/java/org/apache/impala/testutil/EmbeddedMetastoreClientPool.java
M impala-parent/pom.xml
A shaded-deps/.gitignore
A 

[Impala-ASF-CR] IMPALA-6663 Expose current DDL metrics on WebUI

2019-08-13 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13806 )

Change subject: IMPALA-6663 Expose current DDL metrics on WebUI
..


Patch Set 7:

(2 comments)

> (4 comments)
 >
 > I think this in general is a very useful addition. Can you also
 > specify the motivation of why we need to track the total number of
 > DDLs instead of tracking them on a per table level. Certain
 > metadata operations like invalidate metadata as indeed global and
 > cannot be tracked at table. However, I think it useful to track how
 > many refresh or invalidates were called on a given table and
 > finding top-n such tables. Similarly for DMLs, instead of a getting
 > a total count of INSERTS, would it be more useful to get top-N
 > tables which are inserted into?

Took a second pass and realized that you are indeed tracking currently running 
DDL ops. The question in such a case is how do we determine if seeing 3 
resetMetadata operations on 3 small tables is better or worse than 1 
resetMetadata operation on a large table. Can you please comment on how should 
the end user interpret these values to determine the load on the catalog?

http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java
File 
fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java:

http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@70
PS7, Line 70: decrementDdlTypeCounter
> Is decrement ever needed? same for the other counters?
you may disregard this comment since I thought you were only tracking the 
global count of operations instead of running ones.


http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl
File www/catalog.tmpl:

http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl@202
PS7, Line 202: Running
> Is this user-visible text? Running seems to suggest these operations are cu
I think I misunderstood the patch a little bit. Looks like we are indeed 
tracking the running operations. You may discard this comment.



--
To view, visit http://gerrit.cloudera.org:8080/13806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a
Gerrit-Change-Number: 13806
Gerrit-PatchSet: 7
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 13 Aug 2019 17:32:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6663 Expose current DDL metrics on WebUI

2019-08-13 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13806 )

Change subject: IMPALA-6663 Expose current DDL metrics on WebUI
..


Patch Set 7:

(4 comments)

I think this in general is a very useful addition. Can you also specify the 
motivation of why we need to track the total number of DDLs instead of tracking 
them on a per table level. Certain metadata operations like invalidate metadata 
as indeed global and cannot be tracked at table. However, I think it useful to 
track how many refresh or invalidates were called on a given table and finding 
top-n such tables. Similarly for DMLs, instead of a getting a total count of 
INSERTS, would it be more useful to get top-N tables which are inserted into?

http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java
File 
fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java:

http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@41
PS7, Line 41:   private AtomicLong resetMetadataCounter_;
:
:   private AtomicLong dmlOperationCounter_;
:
:   private AtomicLong syncDdlOperationCounter_;
Can you add a one line comment on each of these counters as to what do they 
represent? Also, would be great if you could specify what operations constitute 
a DML in this context.


http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@70
PS7, Line 70: decrementDdlTypeCounter
Is decrement ever needed? same for the other counters?


http://gerrit.cloudera.org:8080/#/c/13806/7/fe/src/main/java/org/apache/impala/catalog/monitor/CatalogOpRequestCounter.java@109
PS7, Line 109: getResetMetadataCounter
By returning the AtomicLong object this class is leaking the private field and 
it is subject to modification by other classes. A more common pattern is to 
return its value (return type long).


http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl
File www/catalog.tmpl:

http://gerrit.cloudera.org:8080/#/c/13806/7/www/catalog.tmpl@202
PS7, Line 202: Running
Is this user-visible text? Running seems to suggest these operations are 
currently being run.



--
To view, visit http://gerrit.cloudera.org:8080/13806
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0ed76f134bad6d3b3d4dce132365a53a01e9512a
Gerrit-Change-Number: 13806
Gerrit-PatchSet: 7
Gerrit-Owner: Tamas Mate 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 13 Aug 2019 17:24:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14024 )

Change subject: IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd
Gerrit-Change-Number: 14024
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 12 Aug 2019 17:14:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8627: [WIP] Enable catalog-v2 in tests

2019-07-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13933 )

Change subject: IMPALA-8627: [WIP] Enable catalog-v2 in tests
..

IMPALA-8627: [WIP] Enable catalog-v2 in tests

This patch enables catalog-v2 by default in all the tests.

Testing is in progress.

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
---
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M tests/query_test/test_observability.py
4 files changed, 39 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/3
--
To view, visit http://gerrit.cloudera.org:8080/13933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Gerrit-Change-Number: 13933
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-07-30 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 8 concurrent Hive clients and each
of the client runs 100 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

The test already found a bug which is caused when a Hive issues a alter
table add if not exists partition" query and the partition is not really
added since it is preexisting. In such a case the ADD_PARTITION events
is still generated but with a empty list of partitions in the events.
Such events are now ignored.

Notes:
1. Ran the test with defaults. It generates about 2100 events
and runs for close to 15 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
8 files changed, 1,544 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/4
--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-07-30 Thread Vihang Karajgaonkar (Code Review)
Hello Bharath Vissapragada, Quanlong Huang, Anurag Mantripragada, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13932

to look at the new patch set (#5).

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 8 concurrent Hive clients and each
of the client runs 100 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

The test already found a bug which is caused when a Hive issues a alter
table add if not exists partition" query and the partition is not really
added since it is preexisting. In such a case the ADD_PARTITION events
is still generated but with a empty list of partitions in the events.
Such events are now ignored.

Notes:
1. Ran the test with defaults. It generates about 2100 events
and runs for close to 15 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
8 files changed, 1,578 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-07-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 8 concurrent Hive clients and each
of the client runs 100 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

The test already found a bug which is caused when a Hive issues a alter
table add if not exists partition" query and the partition is not really
added since it is preexisting. In such a case the ADD_PARTITION events
is still generated but with a empty list of partitions in the events.
Such events are now ignored.

Notes:
1. Ran the test with defaults. It generates about 2100 events
and runs for close to 15 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
A fe/src/test/java/org/apache/impala/catalog/events/RandomHiveQueryRunner.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
8 files changed, 1,516 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/3
--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8627: [WIP] Enable catalog-v2 in tests

2019-07-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/13933 )

Change subject: IMPALA-8627: [WIP] Enable catalog-v2 in tests
..

IMPALA-8627: [WIP] Enable catalog-v2 in tests

This patch enables catalog-v2 by default in all the tests.

Also, fixes the test_observability which fails on catalog-v2 since
the profile emits different metadata load events.

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
---
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M tests/query_test/test_observability.py
4 files changed, 41 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/4
--
To view, visit http://gerrit.cloudera.org:8080/13933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Gerrit-Change-Number: 13933
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8627: Enable catalog-v2 in tests

2019-07-29 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/13933 )

Change subject: IMPALA-8627: Enable catalog-v2 in tests
..

IMPALA-8627: Enable catalog-v2 in tests

This patch enables catalog-v2 by default in all the tests.

Also, fixes the test_observability which fails on catalog-v2 since
the profile emits different metadata load events.

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
---
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M tests/query_test/test_observability.py
4 files changed, 41 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/5
--
To view, visit http://gerrit.cloudera.org:8080/13933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Gerrit-Change-Number: 13933
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-8627: Enable catalog-v2 in tests

2019-08-02 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/13933 )

Change subject: IMPALA-8627: Enable catalog-v2 in tests
..

IMPALA-8627: Enable catalog-v2 in tests

This patch enables catalog-v2 by default in all the tests.

Test fixes:
1. Modified test_observability which fails on catalog-v2 since
the profile emits different metadata load events. The test now looks for
the right events on the profile depending on whether catalogv2 is
enabled or not.
2. Fixes a bug in TableName.java constructor which allows non-lowercased
table and database names. This causes problems at the local catalog
cache which expects the tablenames to be always in lowercase. More
details on this failure are available in IMPALA-8627.
3. Fixes the JdbcTest which checks for existence of table comment in the
getTables metadata jdbc call. In catalog-v2 since the columns are not
requested, LocalTable is not loaded and hence the test needs to be
modified to check if catalog-v2 is enabled.

Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
---
M docker/catalogd/Dockerfile
M docker/impalad_coord_exec/Dockerfile
M docker/impalad_coordinator/Dockerfile
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/test/java/org/apache/impala/service/JdbcTest.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/workloads/functional-query/queries/QueryTest/describe-db.test
M tests/common/environ.py
M tests/hs2/hs2_test_suite.py
M tests/hs2/test_hs2.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_refresh_partition.py
M tests/query_test/test_observability.py
14 files changed, 136 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/33/13933/9
--
To view, visit http://gerrit.cloudera.org:8080/13933
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iddbde666de2b780c0e40df716a9dfe54524e092d
Gerrit-Change-Number: 13933
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8841: Try to fix Tez related dataload flakiness

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14081 )

Change subject: IMPALA-8841: Try to fix Tez related dataload flakiness
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14081/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14081/2//COMMIT_MSG@10
PS2, Line 10: initializing Tez resources in parallel
my observation is that the tez resources are allocated per session and since 
every hive query which is run during dataload happens in a separate beeline 
session, it may still need to initialize the resources. Right?


http://gerrit.cloudera.org:8080/#/c/14081/2/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/14081/2/testdata/bin/create-load-data.sh@620
PS2, Line 620: before before
typo



--
To view, visit http://gerrit.cloudera.org:8080/14081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id21d57483fe7a4f72f450fb71f8f53b3c1ef6327
Gerrit-Change-Number: 14081
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Anonymous Coward (521)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 16 Aug 2019 16:35:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8793: Implement TRUNCATE for insert-only ACID tables

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14071 )

Change subject: IMPALA-8793: Implement TRUNCATE for insert-only ACID tables
..


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java
File fe/src/main/java/org/apache/impala/catalog/Transaction.java:

http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@17
PS2, Line 17: Transaction
Does this class need to be thread-safe? I can see there are possible races with 
respect to the transactionId_ being used as the flag to track if this 
transaction is open or close.


http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@20
PS2, Line 20: transactionId_
can we make this final? I see that you are using this to keep track of whether 
the transaction is open or close. May be you can use a AtomicBoolean to keep 
track of that separately so that commit() can be made thread-safe.


http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@38
PS2, Line 38: checkState(transactionId_ > 0)
Does this need to throw an exception? What is the behavior from HMS if a client 
calls commit on a already committed transaction?


http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/catalog/Transaction.java@46
PS2, Line 46: transactionId_
nit, can we change this to if (transactionId_ < 0) return; makes it easier to 
read in my opinion.


http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14071/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1821
PS2, Line 1821: COLUMN_STATS_ACCURATE
Its unclear why we are clearing this property. Can you add a comment explaining 
the same?



--
To view, visit http://gerrit.cloudera.org:8080/14071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic749b7f27da157e1c0ebf9b7e9b6ee09afad122a
Gerrit-Change-Number: 14071
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Dinesh Garg (430)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Reviewer: Yongzhi Chen 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 16 Aug 2019 17:22:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8841: Try to fix Tez related dataload flakiness

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14081 )

Change subject: IMPALA-8841: Try to fix Tez related dataload flakiness
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14081
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id21d57483fe7a4f72f450fb71f8f53b3c1ef6327
Gerrit-Change-Number: 14081
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Anonymous Coward (521)
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 16 Aug 2019 18:45:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14082

to look at the new patch set (#2).

Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3
..

IMPALA-8847: Fix test_event_processing.py on Hive-3

The test_event_processing fails on Hive-3 due to a error in the syntax of the
hive query for transactional tables. It was missed unfortunately when
IMPALA-8847 was merged.

Testing done:
Ran the test on both Hive-2 and Hive-3 and it passes now

Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
---
M tests/custom_cluster/test_event_processing.py
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14082/2
--
To view, visit http://gerrit.cloudera.org:8080/14082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
Gerrit-Change-Number: 14082
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] Fix test event processing.py on Hive-3

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14082


Change subject: Fix test_event_processing.py on Hive-3
..

Fix test_event_processing.py on Hive-3

The test_event_processing fails on Hive-3 due to a error in the syntax of the
hive query for transactional tables. It was missed unfortunately when
IMPALA-8847 was merged

Testing done:
Ran the test on both Hive-2 and Hive-3 and it passes now

Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
---
M tests/custom_cluster/test_event_processing.py
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/14082/1
--
To view, visit http://gerrit.cloudera.org:8080/14082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
Gerrit-Change-Number: 14082
Gerrit-PatchSet: 1
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..

IMPALA-8847: Ignore add partition events with empty partition list

Certain Hive queries like "alter table  add if not exists
partition ()" generate a add_partition event even if the
partition did not really exists. Such events have a empty partition list
in the event message which trips on the Precondition check in the
AddPartitionEvent. This causes event processor to go into error state.
The only way to recover is to issue invalidate metadata in such a case.

The patch adds logic to ignore such events.

Testing:
1. Added a test case which reproduces the issue. The test case works
after the patch is applied.

Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Reviewed-on: http://gerrit.cloudera.org:8080/14049
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_event_processing.py
A tests/util/event_processor_utils.py
3 files changed, 162 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..


Patch Set 5: Code-Review+2

Carrying forward Anurag’s +1. I suspect the previous test failure is caused due 
to a race condition in the test itself. I will investigate it separately. This 
change is pretty simple and its scope is limited to event processor. I am going 
to go ahead and merge the change by self-reviewing in addition to Anurag’s +1


--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Aug 2019 07:00:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8862: Don't ship jetty and ant

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14068 )

Change subject: IMPALA-8862: Don't ship jetty and ant
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c50da957e3704c919b391ecd03c26ed851bdaf
Gerrit-Change-Number: 14068
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Aug 2019 16:48:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14043 )

Change subject: IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG@14
PS7, Line 14: This change is Step 2 in a series of steps to coordinate the
: introduction of HMS API changes to Hive3 and Impala. Step 4 is 
also
: Impala related, it will be covered in IMPALA-8842 part 2.
:
: The steps are as follows:
:
: 1. Change in Hive3.
: We push new APIs so Impala can use them. New APIs will simply 
call old
: existing methods so there should not be any change of 
functionality
: there. Since there were many incompatible changes, new APIs are 
tagged
: method_name_V2.
:
: 2. Change in Impala.
: Push changes to use new methods *V2.
:
: 3. Change in Hive3.
: Push patch with complete functionality. *V2 methods contains the 
new
: logic. The old existing methods are not used anymore by Impala at 
this
: point, hence they can be removed. For every method_name_V2, I will
: create a corresponding method method_name that calls the former 
one.
:
: 4. Change in Impala
: Replace *V2 calls by *.
:
: 5. Change in Hive3.
: Remove *V2 methods from API
I am not sure why we need this in the commit message. The commit message should 
give details what the patch does and why. The coordination of patches between 
Hive and Impala can be done out-of-band and provides no value by recording the 
steps in the git commit message.


http://gerrit.cloudera.org:8080/#/c/14043/7/shaded-deps/pom.xml
File shaded-deps/pom.xml:

http://gerrit.cloudera.org:8080/#/c/14043/7/shaded-deps/pom.xml@43
PS7, Line 43: 
: 
:   org.apache.hive.shims
:   hive-shims-0.20
: 
:   
Is there a particular reason why this needs to be excluded for this patch?



--
To view, visit http://gerrit.cloudera.org:8080/14043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a73f5eeac8e84d63b22aaed5dfbcd8ea39f0af4
Gerrit-Change-Number: 14043
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Aug 2019 17:20:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14043 )

Change subject: IMPALA-8842 part 1: (Hive3) Use 'engine' field in HMS stat API
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14043/7//COMMIT_MSG@14
PS7, Line 14: This change is Step 2 in a series of steps to coordinate the
: introduction of HMS API changes to Hive3 and Impala. Step 4 is 
also
: Impala related, it will be covered in IMPALA-8842 part 2.
:
: The steps are as follows:
:
: 1. Change in Hive3.
: We push new APIs so Impala can use them. New APIs will simply 
call old
: existing methods so there should not be any change of 
functionality
: there. Since there were many incompatible changes, new APIs are 
tagged
: method_name_V2.
:
: 2. Change in Impala.
: Push changes to use new methods *V2.
:
: 3. Change in Hive3.
: Push patch with complete functionality. *V2 methods contains the 
new
: logic. The old existing methods are not used anymore by Impala at 
this
: point, hence they can be removed. For every method_name_V2, I will
: create a corresponding method method_name that calls the former 
one.
:
: 4. Change in Impala
: Replace *V2 calls by *.
:
: 5. Change in Hive3.
: Remove *V2 methods from API
> I also wanted to explain why there's no change in functionality yet.
I am okay with this. This should not block the commit.



--
To view, visit http://gerrit.cloudera.org:8080/14043
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a73f5eeac8e84d63b22aaed5dfbcd8ea39f0af4
Gerrit-Change-Number: 14043
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 15 Aug 2019 18:24:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 4 concurrent Hive clients and each
of the client runs 50 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

Notes:
1. Ran the test with defaults. It generates about 500 events
and runs for close to 4.5 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
6 files changed, 1,590 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/8
--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-08-15 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 8 concurrent Hive clients and each
of the client runs 100 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

Notes:
1. Ran the test with defaults. It generates about 2100 events
and runs for close to 15 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
6 files changed, 1,590 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/7
--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 4 concurrent Hive clients and each
of the client runs 50 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

Notes:
1. Ran the test with defaults. It generates about 500 events
and runs for close to 4.5 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
---
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
6 files changed, 1,598 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/13932/9
--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..

IMPALA-8847: Ignore add partition events with empty partition list

Certain Hive queries like "alter table  add if not exists
partition ()" generate a add_partition event even if the
partition did not really exists. Such events have a empty partition list
in the event message which trips on the Precondition check in the
AddPartitionEvent. This causes event processor to go into error state.
The only way to recover is to issue invalidate metadata in such a case.

The patch adds logic to ignore such events.

Testing:
1. Added a test case which reproduces the issue. The test case works
after the patch is applied.

Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_event_processing.py
A tests/util/event_processor_utils.py
3 files changed, 162 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/4
--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py
File tests/custom_cluster/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@181
PS3, Line 181: def wait_for_insert_event_processing(self, previous_event_id):
> I like that you moved out EventProcessorUtils. However, this file heavily u
yes, I agree. I am planning to clean it up as part of
IMPALA-8795. This method had some custom logic in place where it was waiting 
for 2 events so didn't want to touch that to keep the scope minimum.


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@200
PS3, Line 200: get_event_processor_metrics
> Same as my above comment, this function can be removed here and modify this
I am planning to clean up all the event polling tests together in IMPALA-8795


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/custom_cluster/test_event_processing.py@215
PS3, Line 215: get_last_synced_event_id
> Same as previous comment.
I am planning to clean up all the event polling tests together in IMPALA-8795


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py
File tests/util/event_processor_utils.py:

http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@27
PS3, Line 27: class EventProcessorUtils(object):
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@30
PS3, Line 30: D
> flake8: E303 too many blank lines (2)
Done


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@50
PS3, Line 50: .
> flake8: E131 continuation line unaligned for hanging indent
Done


http://gerrit.cloudera.org:8080/#/c/14049/3/tests/util/event_processor_utils.py@56
PS3, Line 56: @
> flake8: E303 too many blank lines (2)
Done



--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 12 Aug 2019 21:55:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14024 )

Change subject: IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus
..

IMPALA-8766: Undo hadoop-cloud-storage + HWX Nexus

Previous commits for IMPALA-8766 attempted to use hadoop-cloud-storage
to satisfy Impala's cloud dependencies (e.g. hadoop-aws, hadoop-azure,
etc). On builds with USE_CDP_HIVE=true, this adds Knox
gateway-cloud-bindings. However, the entry for hadoop-cloud-storage
artifact in the impala.cdp.repo maven repository introduces
dependencies that are external to that repository. This requires the
HWX Nexus repository to resolve those dangling dependencies.
Unfortunately, HWX Nexus ages out old jars, including the ones we
need.

This stops using hadoop-cloud-storage, and instead adds a direct
dependency to Knox for USE_CDP_HIVE=true. It disables the HWX Nexus
repository and leaves a tombstone explaining why.

Testing:
 - Deleted my .m2 directory and rebuilt Impala with USE_CDP_HIVE=true
 - Verified the CLASSPATH still contains the right jars on USE_CDP_HIVE=true

Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd
Reviewed-on: http://gerrit.cloudera.org:8080/14024
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M bin/impala-config.sh
M fe/pom.xml
M impala-parent/pom.xml
3 files changed, 48 insertions(+), 49 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14024
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I79a0c2575fc50bbc3b393c150c0bce22258ea1bd
Gerrit-Change-Number: 14024
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14049/2/tests/custom_cluster/test_event_processing.py
File tests/custom_cluster/test_event_processing.py:

http://gerrit.cloudera.org:8080/#/c/14049/2/tests/custom_cluster/test_event_processing.py@156
PS2, Line 156:
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py
File tests/util/event_processor_utils.py:

http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@22
PS2, Line 22: import logging
> flake8: F401 'logging' imported but unused
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@28
PS2, Line 28: class EventProcessorUtils(object):
> flake8: E302 expected 2 blank lines, found 1
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@31
PS2, Line 31: @
> flake8: E301 expected 1 blank line, found 0
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@47
PS2, Line 47: T
> flake8: F821 undefined name 'TimeoutError'
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@49
PS2, Line 49: .
> flake8: E131 continuation line unaligned for hanging indent
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@53
PS2, Line 53:
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/14049/2/tests/util/event_processor_utils.py@62
PS2, Line 62:
> line has trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 12 Aug 2019 21:10:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/14049 )

Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..

IMPALA-8847: Ignore add partition events with empty partition list

Certain Hive queries like "alter table  add if not exists
partition ()" generate a add_partition event even if the
partition did not really exists. Such events have a empty partition list
in the event message which trips on the Precondition check in the
AddPartitionEvent. This causes event processor to go into error state.
The only way to recover is to issue invalidate metadata in such a case.

The patch adds logic to ignore such events.

Testing:
1. Added a test case which reproduces the issue. The test case works
after the patch is applied.

Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_event_processing.py
A tests/util/event_processor_utils.py
3 files changed, 163 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/3
--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Ignore add partition events with empty partition list

2019-08-12 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14049


Change subject: IMPALA-8847: Ignore add partition events with empty partition 
list
..

IMPALA-8847: Ignore add partition events with empty partition list

Certain Hive queries like "alter table  add if not exists
partition ()" generate a add_partition event even if the
partition did not really exists. Such events have a empty partition list
in the event message which trips on the Precondition check in the
AddPartitionEvent. This causes event processor to go into error state.
The only way to recover is to issue invalidate metadata in such a case.

The patch adds logic to ignore such events.

Testing:
1. Added a test case which reproduces the issue. The test case works
after the patch is applied.

Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_event_processing.py
A tests/util/event_processor_utils.py
3 files changed, 162 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/14049/2
--
To view, visit http://gerrit.cloudera.org:8080/14049
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I877ce6233934e7090cd18e497f748bc6479838cb
Gerrit-Change-Number: 14049
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8795 : Enable event polling by default in tests

2019-08-17 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/13922 )

Change subject: IMPALA-8795 : Enable event polling by default in tests
..

IMPALA-8795 : Enable event polling by default in tests

Most of the fixes for event processing are committed. This change
enables the feature by default for all the tests so that eventually we
can turn it on out of the box.

Testing done [WIP]:
1. Ran all the tests with the patch and confirmed that there are no test
failures
2. Ran all the tests with CDP build.

Change-Id: I7279349d4900e24fbcf558f290549496844ce138
---
M docker/catalogd/Dockerfile
M tests/common/environ.py
M tests/common/impala_test_suite.py
M tests/custom_cluster/test_event_processing.py
M tests/custom_cluster/test_hive_parquet_codec_interop.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_refresh_partition.py
M tests/util/event_processor_utils.py
9 files changed, 155 insertions(+), 167 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/13922/8
--
To view, visit http://gerrit.cloudera.org:8080/13922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7279349d4900e24fbcf558f290549496844ce138
Gerrit-Change-Number: 13922
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14082 )

Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3
..

IMPALA-8847: Fix test_event_processing.py on Hive-3

The test_event_processing fails on Hive-3 due to a error in the syntax of the
hive query for transactional tables. It was missed unfortunately when
IMPALA-8847 was merged.

Testing done:
Ran the test on both Hive-2 and Hive-3 and it passes now

Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
Reviewed-on: http://gerrit.cloudera.org:8080/14082
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M tests/custom_cluster/test_event_processing.py
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
Gerrit-Change-Number: 14082
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..


Patch Set 9: Code-Review+2

carrying forward Bharath's +2 from earlier.


--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 16 Aug 2019 22:14:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13932 )

Change subject: IMPALA-8661 : Add randomized tests to stress 
MetastoreEventsProcessor
..

IMPALA-8661 : Add randomized tests to stress MetastoreEventsProcessor

This change adds a new stress test for MetastoreEventsProcessor. This
test randomly executes hive queries to generate a lot of events. The
event processor is invoked at random intervals so that a variable batch
of events is processed everytime. After each batch is processed, the
test checks the status of events processor. By default, on CDH builds
the test is configured to run with 4 concurrent Hive clients and each
of the client runs 50 random Hive queries. These defaults can be
overridden by passing system properties using maven command arguments
"-DnumClients" and "-DnumQueriesPerClients". Additionally, the test
also creates impala clients which keep issuing refresh table commands
on the test databases to make sure that eventProcessor is doing some
real work rather than invalidating/refreshing tables which are
already incomplete.

This test is added as a junit test and uses Hive JDBC to issue the sqls.
This is much faster than the end-to-end python test which issues each
hive query in a separate beeline sessions which re-establishes the
connection every time.

Notes:
1. Ran the test with defaults. It generates about 500 events
and runs for close to 4.5 min. This can be changed to a lower
value if we see significant increased delay in the test job runtimes.
3. On CDP builds the concurrent hive queries run very slow due to
container provisioning time on the minicluster. I have left this as a
TODO to investigate. The test runs in single threaded mode with
increased number of queries when running against Hive-3

Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Reviewed-on: http://gerrit.cloudera.org:8080/13932
Tested-by: Impala Public Jenkins 
Reviewed-by: Vihang Karajgaonkar 
---
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
A 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
A fe/src/test/java/org/apache/impala/testutil/HiveJdbcClientPool.java
M fe/src/test/java/org/apache/impala/testutil/ImpalaJdbcClient.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
A fe/src/test/java/org/apache/impala/util/RandomHiveQueryRunner.java
6 files changed, 1,598 insertions(+), 24 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Vihang Karajgaonkar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/13932
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8c85b83efd4f56b5ae0e8d1dc6a2ee2feb6721ce
Gerrit-Change-Number: 13932
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8847: Fix test event processing.py on Hive-3

2019-08-16 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14082 )

Change subject: IMPALA-8847: Fix test_event_processing.py on Hive-3
..


Patch Set 2: Code-Review+2

Carrying forward Csaba's +2 from before


--
To view, visit http://gerrit.cloudera.org:8080/14082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6507540bfbc0d131a061865ed9ed94792ccfa758
Gerrit-Change-Number: 14082
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 16 Aug 2019 22:16:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8880: Disable EventsProcessorStressTest

2019-08-21 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14117


Change subject: IMPALA-8880: Disable EventsProcessorStressTest
..

IMPALA-8880: Disable EventsProcessorStressTest

EventsProcessorStressTest is flaky. The root-cause of this is MAPREDUCE-6441 
which is
unavailable in the component builds we use in the minicluster. The test will be 
renabled
once we port MAPREDUCE-6441 to Hive-3 and Hive-2 minicluster

Testing:
1. Ran the test on hive-2 and confirmed its being ignored

Change-Id: I0be247e01d5047dbd74702fb43d4d038d8ccf758
---
M 
fe/src/test/java/org/apache/impala/catalog/events/EventsProcessorStressTest.java
1 file changed, 2 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/17/14117/2
--
To view, visit http://gerrit.cloudera.org:8080/14117
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0be247e01d5047dbd74702fb43d4d038d8ccf758
Gerrit-Change-Number: 14117
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] WIP IMPALA-8851 : Do not throw authorization exception in drop if exists queries

2019-08-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14121 )

Change subject: WIP IMPALA-8851 : Do not throw authorization exception in drop 
if exists queries
..


Patch Set 3:

(8 comments)

Mostly looks good. Have some questions and left some suggestions below.

http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2654
PS3, Line 2654:   public boolean tableExists(TableName tblName) {
Missing java doc for the two new methods


http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java:

http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropDbStmt.java@77
PS3, Line 77: dbName_
What happens when we register a privilege for a db which doesn't exist? The 
initial thought was that user needs to have ANY privilege on server to list 
databases


http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java:

http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@86
PS3, Line 86: registerFnPriv
Is this registering the priv twice if the function exists and if exists clause 
it not provided?


http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@97
PS3, Line 97: !hasSignature() && db != null && db.getFunctions(
: desc_.functionName()).isEmpty()
Can we make this section cleaner by creating a method which checks for 
existance of the function
Eg.

boolean fnExists = doesFunctionExist(db, Reference errMsg);
if (!fnExists && !ifExists_) {
  // fucntion does not exist and if exists is not provided
  // throw new AnalysisException(errMsg.get());
} else if(fnExists && ifExists) {
  registerFnPriv(analyzer, Privilege.DROP);
} else if (!fnExists && ifExists) {
  registerFnPriv(analyzer, Privilege.ANY);
}


http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropFunctionStmt.java@119
PS3, Line 119: void
this method can be private.


http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java:

http://gerrit.cloudera.org:8080/#/c/14121/3/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@116
PS3, Line 116: onTable(dbName_, getTbl())
What is the behavior when the table doesn't exist and a privilege request is 
registered on that object?


http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py
File tests/authorization/test_owner_privileges.py:

http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py@149
PS3, Line 149: test_db
is this argument needed?


http://gerrit.cloudera.org:8080/#/c/14121/3/tests/authorization/test_owner_privileges.py@182
PS3, Line 182: _execute_drop_if_exists
Do you think we need a additional test which exercises the drop if exists from 
a user which has the required permissions and one which was not a owner of the 
object previously?



--
To view, visit http://gerrit.cloudera.org:8080/14121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a
Gerrit-Change-Number: 14121
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Mon, 26 Aug 2019 18:36:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8579: Ignore trivial alter table/partition events.

2019-08-26 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14145 )

Change subject: IMPALA-8579: Ignore trivial alter table/partition events.
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@944
PS1, Line 944: isTrivialAlterTableEvent
may be a more boring name like canBeSkipped()? :)


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@945
PS1, Line 945: it is a trivial alter event
this log could be more descriptive since most likely a reader of that statement 
would not understand why this event was ignored. May be something like, 
"Skipping this event since it only alters some table properties which can be 
ignored".


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1023
PS1, Line 1023: !tableAfter_.equals(tableBefore_)
Do we need this check? If not, you can change this to a static util method 
which takes in two maps for the beforeProperties and afterProperties


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1030
PS1, Line 1030: tableAfter_.getParameters().put(property,
  :   tableBefore_.getParameters().get(property));
I think we should not change the state of the table object from the event. You 
could either create a copy and operate on it or use a try .. finally block to 
restore the original values of the table properties. This may be an issue today 
but we can run into weird bugs in the future if we decide to make use of these 
objects to update the catalog state directly in the future.


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1368
PS1, Line 1368: getPropertiesToIgnore
This list could be a static final Immutable list since we don't expect it to 
change from event type


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1569
PS1, Line 1569: !partitionAfter_.equals(partitionBefore_)
Do we need this check?


http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/14145/1/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@160
PS1, Line 160: alterPropertiesToIgnore_
Another reason to use a static final constant which can be reused. Such 
implementations are buggy since the main code could change while this test 
keeps passing.



--
To view, visit http://gerrit.cloudera.org:8080/14145
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I01a59d5170accc014f76f14eb526d96ddcf61f76
Gerrit-Change-Number: 14145
Gerrit-PatchSet: 1
Gerrit-Owner: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 27 Aug 2019 03:11:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8851: Do not throw authorization exception in drop if exists queries

2019-08-27 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14121 )

Change subject: IMPALA-8851: Do not throw authorization exception in drop if 
exists queries
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14121/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/14121/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2659
PS7, Line 2659: getFqTableName
I would have preferred a precondition check here instead of assuming what the 
caller wants since based on the result of this call, authorization decisions 
are being made.


http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java:

http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@117
PS6, Line 117: tableName_
> I created a fix inside tableExists().
Sorry for bugging you on this. Adding the check in the tableExists won't be 
right if dbName_ here is not default. For instance if dbName_ = foo and 
tblName_.tblName = testTbl then this call will look up default.testTbl

Also, does the case matter. I think it is important to make sure that this call 
is doing the right thing otherwise a non-privileged user can drop the table.



--
To view, visit http://gerrit.cloudera.org:8080/14121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a
Gerrit-Change-Number: 14121
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 27 Aug 2019 22:33:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8851: Do not throw authorization exception in drop if exists queries

2019-08-27 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14121 )

Change subject: IMPALA-8851: Do not throw authorization exception in drop if 
exists queries
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@2659
PS6, Line 2659: tblName
I was trying out your patch locally and I found a bug in this code which can 
lead to a privilege escalation problem. If the tblName is not fully qualified 
in this method this will always return false. Can you add a Precondition check 
to make sure that tblname is fully qualified here?

Not sure why it was not caught in the tests.


http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
File fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java:

http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@115
PS6, Line 115: dbName_, getTbl()
> I think this logic is faulty and will fail in the following scenario.
Synced up with Csaba offline. If the user2 has privileges on t2 within the db 
the show tables command skips listing of t1. Hence a ANY privilege on db is not 
a suitable representative to know whether user has permissions to see the 
existance of t1. In such a case, the current approach makes sense to me. Thanks 
for clarification.


http://gerrit.cloudera.org:8080/#/c/14121/6/fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java@117
PS6, Line 117: tableName_
I noticed that the tableName_ here is not fully qualified for some reason. We 
should pass TableName(dbName, getTbl()) here.



--
To view, visit http://gerrit.cloudera.org:8080/14121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iba068935e5da92d71e16e2321afdb8e7b781086a
Gerrit-Change-Number: 14121
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Tue, 27 Aug 2019 21:10:43 +
Gerrit-HasComments: Yes


<    1   2   3   4   5   6   7   8   9   10   >