[Impala-ASF-CR] IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT FROM in HiveSqlWriter

2016-09-12 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4100,4112: Qgen: Replace EXTRACT UDF + IS [NOT] DISTINCT 
FROM in HiveSqlWriter
..


Patch Set 3:

> > Have we considered added some basic sanity tests for the query
 > generator?
 > 
 > I've got a few directed unit tests for a single area and plan to
 > add more as I add features.

Thanks for the quick code review! I start working on adding some unit tests 
also.

-- 
To view, visit http://gerrit.cloudera.org:8080/4357
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3922ca61af59ecd2899c911b1a03e11ab5c26e11
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#3).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
3 files changed, 61 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4011/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-3980: qgen: re-enable Hive as a target database
> I prefer commit messages that, after the bug, speak of the code section cha
Done


PS2, Line 19: * Hive integration tested locally by invoking the data generator 
via the command:
: 
: ./data-generator.py \
: --db-name=functional \
: --use-hive \
: --min-row-count=50 \
> While it makes the commit message have more lines, it is more readable to w
Done


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cli_options.py
File tests/comparison/cli_options.py:

PS2, Line 150:   group.add_argument('--hive-host', default=DEFAULT_HIVE_HOST,
 :   help="The name of the host running the HS2")
 :   group.add_argument("--hive-port", default=DEFAULT_HIVE_PORT, 
type=int,
 :   help="The port of HiveServer2")
> Why not derive these defaults from symbols imported from cluster.py? (The c
Done


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cluster.py
File tests/comparison/cluster.py:

PS2, Line 166: class MiniCluster(Cluster):
> This changes the MiniCluster constructor interface for all callers. It also
Done


PS2, Line 178:   def _init_local_hadoop_conf_dir(self):
> Thanks for doing the cleanups like this one.
No problem! It makes the code a lot easier to run locally.


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/data_generator.py
File tests/comparison/data_generator.py:

PS2, Line 101:   def populate_db(self, table_count, postgresql_conn=None, 
is_hive=False)
> We don't use variable names of the format "isHive" typically, so can you ch
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#4).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
3 files changed, 61 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4011/3/tests/comparison/cluster.py
File tests/comparison/cluster.py:

PS3, Line 56: DEFAULT_HIVE_PORT = 11050
> This should be reverted back to 11050 to match what's already in use in the
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 4:

> I'd like you to run the small suite of cluster tests, but you can't
 > yet until you rebase on top of https://gerrit.cloudera.org/#/c/4404/
 > I realize there aren't that many, and they may not find anything,
 > but it's a habit we need to start. Thanks.

Will do, looks like your patch just got merged. I will rebase my patch and then 
run the tests. Will update this review once I'm done.

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-13 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 4:

> > I'd like you to run the small suite of cluster tests, but you
 > can't
 > > yet until you rebase on top of https://gerrit.cloudera.org/#/c/4404/
 > > I realize there aren't that many, and they may not find anything,
 > > but it's a habit we need to start. Thanks.
 > 
 > Will do, looks like your patch just got merged. I will rebase my
 > patch and then run the tests. Will update this review once I'm
 > done.

@Michael, rebased this patch and ran all the units tests (test_cluster.py and 
test_query_objects.py). All tests passed.

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.

2016-09-13 Thread Anonymous Coward (Code Review)
davy...@163.com has posted comments on this change.

Change subject: IMPALA-4028: Trim sentry config file path spaces while impala 
start.
..


Patch Set 3:

We have developed a configuration management system through which command line 
parameters can be set.
Thank you for your advice. I will use quotes to show the file name path in the 
error message.

-- 
To view, visit http://gerrit.cloudera.org:8080/4309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: davy...@163.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4028: Trim sentry config file path spaces while impala start.

2016-09-13 Thread Anonymous Coward (Code Review)
Hello Lars Volker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4309

to look at the new patch set (#4).

Change subject: IMPALA-4028: Trim sentry config file path spaces while impala 
start.
..

IMPALA-4028: Trim sentry config file path spaces while impala start.

When sentry config file path is wrong input which end contains spaces, 
impala start up failed.

Trimming sentry config file path spaces and checking file again while
the original file has been checked failed will solve this problem.

Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
---
M fe/src/main/java/com/cloudera/impala/authorization/SentryConfig.java
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/4309/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: davy...@163.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-14 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4419

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and does not allow any operator that operates 
over
multiple columns.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around it's "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker inside an Impala Docker 
container

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_create_join_clause.py
4 files changed, 110 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-14 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 5:

> Uploaded patch set 5: Patch Set 4 was rebased.

@Michael, not sure what happened there, but I just pushed the rebased version.

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-14 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#2).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and does not allow any operator that operates 
over
multiple columns.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around it's "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker inside an Impala Docker 
container

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_create_join_clause.py
4 files changed, 110 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-14 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 2:

> Uploaded patch set 2: Commit message was updated.

@Michael, full description of why this patch is necessary and what it changes 
is in the commit message.

-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-16 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#6).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
4 files changed, 66 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-16 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 6:

> Uploaded patch set 6.

Hey Michael,

Apologies, yes you are correct, the tests were failing with this patch. It 
seems my dev setup wasn't fully setup properly. Originally, I was trying to 
avoid building Impala fully, so I made some modifications to get the Python 
code to run locally. The tests were passing locally, but I realize this 
probably isn't a great approach if I want to reliably run these unit tests. 
I've set up a proper Impala dev environment on a remote machine. Once I did 
that I was able to re-produce the error. The hive-default.xml file under 
fe/src/test/resources/ is not a valid XML file. It was missing some element 
terminators. Once I fixed those the tests starting working. I have updated the 
patch.

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-16 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#7).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME
* Fixed fe/src/test/resources/hive-default.xml so that it is a valid XML file, 
it was
missing a few element terminators that cause an exception in the cluster.py file

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
4 files changed, 66 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-16 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 7:

> > The hive-default.xml file under
 > > fe/src/test/resources/ is not a valid XML file. It was missing
 > some
 > > element terminators. Once I fixed those the tests starting
 > working.
 > > I have updated the patch.
 > 
 > Thanks, can you update the commit message as well?

Updated

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4856: Port ImpalaInternalService to KRPC

2017-03-01 Thread Anonymous Coward (Code Review)
Anonymous Coward #168 has posted comments on this change.

Change subject: IMPALA-4856: Port ImpalaInternalService to KRPC
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5888/1/be/src/service/impala-internal-service.cc
File be/src/service/impala-internal-service.cc:

Line 66: 
sync callback ? AddData func may block some time that will cause  many work 
thread block here so bad . I think should move this task to Business processing 
thread, and it decided when to call callback.
Like fbthrift ThriftServer side task processing mechanism?Execute the callback 
function at the appropriate time


-- 
To view, visit http://gerrit.cloudera.org:8080/5888
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I95229290566a8ccffd80ed2d74c1c57cf1479238
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson 
Gerrit-Reviewer: Anonymous Coward #168
Gerrit-HasComments: Yes


[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.

2017-04-06 Thread Anonymous Coward (Code Review)
Anonymous Coward #290 has posted comments on this change.

Change subject: Impala and Hive have slightly different SQL they accept.
..


Patch Set 1:

Do not use the term 'compatible' or imply it.

https://wiki.apache.org/hadoop/Defining%20Hadoop#Compatibility

Some products have been released that have been described as "compatible" with 
Hadoop, even though parts of the Hadoop codebase have either been changed or 
replaced. The Apache™ Hadoop® developer team are not a standards body: they do 
not qualify such (derivative) works as compatible. Nor do they feel constrained 
by the requirements of external entities when changing the behavior of Apache 
Hadoop software or related Apache software.

Nothing is compatible with Apache Hive other than Apache Hive.

-- 
To view, visit http://gerrit.cloudera.org:8080/6567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #290
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR](asf-site) Impala and Hive have slightly different SQL they accept.

2017-04-06 Thread Anonymous Coward (Code Review)
Anonymous Coward #290 has posted comments on this change.

Change subject: Impala and Hive have slightly different SQL they accept.
..


Patch Set 1:

The issue we (HivePMC) are trying dealing with is there are several entities 
with 
copies of Hive code, linking to Hive, or both and at the same time claiming 
'Apache Hive Compatibility'. 
As I explained 'Apache Hive Compatibility' is not a thing they can assert, 

1. Assuming you are using an unaltered Apache Hive metastore you could say. 
"Impala stores it's data in Hive Metastore."
2. Assuming you are using an unaltered apache hive server 2 you could say. " 
Impala's public facing interfaces mirror that of Hive X.Y"
3. You could say, "Impala's SQL dialect is very similar to that of Hive SQL." 
or "Many Apache Hive queries will run on Impala unaltered", 

First, use Defining Hadoop document as a guideline. Second, as a guideline, 
imagine Hive was some large commercial database 'DB2orgres 10Gee', you would be 
hard pressed to find any other entity software claiming compatibility with that 
thing.

To summarize: I appreciate your rapid response. I do realize it is a tag line 
so you want to make it 
short and to the point (while of course making Impala sound as awesome as 
possible). In any case, I do not want to tell you
what to say, use your best judgement. I hope that helps. The changes are 
appreciated.

-- 
To view, visit http://gerrit.cloudera.org:8080/6567
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4e38dc15c31dbb3e297e1d3f9e8251a1fa22d1a7
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: asf-site
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #290
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-07 Thread Anonymous Coward (Code Review)
sakinape...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7110

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..

IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

This change introduces a new rule to merge disjunct equality
predicates into an IN predicate. As with every rule being applied
bottom up, the rule merges the leaf OR predicates into an in predicate
and subsequently merges the OR predicate to the existing IN predicate

Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A 
fe/src/main/java/org/apache/impala/rewrite/CoalesceEqualityDisjunctsToInRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
4 files changed, 156 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7110/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sakinape...@cloudera.com


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-08 Thread Anonymous Coward (Code Review)
sakinape...@cloudera.com has uploaded a new patch set (#2).

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..

IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

This change introduces a new rule to merge disjunct equality
predicates into an IN predicate. As with every rule being applied
bottom up, the rule merges the leaf OR predicates into an in predicate
and subsequently merges the OR predicate to the existing IN predicate
Patch also include changes in response to alex's review comments

Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
4 files changed, 144 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7110/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sakinape...@cloudera.com
Gerrit-Reviewer: Alex Behm 


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-09 Thread Anonymous Coward (Code Review)
sakinape...@cloudera.com has posted comments on this change.

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..


Patch Set 2:

(21 comments)

Responding to comments...

http://gerrit.cloudera.org:8080/#/c/7110/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 179: 
> Not really sure what this predicate does. Seems very specific, so probably 
changed it to  expr with equality condition and literal on the right.  Generic 
enough to keep it here.


Line 183:   return arg instanceof BinaryPredicate
> What's special about these literals? Why not all literals, i.e. isLiteral()
Done. bool and null has only few values. however if there are predicates then 
we dont want to restrict rewriting. hence chaged to isLiteral


http://gerrit.cloudera.org:8080/#/c/7110/1/fe/src/main/java/org/apache/impala/rewrite/CoalesceEqualityDisjunctsToInRule.java
File 
fe/src/main/java/org/apache/impala/rewrite/CoalesceEqualityDisjunctsToInRule.java:

Line 15
> several typos
Done


Line 16
> please follow the format of other rules to list examples
Done


Line 19
> CoalesceEqDisjunctsToInRule (a little shorter)
Done


Line 21
> remove, no logging please, too expensive
Done


Line 27
> very expensive, remove
Done


Line 28
> single line if
Done


Line 34
> single line
Done


Line 39
> single line
Done


Line 46
> Try to be consistent with rest of code base:
Done


Line 48
> whitespace (same below)
Done


Line 60
> else if seems clearer
Done


Line 64
> single line if
Done


Line 67
> Don't we need otherPred to be a BinaryPredicate with an EQ condition?
Added the condition in Expr.java


Line 69
> Predicates might not be normalized, i.e. you could have
Agree. do we capture this effort in a separate jira?


Line 73
> What is this check trying to do? It doesn't seem necessary.
Redundant. Trying to check if the elements in the IN clause and the candidate 
predicate literal is of same type or not. Removed.


Line 82
> We might want to extend NormalizeBinaryPredicates to make dealing with thes
Yes. added a combo testcase with normalized predicate rule for simple case.


Line 105
> What is this trying to do?
Redundant. removed.


Line 109
> Why not leftChild1.isLiteral()?
Done


http://gerrit.cloudera.org:8080/#/c/7110/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 80:   public void TestBetweenToCompoundRule() throws AnalysisException {
> nit: move to bottom
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sakinape...@cloudera.com
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: sakinape...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-12 Thread Anonymous Coward (Code Review)
sakinape...@cloudera.com has uploaded a new patch set (#3).

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..

IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

This change introduces a new rule to merge disjunct equality
predicates into an IN predicate. As with every rule being applied
bottom up, the rule merges the leaf OR predicates into an in predicate
and subsequently merges the OR predicate to the existing IN predicate
Patch also include changes in response to alex's review comments to
normalize the binary predicates.

Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
5 files changed, 172 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7110/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sakinape...@cloudera.com
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: sakinape...@cloudera.com


[Impala-ASF-CR] IMPALA-5280: Coalesce chains of OR conditions to an IN predicate

2017-06-12 Thread Anonymous Coward (Code Review)
sakinape...@cloudera.com has posted comments on this change.

Change subject: IMPALA-5280: Coalesce chains of OR conditions to an IN predicate
..


Patch Set 3:

(17 comments)

additional tests and changes to normalize binary predicate rule

http://gerrit.cloudera.org:8080/#/c/7110/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 180:   public final static com.google.common.base.Predicate 
IS_EXPR_EQ_LITERAL_PREDICATE =
> long line
Done


http://gerrit.cloudera.org:8080/#/c/7110/1/fe/src/main/java/org/apache/impala/rewrite/CoalesceEqualityDisjunctsToInRule.java
File 
fe/src/main/java/org/apache/impala/rewrite/CoalesceEqualityDisjunctsToInRule.java:

Line 69
> I think this particular case is easy enough to fix in this JIRA. It's a sim
Adding code in the rule class.


http://gerrit.cloudera.org:8080/#/c/7110/2/fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
File fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java:

Line 12: 
> Suggest modifications for brevity and grammar:
Done


Line 16:  * Examples:
> add an example where an equality predicate is merged into an IN predicate
Done


Line 32: Expr inAndOtherExpr = rewriteInAndOtherExpr(child0, child1);
> What about IN OR IN? That can be produced by (a = 1 or a = 2) or (a = 3 or 
Done. Added code to merge two IN predicates.


Line 38: return expr;
> This comment should describe the inputs and outputs more clearly, in partic
Done


Line 52: }
> extra space before "return"
Done


Line 54:   inPred = (InPredicate) child1;
> can get rid of these empty lines imo
Done


Line 55:   otherPred = child0;
> Our rewriting framework requires that we return a new InPredicate here. See
Added code to instantiate new class


Line 61: // other predicate can be OR predicate or IN predicate
> Same here. Comment needs polish to describe inputs and outputs.
Done


Line 64: if (Expr.IS_EXPR_EQ_LITERAL_PREDICATE.apply(otherPred))
> single lines?
Done


Line 72:   }
> Seems more concise to inline these function calls, and get rid of the extra
Done


Line 78:   private Expr rewriteEqEqPredicate(Expr child0, Expr child1) {
> To keep things simple, I think we should also require that rightChild1 is a
removed this as it is covered by the IS_EXPR_EQ_LITERAL_PREDICATE function


http://gerrit.cloudera.org:8080/#/c/7110/2/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 392: RewritesOk("int_col * 3 = 6 or int_col * 3 = 9 or int_col * 3 = 
12",
> add parenthesis to make the AND/OR precedence explicit
Done


Line 394: 
> remove blank lines, these 3 test cases belong together logically, so it's g
Done


Line 403: edToInrule, "int_col * 3 IN (6, 9) OR int_col * 3 <= 12");
> add negative tests to show the non-obvious cases in which the rewrite is no
Done


Line 411: RewritesOk("int_col in (1,2) or int_col = 3", edToInrule,
> int_col in (1,2) or int_col in (3,4) should also work
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7110
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If02396b752c5497de9a92828c24c8062027dc2e2
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: sakinape...@cloudera.com
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: sakinape...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4028: Improve message for improper Sentry config file path to make extra spaces visible.

2016-09-19 Thread Anonymous Coward (Code Review)
Hello Lars Volker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4309

to look at the new patch set (#5).

Change subject: IMPALA-4028: Improve message for improper Sentry config file 
path to  make extra spaces visible.
..

IMPALA-4028: Improve message for improper Sentry config file path to 
make extra spaces visible.

When Sentry config file is incorrectly entered,the end contains spaces,
impala starts up failed.

Use quotes to embrace the file path in the error message that will help 
to identify the redundant spaces in the file path.

Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
---
M fe/src/main/java/com/cloudera/impala/authorization/SentryConfig.java
1 file changed, 3 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/4309/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a76b9e4236caa3f2088fba8a9cf0236fced2634
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: davy...@163.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-20 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 7:

> This needs a committer's look.

@Michael, do we know of a committer that has time to look at this patch?

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-20 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#3).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle, if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so its best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around it's "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_create_join_clause.py
4 files changed, 110 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-21 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#4).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_create_join_clause.py
4 files changed, 117 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-21 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 4:

(6 comments)

@Michael, comments addressed.

http://gerrit.cloudera.org:8080/#/c/4419/3//COMMIT_MSG
Commit Message:

PS3, Line 35: subtle;
> Nit: this is a comma splice. You can change the comma to something like a c
Done


PS3, Line 41: it's
> Nit: "it's"
Done


PS3, Line 45: its 
> Nit: "its" :)
Done


http://gerrit.cloudera.org:8080/#/c/4419/3/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

Line 1364:  relational_col_types: Can be used to restrict the signature 
of the chosen
> Add documentation about the signatures parameter.
Done


http://gerrit.cloudera.org:8080/#/c/4419/3/tests/comparison/query_profile.py
File tests/comparison/query_profile.py:

PS3, Line 661: 
 : PROFILES = [var for var in locals().values()
 : if isinstance(var, type)
> Maybe use an or expression for these two conditions and remove the elif?
Re-wrote this as a single list comprehension


http://gerrit.cloudera.org:8080/#/c/4419/3/tests/comparison/tests/hive/test_create_join_clause.py
File tests/comparison/tests/hive/test_create_join_clause.py:

PS3, Line 31:   class FakeQueryProfile(HiveProfile):
> 1. I prefer "Fake" instead of "Mock". I prefer the use of "mock" only when 
Sounds reasonable, updated


-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-21 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#5).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 117 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-21 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4419/4/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

PS4, Line 1369: signature
> s/functions/signatures/
Done


http://gerrit.cloudera.org:8080/#/c/4419/4/tests/comparison/tests/hive/test_create_join_clause.py
File tests/comparison/tests/hive/test_create_join_clause.py:

PS4, Line 25: 
> Nit: this test is hive-specific, so hive should be in the test name.
Done


PS4, Line 31: 
> FakeHiveQueryProfile
Done


PS4, Line 55: 
> Nit: This isn't a mock object; it's a real QueryGenerator. I would just cal
Done


PS4, Line 60: 
> Any thoughts on maybe parametrizing this test to test with different types?
I think its worth doing, but would rather do it in a separate test that isn't 
Hive specific. This test is solely responsible for checking if equality joins 
are made for Hive or not. I think testing if the join clause is correctly made 
with different column types would go in a separate test.

I'm working on a few unit tests for create_relational_join_condition and 
_create_boolean_func_tree that are not Hive specific and will incorporate your 
suggestions there (will be in a separate Gerrit review).


-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-22 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4419/5/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

Line 1436: if self.profile.only_use_equality_join_predicates():
> I don't think this whole patch is necessary. We just need only_use_equality
That was my original approach, but this parameter does not fully work. If 
and_or_count and relational_count are drained to 0, then the code hits line 
1453 and any function could be chosen (including say a < or > operator), which 
could cause Hive to break.

I tried to add this same check to line 1453 but that didn't work because it 
only allows Equals.signatures, which eliminates the possibility of having any 
function in the JOIN clause (e.g ABS, FLOOR, CEIL, etc.)


-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4011

to look at the new patch set (#8).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME
* Fixed fe/src/test/resources/hive-default.xml so that it is a valid XML file, 
it was
missing a few element terminators that cause an exception in the cluster.py file

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
M tests/comparison/db_connection.py
5 files changed, 91 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4011/7/tests/comparison/cluster.py
File tests/comparison/cluster.py:

Line 100: if result is None:
> I think it's a little confusing to check if default is None here.
Done


Line 185: for file_name in ["hive-site.xml"]:
> I'm not sure that modifying the minicluster here is the right thing to do. 
I run the Hive tests against a local instance of Hive, so setting this to 
HIVE_CONF_DIR makes development for me much. I think it will also be necessary 
to run the RQG against Hive in general as Hive-on-RQG should be able to accept 
a custom hive-default.xml file.

I've tested this change against Impala (ran the data_generator, 
discpreancy_checker, and Leopard) and it works fine. There were a few failures 
because now the the code uses fe/src/test/resources/hive-default.xml which had 
some invalid xml clauses.

If you think this will cause problems for Impala, I can revert it. However, in 
that case I would like to create a new Cluster object that can accept generic 
hive-default.xml files.


http://gerrit.cloudera.org:8080/#/c/4011/7/tests/comparison/data_generator.py
File tests/comparison/data_generator.py:

Line 92: self.cluster = None
> How about adding a variable here called, for example, db_engine that can be
Done


Line 153:   with self.cluster.impala.cursor(db_name=self.db_name) as cursor:
> We don't need to compute stats if we are using hive?
Thats a good point, updated the patch to do this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4011

to look at the new patch set (#9).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME
* Fixed fe/src/test/resources/hive-default.xml so that it is a valid XML file, 
it was
missing a few element terminators that cause an exception in the cluster.py file

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
M tests/comparison/db_connection.py
5 files changed, 92 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4011

to look at the new patch set (#10).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME
* Fixed fe/src/test/resources/hive-default.xml so that it is a valid XML file, 
it was
missing a few element terminators that cause an exception in the cluster.py file

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
M tests/comparison/db_connection.py
5 files changed, 92 insertions(+), 25 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..


Patch Set 8:

@Taras, comments addressed, patch updated and re-tested.

-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4419

to look at the new patch set (#6).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 137 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-23 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 6:

(1 comment)

@Taras, comments addressed, new changes have been tested.

http://gerrit.cloudera.org:8080/#/c/4419/5/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

Line 1436: signature = 
self.profile.choose_func_signature(func_class.signatures())
> Ok, I see. This means that currency only_use_equality_join_predicates is no
I made the fix for only_use_equality_join_predicates, but I moved the check 
outside the _create_boolean_func_tree function. I think its simpler if 
_create_boolean_func_tree just operates on a given list of signatures, and the 
caller decides what signatures those should be.

I defined a list of non-equality join predicates as (GreaterThan, 
GreaterThanOrEquals, In, IsNotDistinctFrom, IsNotDistinctFromOp, LessThan, 
LessThanOrEquals, NotEquals, NotIn). So basically, the check will exclude any 
of these functions from the func tree.

I left the get_allowed_join_signatures method because Hive has additional 
restrictions in addition to only allowing equi-joins (see commit message for a 
detailed explanation).


-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4419

to look at the new patch set (#7).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 137 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-3980: qgen: re-enable Hive as a target database

2016-09-23 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4011

to look at the new patch set (#11).

Change subject: IMPALA-3980: qgen: re-enable Hive as a target database
..

IMPALA-3980: qgen: re-enable Hive as a target database

Changes:

* Added hive cli options back in (removed in commit "Stress test: Various 
changes")
* Modifications so that if --use-hive is specified, a Hive connection is 
actually created
* A few minor bug fixes so that the RQG can be run locally
* Modified MiniCluster to use HADOOP_CONF_DIR and HIVE_CONF_DIR rather than a 
hard-coded
file under IMPALA_HOME
* Fixed fe/src/test/resources/hive-default.xml so that it is a valid XML file, 
it was
missing a few element terminators that cause an exception in the cluster.py file

Testing:

* Hive integration tested locally by invoking the data generator via the 
command:

./data-generator.py \
--db-name=functional \
--use-hive \
--min-row-count=50 \
--max-row-count=100 \
--storage-file-formats textfile \
--use-postgresql \
--postgresql-user stakiar

and the discrepancy checker via the command:

./discrepancy-checker.py \
--db-name=functional \
--use-hive \
--use-postgresql \
--postgresql-user stakiar \
--test-db-type HIVE \
--timeout 300 \
--query-count 50 \
--profile hive

* The output of the above two commands is essentially the same as the Impala 
output,
however, about 20% of the queries will fail when the discrepancy checker is run
* Regression testing done by running Leopard in a local VM running Ubuntu 
14.04, and by
running the discrepancy checker against Impala while inside an Impala Docker 
container

Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
---
M fe/src/test/resources/hive-default.xml
M tests/comparison/cli_options.py
M tests/comparison/cluster.py
M tests/comparison/data_generator.py
M tests/comparison/db_connection.py
5 files changed, 126 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/4011/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-24 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4419

to look at the new patch set (#8).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 140 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-24 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4419/7/tests/comparison/query_generator.py
File tests/comparison/query_generator.py:

Line 1214: self.profile.is_non_equality_join_predicate(sig.func)]
> This if statement should be above line 1211 and line 1211 should go into th
Done


PS7, Line 1216: join
> there should a 4 space indentation here
Done


PS7, Line 1356: allowed_si
> I think it would be clearer if this was renamed to allowed_signatures
Done


Line 1442: signature = 
self.profile.choose_relational_func_signature(relational_signatures)
> I like this. This means that we are not calling the only_use_equality_join_
Yeah, only_use_equality_join_predicates() is never called inside 
_create_boolean_func_tree() which I think makes more sense since 
_create_boolean_func_tree() should be agnostic to whether or not it is being 
used to create a join predicate.


-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4142: qgen: Hive does not support CTEs inside sub-query blocks

2016-09-24 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4534

Change subject: IMPALA-4142: qgen: Hive does not support CTEs inside sub-query 
blocks
..

IMPALA-4142: qgen: Hive does not support CTEs inside sub-query blocks

Changes:

Hive does not support WITH clauses inside sub-query blocks, the DefaultProfile 
actually
already has an option to disabled this called "use_nested_with" and the 
HiveProfile
returns False for this method. However, there is a bug in the usage of 
"use_nested_with";
it is not checked everytime a sub-query is created. Specifically, it is not 
checked when
an in-line view is created, or when the query within a WITH clause is created 
(so it's
possible to create WITH ... AS (WITH ...)). This patch fixes both of these 
issues since
Hive throws a SQL compile exception for either case.

Testing:

* Added new unit tests
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy checker

Change-Id: Ie585bbbd88a32f502457a4ca9f2f54c7e1ade354
---
M tests/comparison/query_generator.py
A tests/comparison/tests/test_use_nested_with.py
2 files changed, 72 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/4534/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie585bbbd88a32f502457a4ca9f2f54c7e1ade354
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4101: qgen: Hive join predicates should only contains equality functions

2016-09-25 Thread Anonymous Coward (Code Review)
Hello Michael Brown,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4419

to look at the new patch set (#9).

Change subject: IMPALA-4101: qgen: Hive join predicates should only contains 
equality functions
..

IMPALA-4101: qgen: Hive join predicates should only contains equality functions

Background:

Hive only supports equi-joins in its JOIN clause, while Postgres and Impala 
support more
complex functions such as <, <=, >, >=, etc. This change modifies the
QueryGenerator._create_relational_join_condition and
QueryGenerator._create_boolean_func_tree methods to only construct equality join
conditions under certain conditions.

The _create_boolean_func_tree method is invoked via
QueryGenerator -> create_query -> _create_from_clause -> _create_join_clause ->
_create_relational_join_condition -> _create_boolean_func_tree. This method is 
invoked
when constructing the JOIN, WHERE, and HAVING clauses. It creates a tree of 
functions
that would typically be found in any of these clauses.

Changes:

The parameter "signatures" is added to the method _create_boolean_func_tree, 
and it lists
out all the allowed signatures the function is allowed to use. Previously, this 
list of
signatures was populated by calling _funcs_to_allowed_signatures(FUNCS), and if
"signatures" is not specified, then the code defaults back to the results of 
that method.
A new method in the DefaultProfile called get_allowed_join_signatures is 
introduced and
returns a list of function signatures that are allowed within a JOIN clause. The
DefaultProfile allows all given signatures, while the HiveProfile only allows 
for the
Equals and And functions, as well as any function that operates over only one 
column.
The reason for these restrictions is that Hive only allows equality joins, does 
not allow
OR operators in the join clause, and has some restrictions on functions that 
operate over
multiple different tables. This last restriction is somewhat subtle; if one 
side of the
equals operator contains a function that operates over two different tables, 
the other
side of the operator cannot contain either of those tables. While it is 
possible to have
functions that take in multiple input parameters, the inputs must be taken from 
specific
tables to prevent Hive from throwing a compile time exception. Adding support 
for this in
qgen code will require significant effort and modification to some core methods
(_create_relational_join_condition and _populate_func_with_vals), so it's best 
to disable
these for Hive altogether.

Note that the _create_boolean_func_tree still allows for OR operators due to 
some logic
around its "and_or_fill_ratio" variable. The plan is to fix this in a future 
patch that
specifically focuses on removing OR operators from Hive JOIN clauses.

Minor change to discrepancy_searcher so that the logs print out "Hive" instead 
of
"Impala" when running against Hive.

Testing:

* Added a new unit test that ensures the HiveProfile only returns equality joins
* Unit tests pass
* Tested against Hive locally
* Tested against Impala via Leopard
* Tested against Impala via the Discrepancy Checker

Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
---
M tests/comparison/discrepancy_searcher.py
M tests/comparison/query_generator.py
M tests/comparison/query_profile.py
A tests/comparison/tests/hive/test_hive_create_relational_join_condition.py
4 files changed, 139 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/4419/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4419
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibe8832a03cfa0d7ecc293ec6db6db2bcb34ab459
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: stak...@cloudera.com


[Impala-ASF-CR] IMPALA-4142: qgen: Hive does not support CTEs inside sub-query blocks

2016-09-26 Thread Anonymous Coward (Code Review)
stak...@cloudera.com has uploaded a new patch set (#2).

Change subject: IMPALA-4142: qgen: Hive does not support CTEs inside sub-query 
blocks
..

IMPALA-4142: qgen: Hive does not support CTEs inside sub-query blocks

Changes:

Hive does not support WITH clauses inside sub-query blocks, the DefaultProfile 
actually
already has an option to disable this called "use_nested_with" and the 
HiveProfile
returns False for this method. However, there is a bug in the usage of 
"use_nested_with";
it is not checked every time a sub-query is created. Specifically, it is not 
checked when
an in-line view is created, or when the query within a WITH clause is created 
(so it's
possible to create WITH ... AS (WITH ...)). This patch fixes both of these 
issues since
Hive throws a SQL compile exception for either case.

Testing:

* Added new unit tests
* All unit tests pass
* Tested locally against Hive
* Tested against Impala via Leopard
* Tested against Impala via the discrepancy checker

Change-Id: Ie585bbbd88a32f502457a4ca9f2f54c7e1ade354
---
M tests/comparison/query_generator.py
A tests/comparison/tests/test_use_nested_with.py
2 files changed, 72 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/34/4534/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4534
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie585bbbd88a32f502457a4ca9f2f54c7e1ade354
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stak...@cloudera.com
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Taras Bobrovytsky 


[Impala-ASF-CR] IMPALA-4042: count(distinct NULL) fails on a view

2016-10-03 Thread Anonymous Coward (Code Review)
Anonymous Coward #216 has posted comments on this change.

Change subject: IMPALA-4042: count(distinct NULL) fails on a view
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4600/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 734: if (smap == null || this instanceof NullLiteral) return result;
> This invalidates the comment above that when smap is null, this function is
I thought that NullLiteral would not require substitution and, hence, simply 
clone/return for NullLiteral. Could you explain when NullLiteral requires 
substitution, for me to have better understanding on the code?

In the meantime, let me follow source code to figure out where I can leverage 
preserveRootType.


-- 
To view, visit http://gerrit.cloudera.org:8080/4600
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3b4511234e473e5b9548fbf3e97f333c9980f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #216
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4042: count(distinct NULL) fails on a view

2016-10-03 Thread Anonymous Coward (Code Review)
Anonymous Coward #216 has posted comments on this change.

Change subject: IMPALA-4042: count(distinct NULL) fails on a view
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4600/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 734: if (smap == null || this instanceof NullLiteral) return result;
> I don't think it's particularly relevant whether we need to substitute a Nu
Agreed w/ the design for Expr. I am updating the fix to use preserveRootType. 
Thank you for the clarification. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/4600
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Icf3b4511234e473e5b9548fbf3e97f333c9980f1
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Yonghyun Hwang 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #216
Gerrit-Reviewer: Huaisi Xu 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4431: Add audit event log control mechanism to prevent disk overflow.

2016-11-06 Thread Anonymous Coward (Code Review)
davy...@163.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4971

Change subject: IMPALA-4431: Add audit event log control mechanism to prevent 
disk overflow.
..

IMPALA-4431: Add audit event log control mechanism to prevent disk overflow.

There is no limit to the number of audit event log files. When audit event log 
is enabled,
the growing number of log files will spill the disk space.

This patch adds checking and rotation mechanism on audit event log files to 
prevent
file number out of control which causes disk overflow.

Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
---
M be/src/common/init.cc
M be/src/common/logging.cc
M be/src/common/logging.h
3 files changed, 27 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4971/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com


[Impala-ASF-CR] IMPALA-4431: Add audit event log control mechanism to prevent disk overflow.

2016-11-06 Thread Anonymous Coward (Code Review)
davy...@163.com has uploaded a new patch set (#2).

Change subject: IMPALA-4431: Add audit event log control mechanism to prevent 
disk  overflow.
..

IMPALA-4431: Add audit event log control mechanism to prevent disk 
overflow.

There is no limit to the number of audit event log files. When audit 
event log is enabled,the growing number of log files will fill up the 
disk space.

This patch adds checking and rotation mechanism on audit event log 
files to prevent file number out of control which causes disk overflow.

Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
---
M be/src/common/init.cc
M be/src/common/logging.cc
M be/src/common/logging.h
3 files changed, 28 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/4971/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4971
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8c3229cbdb6275f969c15258c9ccab6efeb24368
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: davy...@163.com
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.

2016-11-07 Thread Anonymous Coward (Code Review)
Anonymous Coward #27 has posted comments on this change.

Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through 
grouping agg + outer join.
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1426: hasOjTest
Where is this function used?


http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

PS1, Line 851: public static  void removeDuplicates(List l)
Suggest to refactor this function a bit like below (if possible):

public static  List removeDuplicates(List l) {
  if (l == null) {
return l;
  } else {
return Lists.newArrayList(Sets.newHashSet(l));
  }
}

This makes this function thread-safe. The passed in list won't be modified, and 
thus can be safely shared among threads.


PS1, Line 853: List origList = Lists.newArrayList(l);
 : l.clear();
 : for (C expr: origList) if (!l.contains(expr)) l.add(expr);
Would it be better to use a Set instead of List?
Set s = Sets.newHashSet(l);
l.clear();
l.addAll(s);


-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3167: Fix assignment of WHERE-clause predicate through grouping agg + outer join.

2016-11-07 Thread Anonymous Coward (Code Review)
Anonymous Coward #27 has posted comments on this change.

Change subject: IMPALA-3167: Fix assignment of WHERE-clause predicate through 
grouping agg + outer join.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4960/1/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS1, Line 1558: evalByJoin
This variable name is a bit confusing to me. Is "evalAfterJoin" better?


-- 
To view, visit http://gerrit.cloudera.org:8080/4960
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I774d13a13ad1e8fe82512df98dc29983bdd232eb
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Anonymous Coward #27
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3398: Add docs to main Impala branch.

2016-11-14 Thread Anonymous Coward (Code Review)
Anonymous Coward #250 has posted comments on this change.

Change subject: IMPALA-3398: Add docs to main Impala branch.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5014/1/docs/generatingImpalaDoc.md
File docs/generatingImpalaDoc.md:

PS1, Line 52: 
: #Setting JAVA\_HOME
: 
: 
: Set your JAVA\_HOME environment variable to tell your computer 
where to find the Java executable file. For example, to set your JAVA\_HOME 
environment on Mac OS X when you the the 1.8.0\_101 version of the Java 
Development Kit (JDK) installed and you are using the Bash version 3.2 shell, 
perform the following steps:
: 
: 1. Edit your /Users/\/.bash\_profile file 
and add the following lines to the end of the file:
: 
: #Set JAVA_HOME
: 
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home
: export JAVA_HOME;
: 
:Where jdk1.8.0\_101.jdk is the version of JDK 
that you have installed. For example, if you have installed 
jdk1.8.0\_102.jdk, you would use that value instead.
:
: 2. Test to make sure you have set your JAVA\_HOME correctly:
: * Open a terminal window and type: $JAVA\_HOME/bin/java 
-version
: * Press return. If you see something like the following:
:   java version "1.5.0_16"
:   Java(TM) 2 Runtime Environment, Standard Edition (build 
1.5.0_16-b06-284)
:   Java HotSpot (TM) Client VM (build 1.5.0\_16-133, mixed 
mode, sharing)
:   
:   Then you've successfully set your JAVA\_HOME environment 
variable to the binary stored in 
/Library/Java/JavaVirtualMachines/jdk1.8.0\_101.jdk/Contents/Home. 
This section must be reviewed for accuracy of markdown tagging.


-- 
To view, visit http://gerrit.cloudera.org:8080/5014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8861e99adc446f659a04463ca78c79200669484f
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3973: optional 2nd and 3rd arguments for instr().

2017-01-03 Thread Anonymous Coward (Code Review)
Anonymous Coward #268 has posted comments on this change.

Change subject: IMPALA-3973: optional 2nd and 3rd arguments for instr().
..


Patch Set 2: Code-Review+1

Thanks for the doc update!

-- 
To view, visit http://gerrit.cloudera.org:8080/5589
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I17268bdb480230938f94559fe1eabe34ac2448b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Anonymous Coward #268
Gerrit-Reviewer: Zoltan Ivanfi 
Gerrit-HasComments: No


[Impala-ASF-CR] DOCS-1757

2017-01-03 Thread Anonymous Coward (Code Review)
ambreen.k...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5594

Change subject: DOCS-1757
..

DOCS-1757

Change-Id: Ibf5f387fa07cf988a99b9ede4066bfd2b27afe16
---
M docs/topics/impala_authorization.xml
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/5594/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5594
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibf5f387fa07cf988a99b9ede4066bfd2b27afe16
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: ambreen.k...@cloudera.com


[Impala-ASF-CR] Remove audience="Cloudera" from DITAVal, replace with audience="hidden".

2017-01-05 Thread Anonymous Coward (Code Review)
Anonymous Coward #250 has posted comments on this change.

Change subject: Remove audience="Cloudera" from DITAVal, replace with 
audience="hidden".
..


Patch Set 1: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I0771ccf912d8112194ad52a7fa76b092ea6cff72
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell 
Gerrit-Reviewer: Ambreen Kazi 
Gerrit-Reviewer: Anonymous Coward #250
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: John Russell 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4036: show create table outputs invalid SQL for partitioned tables with comments

2017-01-09 Thread Anonymous Coward (Code Review)
joemcdonn...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5648

Change subject: IMPALA-4036: show create table outputs invalid SQL for 
partitioned tables with comments
..

IMPALA-4036: show create table outputs invalid SQL for partitioned tables
with comments

For a table that has both a table comment and a partition specified,
"show create table" incorrectly outputs the comment before the partition.
This is not the correct order, and it results in an invalid SQL.

This transaction fixes the ordering (partition comes before comment) and
adds tests for this case.

Change-Id: Iccf9488e8b1d28fcaf3a40f935157c9dd792812e

Update code style of ToSqlTest.java

Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
3 files changed, 42 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5648/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: joemcdonn...@cloudera.com


[Impala-ASF-CR] IMPALA-4036: show create table outputs invalid SQL for partitioned table with comment

2017-01-09 Thread Anonymous Coward (Code Review)
joemcdonn...@cloudera.com has uploaded a new patch set (#2).

Change subject: IMPALA-4036: show create table outputs invalid SQL for 
partitioned table with comment
..

IMPALA-4036: show create table outputs invalid SQL for partitioned table with 
comment

For a table that has both a table comment and a partition specified,
"show create table" incorrectly outputs the comment before the partition.
This is not the correct order, and it results in an invalid SQL.

This transaction fixes the ordering (partition comes before comment) and
adds tests for this case.

Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
---
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
3 files changed, 42 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/5648/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I29a33cfd142b473997fdc3acfe3f0966bc7ed784
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: joemcdonn...@cloudera.com
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4739: ExprRewriter fails on HAVING clauses

2017-01-12 Thread Anonymous Coward (Code Review)
zams...@cloudera.com has posted comments on this change.

Change subject: IMPALA-4739: ExprRewriter fails on HAVING clauses
..


Patch Set 6:

(2 comments)

I realize this is already merged - simply learning how to use gerrit and I 
actually did have two comments.

http://gerrit.cloudera.org:8080/#/c/5662/6/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS6, Line 1065: ;
Nit: spurious semi-colon


http://gerrit.cloudera.org:8080/#/c/5662/6/testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test:

PS6, Line 67: sum(2 + id) > 1, sum(2 + id) >= 5
Looks like further opportunities for redundant subexpression elimination 
remain, although this one is a bit more difficult.  Is there already a JIRA to 
track this?


-- 
To view, visit http://gerrit.cloudera.org:8080/5662
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife74c61f549f620c42f74928f6474e8a5a7b7f00
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Marcel Kornacker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Taras Bobrovytsky 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: zams...@cloudera.com
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3977: TransmitData() should not block

2017-01-19 Thread Anonymous Coward (Code Review)
Anonymous Coward #168 has posted comments on this change.

Change subject: IMPALA-3977: TransmitData() should not block
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5491/4/be/src/runtime/data-stream-mgr.cc
File be/src/runtime/data-stream-mgr.cc:

Line 113: return Status(TErrorCode::DATASTREAM_RECEIVER_EAGAIN, 
PrintId(fragment_instance_id));
did you consider this the case: ReachLimit() then Recvr deregister . then some 
sender first rpc coming .


-- 
To view, visit http://gerrit.cloudera.org:8080/5491
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I47f9d6c1de95a72ab73471dcd2a3118ef13e7db0
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil 
Gerrit-Reviewer: Anonymous Coward #168
Gerrit-Reviewer: Henry Robinson 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5511: Add process start time to debug web page

2017-07-06 Thread Anonymous Coward (Code Review)
gaborkas...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7363

Change subject: IMPALA-5511: Add process start time to debug web page
..

IMPALA-5511: Add process start time to debug web page

Read the start date and time of the impalad, catalogd and statestored processes
for the Debug Web UI. Uses the stat command on the /proc/ directory and
format the date with the date command to local time format.

Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4
---
M be/src/util/proc-info-test.cc
M be/src/util/process-state-info.cc
M be/src/util/process-state-info.h
3 files changed, 27 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/7363/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7363
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I05ae2f80835b1b0e4bc3b38731778ba0871338a4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: gaborkas...@cloudera.com


[Impala-ASF-CR] IMPALA-2782: Allow impala-shell to connect directly to impalad when configured with load balancer and kerberos.

2017-07-21 Thread Anonymous Coward (Code Review)
a...@phdata.io has posted comments on this change.

Change subject: IMPALA-2782: Allow impala-shell to connect directly to impalad 
when configured with load balancer and kerberos.
..


Patch Set 2:

I think we should use impalad[1] for the port still since then you'll still get 
the default 21000 if you specify nothing and its less you would have to enter 
for the -b flag. I also question whether this should be called something like 
kerberos_host_name so its more inline with kerberos_service_name change. 

host, port = self.lb.encode('ascii', 'ignore'), int(self.impalad[1])

-- 
To view, visit http://gerrit.cloudera.org:8080/7241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4726226a7a3817421b133f74dd4f4cf8c52135f9
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Vincent Tran 
Gerrit-Reviewer: Vincent Tran 
Gerrit-Reviewer: a...@phdata.io
Gerrit-HasComments: No


[Impala-ASF-CR] CDH-56987: Update sentry version to 2.0.0-cdh6.x-SNAPSHOT in impala

2017-07-25 Thread Anonymous Coward (Code Review)
kkal...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7507

Change subject: CDH-56987: Update sentry version to 2.0.0-cdh6.x-SNAPSHOT in 
impala
..

CDH-56987: Update sentry version to 2.0.0-cdh6.x-SNAPSHOT in impala

Conflicts:
bin/impala-config-branch.sh

Change-Id: I666809a37cdaca28dc754c9b2ff7830b3ff475b7
---
M bin/impala-config-branch.sh
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/07/7507/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7507
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I666809a37cdaca28dc754c9b2ff7830b3ff475b7
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: kkal...@cloudera.com


[Impala-ASF-CR] CDH-56988: Update the sentry dependencies to reflect the new version of sentry

2017-07-25 Thread Anonymous Coward (Code Review)
kkal...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7508

Change subject: CDH-56988: Update the sentry dependencies to reflect the new 
version of sentry
..

CDH-56988: Update the sentry dependencies to reflect the new version of sentry

Change-Id: Ib201b5cb0aed10fd0449c2075b8022a752525cad
---
M fe/pom.xml
1 file changed, 2 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/08/7508/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7508
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib201b5cb0aed10fd0449c2075b8022a752525cad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: kkal...@cloudera.com


[Impala-ASF-CR] CDH-56990: Fix impala code as sentry-policy-db package is removed in sentry 2.0.0-cdh6.x-SNAPSHOT

2017-07-25 Thread Anonymous Coward (Code Review)
kkal...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7509

Change subject: CDH-56990: Fix impala code as sentry-policy-db package is 
removed in sentry 2.0.0-cdh6.x-SNAPSHOT
..

CDH-56990: Fix impala code as sentry-policy-db package is removed in sentry 
2.0.0-cdh6.x-SNAPSHOT

Change-Id: If0807832e410731b16b63d17beffbc56f0c6584d
---
M fe/src/main/java/org/apache/impala/authorization/AuthorizationChecker.java
M fe/src/main/java/org/apache/impala/util/SentryPolicyService.java
2 files changed, 5 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/09/7509/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: If0807832e410731b16b63d17beffbc56f0c6584d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: kkal...@cloudera.com


[Impala-ASF-CR] CDH-56991: Implement new API's in AuthorizationPolicy

2017-07-25 Thread Anonymous Coward (Code Review)
kkal...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7510

Change subject: CDH-56991: Implement new API's in AuthorizationPolicy
..

CDH-56991: Implement new API's in AuthorizationPolicy

Change-Id: Idce534216efc48ca0b9374ff358de95ff8e173d6
---
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
1 file changed, 22 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/7510/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idce534216efc48ca0b9374ff358de95ff8e173d6
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: kkal...@cloudera.com


[Impala-ASF-CR] Problem: Improve error message when subquery is used in the ON clause

2017-08-04 Thread Anonymous Coward (Code Review)
psi...@cloudera.com has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7588

Change subject: Problem: Improve error message when subquery is used in the ON 
clause
..

Problem: Improve error message when subquery is used in the ON clause

Change-Id: I0d1dc47987de7ea04402e1ead31d81cddf2f96f2
Fix: Print the error stating that "Suquery not allowed in the ON clause"
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
1 file changed, 4 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/7588/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I0d1dc47987de7ea04402e1ead31d81cddf2f96f2
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: psi...@cloudera.com


[Impala-ASF-CR] IMPALA-4252: Min-max runtime filters for Kudu

2017-10-16 Thread Anonymous Coward (Code Review)
Anonymous Coward #345 has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7793 )

Change subject: IMPALA-4252: Min-max runtime filters for Kudu
..


Patch Set 6:

(1 comment)

link error while not include this inline head file.

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/7793/6/be/src/util/min-max-filter.cc@24
PS6, Line 24: "
#include "runtime/timestamp-value.inline.h"



--
To view, visit http://gerrit.cloudera.org:8080/7793
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I02bad890f5b5f78388a3041bf38f89369b5e2f1c
Gerrit-Change-Number: 7793
Gerrit-PatchSet: 6
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Anonymous Coward #345
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 16 Oct 2017 07:37:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5362 : Preserve case-sensitivity in field titles

2017-11-15 Thread Anonymous Coward (Code Review)
ydjainopensou...@gmail.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8544


Change subject: IMPALA-5362 : Preserve case-sensitivity in field titles
..

IMPALA-5362 : Preserve case-sensitivity in field titles

To preserve case sensitivity in column labels , this patch modifies the 
getColumnLabel()
method of the SelectListitem class to return original string. However since 
internally
all identifiers are represented in lowercase a few minor changes have been made 
to
SelectStmt.java.lowercase "null" was causing union test to fail hence replaced 
it with "NULL"

Change-Id: Ia574f0b7d5bdbebace270ce4079632bf29b3f00e
---
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
3 files changed, 16 insertions(+), 16 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/8544/1
--
To view, visit http://gerrit.cloudera.org:8080/8544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia574f0b7d5bdbebace270ce4079632bf29b3f00e
Gerrit-Change-Number: 8544
Gerrit-PatchSet: 1
Gerrit-Owner: ydjainopensou...@gmail.com


[Impala-ASF-CR] IMPALA-5362 : Preserve case-sensitivity in field titles

2017-11-15 Thread Anonymous Coward (Code Review)
ydjainopensou...@gmail.com has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/8555


Change subject: IMPALA-5362 : Preserve case-sensitivity in field titles
..

IMPALA-5362 : Preserve case-sensitivity in field titles

Fixed labeling.

Change-Id: I146b8d72e7a8f4b6d181dbd0bfe450c7451e36e4
---
M fe/src/main/java/org/apache/impala/analysis/SelectListItem.java
M tests/shell/test_shell_interactive.py
2 files changed, 10 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/55/8555/1
--
To view, visit http://gerrit.cloudera.org:8080/8555
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I146b8d72e7a8f4b6d181dbd0bfe450c7451e36e4
Gerrit-Change-Number: 8555
Gerrit-PatchSet: 1
Gerrit-Owner: ydjainopensou...@gmail.com