[Impala-ASF-CR] IMPALA-3710: All-null columns give wrong estimates in planner Modified the planner to handle low-value NDVs by adjusting them upward by one to account for null values. Thus, even an al

2018-09-27 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/11528 )

Change subject: IMPALA-3710: All-null columns give wrong estimates in planner 
Modified the planner to handle low-value NDVs by adjusting them upward by one 
to account for null values. Thus, even an all-null column, which has an NDV of 
0 in stats, will have an NDV of 1 in
..

IMPALA-3710: All-null columns give wrong estimates in planner
Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 448 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/2
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-3710: All-null columns give wrong estimates in planner Modified the planner to handle low-value NDVs by adjusting them upward by one to account for null values. Thus, even an al

2018-09-27 Thread Paul Rogers (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#3).

Change subject: IMPALA-3710: All-null columns give wrong estimates in planner 
Modified the planner to handle low-value NDVs by adjusting them upward by one 
to account for null values. Thus, even an all-null column, which has an NDV of 
0 in stats, will have an NDV of 1 in
..

IMPALA-3710: All-null columns give wrong estimates in planner
Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 448 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/3
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-27 Thread Paul Rogers (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#4).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 448 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/4
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-27 Thread Paul Rogers (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#5).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 449 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/5
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-27 Thread Paul Rogers (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#6).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
12 files changed, 450 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/6
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 6
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-10-12 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#11).

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..

IMPALA-7310: Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Bug: IMPALA-7310: All-null columns give wrong estimates in planner
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 452 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/11
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 11
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-10-12 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11528 )

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..


Patch Set 12:

(3 comments)

> Uploaded patch set 12.

Addressed a few more review comments. Rebased on master.

http://gerrit.cloudera.org:8080/#/c/11528/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11528/10//COMMIT_MSG@71
PS10, Line 71: Bug: IMPALA-7310: All-null columns give wrong estimates in 
planner
> We almost always put "IMPALA-7310:" in the first line of the commit message
Fixed.


http://gerrit.cloudera.org:8080/#/c/11528/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/11528/10/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@110
PS10, Line 110: // not zero.
> Perhaps: "if we don't know the null count (in which case, getNumNulls() ret
Done


http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test:

http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test@a986
PS8, Line 986:
> test
Is this comment itself a test, or is this a request to test something?



--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 12
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Fri, 12 Oct 2018 23:37:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-10-12 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#12).

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..

IMPALA-7310: Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Bug: IMPALA-7310: All-null columns give wrong estimates in planner
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 457 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/12
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 12
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-15 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11688


Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..

IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

Used existing method to remove cached per-partition column information
as requested in the JIRA ticket. See IMPALA-7501 for details.

Tested by inspecting a heap dump before and after the change to ensure
that the column schema is no longer referenced from the Hive Partition
objects. (At present, we have no automated way to inspect a heap dump.)

Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
---
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
2 files changed, 7 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/1
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7715: [DOCS] Better descriptions for conditional functions

2018-10-17 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11704 )

Change subject: IMPALA-7715: [DOCS] Better descriptions for conditional 
functions
..


Patch Set 3: Code-Review+1

Looks good. The changes wonderfully address the ambiguity in the original docs. 
Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/11704
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5cc90d62645730d2674bcb3af614863aa92b92f6
Gerrit-Change-Number: 11704
Gerrit-PatchSet: 3
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 17 Oct 2018 18:41:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7715: [DOCS] Better descriptions for conditional functions

2018-10-16 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11704 )

Change subject: IMPALA-7715: [DOCS] Better descriptions for conditional 
functions
..


Patch Set 2:

(4 comments)

Thanks for making these changes so quickly! A few minor suggestions.

http://gerrit.cloudera.org:8080/#/c/11704/2/docs/topics/impala_conditional_functions.xml
File docs/topics/impala_conditional_functions.xml:

http://gerrit.cloudera.org:8080/#/c/11704/2/docs/topics/impala_conditional_functions.xml@341
PS2, Line 341:   FALSE if the expression is true or 
NULL.
Should the other refs to boolean constants be called out? "is false" --> "is 
FALSE"? Also for true?


http://gerrit.cloudera.org:8080/#/c/11704/2/docs/topics/impala_conditional_functions.xml@345
PS2, Line 345:   
After reading your other change, it appears we support IS FALSE. So, we might 
add: "Same as the IS FALSE operator."


http://gerrit.cloudera.org:8080/#/c/11704/2/docs/topics/impala_conditional_functions.xml@445
PS2, Line 445:   FALSE if the expression is false or 
NULL.
"an expression" --> "the expression". Code font for true and false.


http://gerrit.cloudera.org:8080/#/c/11704/2/docs/topics/impala_conditional_functions.xml@449
PS2, Line 449:   
Add comment "Is same as the IS TRUE operator."

Maybe also do this for "isnottrue() - IS NOT TRUE" and "isnotfalse() - IS NOT 
FALSE"



--
To view, visit http://gerrit.cloudera.org:8080/11704
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5cc90d62645730d2674bcb3af614863aa92b92f6
Gerrit-Change-Number: 11704
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 17 Oct 2018 01:28:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5004: Switch to sorting node for large TopN queries

2018-10-18 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11698 )

Change subject: IMPALA-5004: Switch to sorting node for large TopN queries
..


Patch Set 3:

(1 comment)

This will be a great change. See comment about using a memory-based decision 
rather than, say, a row-based decision or a decision informed by the memory 
available to the query.

http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/11698/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@313
PS3, Line 313:   estimatedTopNMaterializedSize < 
ctx_.getQueryOptions().topn_bytes_limit;
This code makes the sort/top-n decision based on memory. The number of bytes is 
a parameter. This seems awkward: the memory used must be part of the overall 
query budget. A query could, in odd cases, have multiple TopN operators, but 
the code here treats them one by one.

Further, a TopN will always use fewer bytes than a Sort: a TopN needs to keep 
only n rows in general, but sort must buffer all rows. (Though, of course, sort 
can spill to relieve memory pressure.)

I wonder, does it make sense to impose the limit as a row limit rather than a 
memory limit?

Or, does it make sense to set the limit as part of a query memory planning 
exercise rather than as another free parameter that the user must juggle when 
thinking about memory budgets?



--
To view, visit http://gerrit.cloudera.org:8080/11698
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Gerrit-Change-Number: 11698
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 18 Oct 2018 21:15:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7659: Simplify expression to collect NULLs count

2018-10-18 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11565 )

Change subject: IMPALA-7659: Simplify expression to collect NULLs count
..


Patch Set 5:

(2 comments)

A few comments.

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11565/5//COMMIT_MSG@10
PS5, Line 10: IMPALA-7655.
This fix would be helpful for IMPALA-7310: if NDV=0, we need to know if 
nullCount > 0 to know if we should adjust NDV upward by one. (The planner 
includes nulls in NDV, statistics don't.)


http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/11565/5/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@251
PS5, Line 251:   " IS NULL THEN 1 ELSE NULL END)");
If this expression is faster than an if() function (IMPALA-7655), then does it 
makes sense to optimize the if() function in the planner so it benefits all 
queries rather than just changing this one?



--
To view, visit http://gerrit.cloudera.org:8080/11565
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic68f8b4c3756eb1980ce299a602a7d56db1e507a
Gerrit-Change-Number: 11565
Gerrit-PatchSet: 5
Gerrit-Owner: Anonymous Coward 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:25:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6661 Make NaN values equal for grouping purposes.

2018-10-18 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11535 )

Change subject: IMPALA-6661 Make NaN values equal for grouping purposes.
..


Patch Set 16:

(5 comments)

Good stuff. A few random comments.

http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/codegen/codegen-anyval.cc
File be/src/codegen/codegen-anyval.cc:

http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/codegen/codegen-anyval.cc@469
PS16, Line 469: void CodegenAnyVal::ConvertToCanonicalForm() {
This code handles the codegen case. Is there an interpreted code path change 
that is needed for when codegen is disabled?


http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/codegen/codegen-anyval.cc@728
PS16, Line 728: llvm::Value* CodegenAnyVal::EqToNativePtr(llvm::Value* 
native_ptr,
Maybe add a comment to define inclusive equality? The description in the commit 
message is good, maybe just add it here for future reference.


http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/codegen/codegen-anyval.cc@751
PS16, Line 751:   // Mirror logic in HashTableCtx::Equals - IMPALA-6661
Maybe explain why we're mirroring the logic? So that two NaN compare as equal? 
Or, so that they have the same hash code? Both?


http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/exec/hash-table.cc
File be/src/exec/hash-table.cc:

http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/exec/hash-table.cc@238
PS16, Line 238: bool HashTableCtx::Equals(const TupleRow* build_row, const 
uint8_t* expr_values,
Per earlier comment, maybe a comment here about what "INCLUSIVE" means? 
(Parallel handling of NULL and NaN.)


http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3074
PS16, Line 3074:  QUERY
A quick check of this file found no cases where we test base NaN functionality. 
According to Wikipedia(https://en.wikipedia.org/wiki/NaN), NaN has specific 
rules. Here we test IS DISTINCT, but should we add base tests for other 
operators? Having such tests will catch any regressions in the future if 
someone accidentally turns on the new mode at the wrong time.



--
To view, visit http://gerrit.cloudera.org:8080/11535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Gerrit-Change-Number: 11535
Gerrit-PatchSet: 16
Gerrit-Owner: Michal Ostrowski 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Michal Ostrowski 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 18 Oct 2018 22:11:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6661 Make NaN values equal for grouping purposes.

2018-10-22 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11535 )

Change subject: IMPALA-6661 Make NaN values equal for grouping purposes.
..


Patch Set 16:

(3 comments)

A few more comments as I learn this code.

http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11535/16//COMMIT_MSG@17
PS16, Line 17:   additional logic to consider NaN values as equal.
The comment mentions equality. Is the rule that NaN = NaN is always false? But 
that Nan <=> NaN is true?

Looks like the IS DISTINCT (<=>) code is implemented in operators-ir.cc, but 
that file is not in this patch set. Does the implementation of IS DISTINCT need 
to change?


http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc
File be/src/runtime/raw-value.cc:

http://gerrit.cloudera.org:8080/#/c/11535/16/be/src/runtime/raw-value.cc@33
PS16, Line 33: const float RawValue::CANONICAL_FLOAT_NAN = nanf("");
NaN is not a single bit pattern (which is why you've defined a canonical form), 
rather it is a family of bit patterns. Do the above functions guarantee to 
return the exact same bit pattern on all OSs, all versions, and all runs?

Or, should we use the LLVM trick to specify the exact bit pattern we want to 
use for NAN so that it is guaranteed to be the same on all hosts in the 
cluster, even if they use slightly different CPUs or library versions?


http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test
File testdata/workloads/functional-query/queries/QueryTest/exprs.test:

http://gerrit.cloudera.org:8080/#/c/11535/16/testdata/workloads/functional-query/queries/QueryTest/exprs.test@3074
PS16, Line 3074:  QUERY
> I've assumed that NaN tests have existed in other places.  Exhaustive valid
The thought is that we might want to ensure all paths work to handle the NaN 
case, not just the grouping case.

For example, we we have tests for all three (I believe) places that this can 
occur:

* SELECT clause: SELECT sqrt(-1) op floatCol ...
* WHERE clause: WHERE sqrt(-1) op floatCol ...
* JOIN: what is tested here.

Along with two operators

* = (NaN's are not equal)
* <=> (NaN's are equal)

We also have both the code gen and interpreted cases:

* SELECT: never code generated (when in the root fragment)
* WHERE, JOIN: code generated if enough rows, interpreted otherwise

The thought is that we might want to cover all these cases just to save the 
hassle of fixing any gaps later.



--
To view, visit http://gerrit.cloudera.org:8080/11535
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Gerrit-Change-Number: 11535
Gerrit-PatchSet: 16
Gerrit-Owner: Michal Ostrowski 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Michal Ostrowski 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 22 Oct 2018 18:04:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-24 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@89
PS2, Line 89:* code gen for , avoiding the need for ad-hoc 
implementations
:* of each function.
> Removing optimizations worked here.
Further testing revealed complexities. First, the rewrite engine does not apply 
the same rule multiple times (rewrite function, then rewrite resulting case.)

If the code is changed to apply CASE rewrite to the result, additional problems 
result. One of which is that the CASE rewrites will revert to the original 
expression if all aggregates are removed. But, we can't do that here: we no 
longer have function implementations. So, we would need the aggregate fallback 
to fallback to the CASE rewritten from the function.

The resulting code gets pretty complex and looks hard to maintain. So, I'm 
thinking to revert to the code in the first patch, with the added subtlety of 
handling aggregates in the function rewrites.

Stay tuned for a revised version.



--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Oct 2018 16:28:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7740: [DOCS] Correct description of NVL2 function

2018-10-23 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11758 )

Change subject: IMPALA-7740: [DOCS] Correct description of NVL2 function
..


Patch Set 1: Code-Review+1

Thanks for the quick fix!


--
To view, visit http://gerrit.cloudera.org:8080/11758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f6b9d56e85f7dffeb29218b244af1cc535dc03e
Gerrit-Change-Number: 11758
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 23 Oct 2018 21:01:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-23 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the code-generated
CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The functions ifnull and nvl are aliases for isnull, so revised those to
directly rewrite them to CASE rather than first converting them to if().

Moved rewriting into our existing set of conditional rewrite rules
instead of in the parser. To make this work (and to solve the issue in
IMPALA-7741, added ifnull and nvl to the list of builtin Impala
functions.

Found a few additional simplifications for coalesce, ifnull and nvl. See
the Javadoc and unit tests for details.

Tests for conditional functions were in one huge function along with
other rewrite tests. Moved them into a new file, then broke up the tests
by function to allow much easier debugging of each function one-by-one.
This required moving the common test mechanims into a new common base
class.

Once tests verified that the rewrites worked, we no longer have need of
the specialized C++ implementations, so removed those.

Also cleaned up a few typos, unnecessary imports and JavaDocs found
during this work.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java
13 files changed, 586 insertions(+), 485 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/2
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#7).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-leel conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
16 files changed, 837 insertions(+), 309 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/7
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#5).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-leel conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
16 files changed, 831 insertions(+), 309 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/5
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 5:

Thanks everyone for your patient reviews of this change. Per PhilZ, removed the 
per-function optimizations, relying on existing rewrite rules (bugs and all). 
Added tests to verify that the "downstream" rules work (when they do.)

Per Vuk, reduced the scope to only rewrite the requested functions, and to 
retain the BE implementation; living with the bugs that prevent the rewrites 
from being done some of the time.

Added more tests, including tests of the full rewrite process to ensure that 
the conditional functions are, in fact, first rewritten to use CASE, then CASE 
is further simplified. Tests include commented-out cases to record where we 
decided to live with existing bugs.

The result is the minimal change that accomplishes the request in the JIRA 
without breaking anything.


--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 29 Oct 2018 20:51:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#6).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-leel conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
16 files changed, 836 insertions(+), 309 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/6
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#8).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-level conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
17 files changed, 870 insertions(+), 313 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/8
--
To view, visit http://gerrit.cloudera.org:8080/11760

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11822 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Abandoned

Duplicate
--
To view, visit http://gerrit.cloudera.org:8080/11822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Idfe275032dc13aee64bfd60d448138b5b77ab96a
Gerrit-Change-Number: 11822
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 7:

(14 comments)

Thanks, Phil, for the code review. Addressed all but one comment.

http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@29
PS7, Line 29: As a result, the BE retains the original interpreted forms that
> Let's mention this in the backend code in comments, so that folks know it's
Done


http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@30
PS7, Line 30: are still used in two cases: 1) top-leel conditions in the
> nit: level
Done


http://gerrit.cloudera.org:8080/#/c/11760/7//COMMIT_MSG@45
PS7, Line 45: Still, the fix provides most of what the JIRA ticket requested
> Does the specific "compute stats" query that's underlying the JIRA get fast
Haven't gotten that far yet; still trying to make sure that things actually 
work. Will provide an update once I can do the required performance testing.


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
File fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@54
PS7, Line 54:
> nit: it's weird that there are two spaces there.
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@66
PS7, Line 66: // then want to allow CASE to do any simplification 
(reverting to
> nit: The parenthetical here isn't closed.
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@70
PS7, Line 70:   Expr revised =  rewriteConditionalFn((FunctionCallExpr) 
expr);
> nit: should only have one space after =
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@72
PS7, Line 72:   // Workaround for IMALA-TBD
> Is TBD resolved at this point?
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@132
PS7, Line 132:* IFNULL(a, x) --> 
> If we're not using HTML, is the  here noise?
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@179
PS7, Line 179:   if (childExpr.isAggregate()) { sawAgg = true; }
> The javadoc for "expr.contains()" suggests that hasAgg has already checked
True.

This check adjusts the non-null literal case. Ignoring aggregates, we can stop 
adding arguments (ignoring all remaining arguments) if we hit a non-null 
literal. (This is the ! hasAgg case.)

Or, if we have aggregates, we can stop if we've already added at least one 
aggregate. Otherwise, we have to keep going to add at least one aggregate -- 
even though the aggregate will never actually be evaluated. (This is the sawAgg 
case.)

But, that I had to explain this means that using dual flags is the wrong 
approach. Rewrote to use a state machine. Please let me know if that is clearer.


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@50
PS7, Line 50:  * We can't eliminate aggregates as this may change the meaning 
of the
> This is the same as line 70-72. Is that intentional?
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
File fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java:

http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@42
PS7, Line 42: System.out.println(stmt);
> remove
Done


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@97
PS7, Line 97:   @Test
:   public void testSqlRecovery() {
: ParseNode node = analyzeExpr("true and false");
: System.out.println(node.toSql());
: System.out.println(((SelectStmt) node).toSql(true));
: // SELECT TRUE AND FALSE FROM functional.alltypessmall
:   }
> There's no assertion here...
Ad-hoc test. Removed.


http://gerrit.cloudera.org:8080/#/c/11760/7/fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java@117
PS7, Line 117: // TODO: IMPALA-7766
> nit: including the subject of the JIRA here may make it obvious why...
Done



[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-29 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11822


Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-level conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163

Fixes

Change-Id: Idfe275032dc13aee64bfd60d448138b5b77ab96a
---
M be/src/exprs/conditional-functions.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
17 files changed, 870 insertions(+), 313 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/22/11822/1
--
To view, visit http://gerrit.cloudera.org:8080/11822
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-31 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 10:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@43
PS8, Line 43: (IMPALA-7737)
> are there examples of these fns in our benchmarks to quantify the regressio
Done


http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@45
PS8, Line 45: Still, the fix provides most of what the JIRA ticket requested
> I'd skip these next three paragraphs.
Done


http://gerrit.cloudera.org:8080/#/c/11760/8//COMMIT_MSG@57
PS8, Line 57:
> pls make a section for this called "Testing" so its easier to jump to. also
Done


http://gerrit.cloudera.org:8080/#/c/11760/8/be/src/exprs/conditional-functions.h
File be/src/exprs/conditional-functions.h:

http://gerrit.cloudera.org:8080/#/c/11760/8/be/src/exprs/conditional-functions.h@76
PS8, Line 76: since
: /// various bugs mean that this implementation is still sometimes 
used. But
: /// the goal is to remove these classes at some point.
> simpler: "until their use is eliminated by the frontend".
Done


http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
File fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@37
PS8, Line 37: vanish
> is this accurate given the comments in the commit message about order by?
Yes, they vanish after the rewrite. If the rewrite does not occur (due to bugs 
or options), then the functions do not vanish.


http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@48
PS8, Line 48: planner runs the rule to simplify CASE
:  * after this rule. Where that other rule can perform 
simplifications,
:  * those simplifications are omitted here
> simplify and use the specific rule name for concreteness.
Rewrote. Since many rules apply, did not seem to add value to try to enumerate 
them.


http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@106
PS8, Line 106:return rewriteCoalesceFn(e
> clarify whether you think this happens after the rewrite or before. If its
Done


http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@130
PS8, Line 130:Lists.ne
> isn't this all that's done here (the most general case) and we'll depend on
Rewrote comment


http://gerrit.cloudera.org:8080/#/c/11760/8/fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java@172
PS8, Line 172: ESCE(a, literal) when literal
> The simplest rewrite here would be to not look at the child exprs for the v
I found that I could strip this function down as you suggest, but the results 
were sub-optimal due to weaknesses in other parts of the code.

coalesce(id) --> CASE ELSE id END (expected: id)
coalesce(id, null) --> CASE WHEN id IS NOT NULL THEN ID ELSE NULL END 
(expected: id)
coalesce(null, id) --> same as above
coalesce(null is null, id) --> 1 (expected: TRUE)
coalesce(10 + null, id) --> CASE WHEN NULL IS NOT NULL THEN NULL ELSE id END 
(expected: id)
coalesce(42, min(distinct tinyint_col)) --> CASE WHEN FALSE THEN NULL WHEN TRUE 
THEN 42 ELSE min(distinct tinyint_col, 42)) END (Expected: CASE WHEN 
min(distinct tinyint_col) IS NOT NULL THEN min(distinct tinyint_col) ELSE 42 
END)

And so on.

One solution is to restore the original coalesce() function rewrite which 
produces somewhat better results. Even that function has holes which this 
version fixed. So, we'd lose those optimizations.

As it turns out, all this is moot because the BE CASE implementation has a bug 
so we can't even use CASE to replace coalesce()...

For now, I did rework the function to the simple form requested, then disabled 
it. We can tackle any other concerns if/when it is a priority.


http://gerrit.cloudera.org:8080/#/c/11760/8/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test:

http://gerrit.cloudera.org:8080/#/c/11760/8/testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test@1137
PS8, Line 1137: |  other predicates: functional.alltypestiny.tinyint_col + 
functional.alltypestiny.smallint_col + functional.alltypestiny.int_col > 10, 
CASE WHEN functional.alltypestiny.tinyint_col + 
functional.alltypestiny.bigint_col IS NULL THEN 1 ELSE 
functional.alltypestiny.tinyint_col + functional.alltypestiny.bigint_col END = 1
> so this is the example of the performance regression (same 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull to use CASE

2018-10-31 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#11).

Change subject: IMPALA-7655: Rewrite if, isnull to use CASE
..

IMPALA-7655: Rewrite if, isnull to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the if and isnull functions into the equivalent
CASE structure. Coalesce is omitted due to limitations within the
code.

Caveats:

* IMPALA-7753: Conditionals in the top-level ORDER BY clause are not
rewritten, but those one or more levels down are. The result is that
PlannerTest files still show uses of if and isnull despite this fix.
* IMPALA-7785: Similar, but says that the GROUP BY clause is not
analyzed prior to rewrites, so the code again forces analysis.
There are some odd "analyze()" calls in the code to work around this
limitation snd the next two.
* IMPALA-7769: Some expressions involving NULL are not simplified.
As a result, some of the rewrites don't result in the most optimal
result.
* IMPALA-7754: The rewrite engine ignores unanalyzed expressions, yet
the rewrite engine does not, in general, re-analyze the expressions it
produces, causing simplifications to be skipped.
* IMPALA-4356: Even after this change, code gen won't occur for SELECT
expressions executed in teh root fragment (the most common case
in simple tests.)

Because of the above, some if and isnull calls are not rewritten, and
others are not rewritten to the simplest forms. The tests contain
commented-out tests, and comments with bug numbers, to record these
known issues.  Still, the current fix is useful enough by itself to
proceed despite the above.

While the desire was to replace the BE implementation of if and isnull,
the caveats prevent doing so at present, so the BE retains the original
interpreted forms. They are used when: 1) top-level conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Coalesce() is implemented, but disabled. The old rewrites are retained
until blocking issues are fixed.

Testing:

* Tests for conditional functions are pulled out into a new test class.
* A new base class holds code common to the new class and the existing
ExprRewriterTest.
* A new FullRewriteTest class tests rewrites that require multiple
passes thorugh teh rewrite engine.
* Planner Test expected results were updated based on the rewritten
expressions.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
13 files changed, 939 insertions(+), 256 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/11
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 11
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-30 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#9).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-level conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
18 files changed, 946 insertions(+), 313 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-30 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 7:

Turns out that there is a BE bug that means CASE is not equivalent to 
coalesce(). Disabled all coalesce() rewrites and tests, restoring original 
behavior from master. Coalesce() can be revisited when IMPALA-7793 is fixed.


--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 31 Oct 2018 02:24:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-30 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Philip Zeyliger, Tim Armstrong, Csaba Ringhofer, 
Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#10).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the
code-generated CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The rewrite engine has many bugs that are beyond the scope of
this change to fix. This change codes around those bugs. The
result is that the conditional rewrite happens some of the time,
and sometimes produces less-than perfect optimizations.

Conditionals in the top-level ORDER BY clause are not rewritten
(IMPALA-7753), but those one or more levels down are. Some expressions
involving NULL are not simplified (IMPALA-7769). Several hacks were
used to work around the fact that the rewrite engine ignores unanalyzed
expressions, yet the rewrite engine does not, in general, re-analyze
the expressions it produces, causing simplifications to be skipped
(IMPALA-7754). And so on.

As a result, the BE retains the original interpreted forms that
are still used in two cases: 1) top-level conditions in the
ORDER BY clause, and 2) if the user disables rewrites.

Further, code generation does not occur for CASE statements in the
SELECT clause when it is in the root fragment (the most common case
in simple tests.) This is another known bug (IMPALA-4356).

One possible performance regression is that the new form of the code
evaluates some expressions twice, where the original interpreted
code evaluated the argument once. E.g. coalesce(id, 10) is rewritten
to CASE WHEN id IS NULL THEN id ELSE 10 END. Here, id is evaluated
twice. If the "id" were replaced by a complex sub-expression, the
gain from compilation could be offset by doing work twice.
(IMPALA-7737)

Still, the fix provides most of what the JIRA ticket requested
within the limitations of the existing code.

Conditional function rewrites are moved into a new class,
RewriteConditionalsRule in order to keep things simple.

Most functions use the simplest possible rewrite, relying on the
existing rewrite rules for further simplification.  The one exception
is coalesce(): the existing code relies on the semantics of the
function and so was retained and slightly improved. The code was
extended to produce a CASE statement directly, retaining existing
simplifications.

Tests for conditional functions were in one large function along with
other rewrite tests. Moved them into a new file, then broke up the
tests by function to allow much easier debugging of each function
one-by-one.  This required moving the common test mechanims into a
new common base class.

Existing tests focus on one or two rules at a time. The conditional
function rewrite, however, relies on the entire set of rules being
applied repeatedly. So, added a new FullRewriteTest case to verify this
behavior. This class contains several commented-out tests that cannot
pass due to existing rewrite bugs noted above.

Changing the rewrite cause the PlannerTest to produce different plans
than previously. Changed the expected results file to match the new
rewrite rules.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions.h
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/RewriteConditionalFnsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/FullRewriteTest.java
A fe/src/test/java/org/apache/impala/analysis/RewriteConditionalFnsRuleTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/aggregation.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/implicit-joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/outer-joins.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
18 files changed, 1,014 insertions(+), 282 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF 

[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 4:

(12 comments)

Thanks everyone for the comments.

PhilZ's note to use existing optimizations is a good one. A number of changes 
are required to make that happen. Researching that turned up quite a few bugs 
listed in IMPALA-7655. I plan to push changes for those bugs one by one to ease 
reviews. Once those fixes are in master, I'll push a new change set here 
rebased on the revised master.

http://gerrit.cloudera.org:8080/#/c/11760/4/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/11760/4/common/function-registry/impala_functions.py@531
PS4, Line 531:   # Rewritten away in the FE
> can you explain why this is needed in this change?
See IMPALA-7741. nvl2() currently does not appear in the list of built-in 
functions. However, since this improvement is orthogonal to the topic of this 
bug, I'll drop this change from here and do a separate change set for 
IMPALA-7741.


http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@a92
PS4, Line 92:
> rewrites can be disabled via a query option so cannot be assumed to run. it
Right. Turns out that I had to alter the rule application flow to always 
rewrite the conditional functions.

The other choice is to leave the current interpreted implementations and have 
two paths based on turning on optimizations or not. In the spirit of "simpler 
is better", seems to make sense to have only one implementation, but the 
two-implementation solution is also possible.

Though, note, in the original code, nvl2() was always rewritten, there never 
was a BE implementation, so that aspect is not really changing.


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@89
PS2, Line 89:* code gen for , avoiding the need for ad-hoc 
implementations
:* of each function.
> The following code looks to me like it'll do re-writes until we're "stuck".
Turns out this is a difference between the simplified test framework (which 
applies selected rules once) and the production code (which replies rules 
repeatedly.) Easy enough to create a separate test that will apply rules in the 
production way so that we take multiple passes to verify all rewrites.

Making this change uncovered quite a few obscure bugs which are recorded in the 
JIRA and will be submitted as separate changes for review. Once those are in, 
I'll rebase this change onto that code, and implement the chained rewrite test 
and logic.


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@122
PS2, Line 122:   private Expr rewriteIfFn(FunctionCallExpr expr) {
> Is the issue that "null is null" is not getting constant optimization? In m
See earlier note.


http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@112
PS4, Line 112: 
> to be consistent with the style in the frontend Java code, pls remove these
Really? We create Javadoc comments, but don't use the resulting formatting? 
Eclipse, for example, will display the formatting in hovers. Without proper 
HTML formatting, the result is a bit of a mess.

Still, if our policy is to use Javadoc comments, but without formatting, I can 
revert this stuff.


http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@144
PS4, Line 144: instanceof NullLiteral)
> I would prefer .isNullLiteral() as other functions use that.
This is existing code that was moved. However, I agree with your point. I've 
gone though and cleaned up places that use this older style to use the newer 
predicates, including in cases where we test for true and false.


http://gerrit.cloudera.org:8080/#/c/11760/4/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@183
PS4, Line 183: revised.size() - 1;
> Can you add a comment about not adding the last child here? It took me some
Done



[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11760/4/common/function-registry/impala_functions.py
File common/function-registry/impala_functions.py:

http://gerrit.cloudera.org:8080/#/c/11760/4/common/function-registry/impala_functions.py@531
PS4, Line 531:   # Rewritten away in the FE
> See IMPALA-7741. nvl2() currently does not appear in the list of built-in f
Actually, this is a bit more subtle. Today, we rewrite nvl2() and nullif() at 
parse time. The goal is to rewrite them in the rewrite rules, along with other 
conditionals. Doing so allows us to take advantage of analyzed arguments to 
implement optimizations.

To do that, however, we need the function to be defined in this function table. 
We need it as a FE-only function (which also allows it to appear in the table 
of built-in functions) so it passes analysis, even though there is no BE 
implementation.



--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 25 Oct 2018 18:52:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Update .gitignore

2018-10-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11787 )

Change subject: Update .gitignore
..


Patch Set 3: Code-Review+1

Thanks! I've found a few more additions, many around the use of Eclipse. 
Probably easiest if I add those after you commit this set.


--
To view, visit http://gerrit.cloudera.org:8080/11787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I281ab3b5c98ac32e5d60663562628ffda6606a6a
Gerrit-Change-Number: 11787
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 25 Oct 2018 18:56:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Patch Set 2:

(2 comments)

Updated with changes from code review comments.

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@1169
PS1, Line 1169:   // The per-partition schema information is not used by 
Impala.
> good idea, that would reduce network bandwidth noticeably considering we do
Good idea. In looking into this, the catalogd caches Impala-specific objects, 
then recreates an HMS Partition object, passing in the table schema when 
creating the StorageDescriptor(). The fix is to simply not add the schema.

When running in local catalog mode, we hit this line, with the original 
Partition object, so we still need this line.

There was discussion on some list or other about the idea of caching the Impala 
catalog objects rather than the HMS objects. The Impala objects appear to be 
stripped down. By caching HMS objects, we will find ourselves going through the 
same process that likely led to those light-weight Impala objects.


http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/11688/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@71
PS1, Line 71:
> checkArgument
Done



--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 26 Oct 2018 22:33:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-10-26 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11688

to look at the new patch set (#2).

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..

IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

Used existing method to remove cached per-partition column information
as requested in the JIRA ticket. See IMPALA-7501 for details.

Then, revised catalogd to not include the unwanted partition schema
info when re-creating an HMS parition object.

Cleaned up a few unused imports in test files used during this effort.

Tested by inspecting a heap dump before and after the change to ensure
that the column schema is no longer referenced from the Hive Partition
objects. (At present, we have no automated way to inspect a heap dump.)

Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
5 files changed, 5 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/11688/2
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog

2018-11-01 Thread Paul Rogers (Code Review)
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog
..


Abandoned

Change has become too involved for a simple ramp-up task; really requires the 
deep understanding of the metadata team. Please incorporate the specific 
changes if/where they make sense.
--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7501: Slim down metastore Partition objects in LocalCatalog cache

2018-10-25 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11688 )

Change subject: IMPALA-7501: Slim down metastore Partition objects in 
LocalCatalog cache
..


Patch Set 1:

Hey Todd, you asked for this fix. Can you take a look at this change? Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/11688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c861452cf974970ab511406fe2e175ea3dc668d
Gerrit-Change-Number: 11688
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 25 Oct 2018 23:25:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7739 IMPALA-7740: [DOCS] Correct descriptions of NVL2 and DECODE

2018-10-23 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11758 )

Change subject: IMPALA-7739 IMPALA-7740: [DOCS] Correct descriptions of NVL2 
and DECODE
..


Patch Set 4: Code-Review+1

Very nice. Great improvement!


--
To view, visit http://gerrit.cloudera.org:8080/11758
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f6b9d56e85f7dffeb29218b244af1cc535dc03e
Gerrit-Change-Number: 11758
Gerrit-PatchSet: 4
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 24 Oct 2018 00:30:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-23 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#4).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the code-generated
CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The functions ifnull and nvl are aliases for isnull, so revised those to
directly rewrite them to CASE rather than first converting them to if().

Moved rewriting into our existing set of conditional rewrite rules
into SimplifyConditionalRules instead of in the parser. To make this work
(and to solve the issue in IMPALA-7741, added ifnull and nvl to the list
of builtin Impala functions.

Found a few additional simplifications for coalesce, ifnull and nvl. See
the Javadoc and unit tests for details. (See IMPALA-7750 for why the
rules are needed; solve IMPALA-7750 and the rules can be removed as the
rewrite for CASE will handle them.)

Tests for conditional functions were in one huge function along with
other rewrite tests. Moved them into a new file, then broke up the tests
by function to allow much easier debugging of each function one-by-one.
This required moving the common test mechanims into a new common base
class.

Once tests verified that the rewrites worked, we no longer have need of
the specialized C++ implementations, so removed those.

Also cleaned up a few typos, unnecessary imports and JavaDocs found
during this work.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/scalar-expr.cc
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java
12 files changed, 617 insertions(+), 493 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/4
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-23 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11760

to look at the new patch set (#3).

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..

IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

See IMPALA-7655 for backgound. Tim found that the current interpreted
forms of if, isnull and coalesce are slow compared to the code-generated
CASE statement.

This patch rewrites the above functions into the equivalent CASE
structure.

The functions ifnull and nvl are aliases for isnull, so revised those to
directly rewrite them to CASE rather than first converting them to if().

Moved rewriting into our existing set of conditional rewrite rules
into SimplifyConditionalRules instead of in the parser. To make this work
(and to solve the issue in IMPALA-7741, added ifnull and nvl to the list
of builtin Impala functions.

Found a few additional simplifications for coalesce, ifnull and nvl. See
the Javadoc and unit tests for details. (See IMPALA-7750 for why the
rules are needed; solve IMPALA-7750 and the rules can be removed as the
rewrite for CASE will handle them.)

Tests for conditional functions were in one huge function along with
other rewrite tests. Moved them into a new file, then broke up the tests
by function to allow much easier debugging of each function one-by-one.
This required moving the common test mechanims into a new common base
class.

Once tests verified that the rewrites worked, we no longer have need of
the specialized C++ implementations, so removed those.

Also cleaned up a few typos, unnecessary imports and JavaDocs found
during this work.

Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
---
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/scalar-expr.cc
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeExprsTest.java
A fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java
12 files changed, 600 insertions(+), 493 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/11760/3
--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7655: Rewrite if, isnull, coalesce to use CASE

2018-10-23 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11760 )

Change subject: IMPALA-7655: Rewrite if, isnull, coalesce to use CASE
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11760/2/be/src/exprs/scalar-expr.h
File be/src/exprs/scalar-expr.h:

http://gerrit.cloudera.org:8080/#/c/11760/2/be/src/exprs/scalar-expr.h@84
PS2, Line 84: /// necessary on the arguments to generate the result.These 
compute functions have
> Period after space; this change seems accidental.
Done


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
File fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java:

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@89
PS2, Line 89:* IF(TRUE, then, else)  
then
:* IF(FALSE|NULL, then, else)  
else
> Do we need to do this? The general case of simplifyCaseExpr() should take c
Removing optimizations worked here.


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@122
PS2, Line 122:* Rewrites IFNULL(a, x):
> Same comment: can we just always make this a case, and then other simplific
Tried removing the optimizations. Turns out that the CASE simplification did 
not discover these optimizations. See IMPALA-7750.


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java@175
PS2, Line 175:   // TODO: Clone child expr here?
> Do you know what the answer is here?
Done


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/test/java/org/apache/impala/analysis/BaseRewriteRulesTest.java@1
PS2, Line 1: package org.apache.impala.analysis;
> Please copy over the regular license stuff here.
Done


http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java
File 
fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/11760/2/fe/src/test/java/org/apache/impala/analysis/SimplifyConditionalRulesTest.java@35
PS2, Line 35: "CASE WHEN id = 0 THEN TRUE ELSE FALSE END");
> Should this be simplified further to "id = 0"? Or do we think it doesn't ma
Subtle issue. The expression, as written, is equivalent to istrue(id = 0). By 
itself, id = 0 can be true, false or null. So, we could not eliminate the CASE 
operator, though we could rewrite it to istrue(id = 0).

Of course, new could also rewrite isTrue(x) to be CASE x THEN true ELSE FALSE 
END ...



--
To view, visit http://gerrit.cloudera.org:8080/11760
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I526654d8546e17b2545c42cc59dab66d9fe1b163
Gerrit-Change-Number: 11760
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 24 Oct 2018 01:17:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11887 )

Change subject: IMPALA-7818: Standardize use of Expr predicates
..


Patch Set 3:

Dry-run passed: https://jenkins.impala.io/job/pre-review-test/221/


--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 07 Nov 2018 17:35:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11887 )

Change subject: IMPALA-7818: Standardize use of Expr predicates
..


Patch Set 4:

(2 comments)

Fixed check style warnings.

http://gerrit.cloudera.org:8080/#/c/11887/3/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/11887/3/fe/src/main/java/org/apache/impala/analysis/Expr.java@700
PS3, Line 700:   Preconditions.checkState(Expr.IS_NULL_LITERAL.apply(this) 
||
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/11887/3/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
File fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java:

http://gerrit.cloudera.org:8080/#/c/11887/3/fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java@87
PS3, Line 87: if (Expr.IS_NULL_LITERAL.apply(literalValue) ||
> line too long (94 > 90)
Done



--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 07 Nov 2018 19:02:22 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11887

to look at the new patch set (#4).

Change subject: IMPALA-7818: Standardize use of Expr predicates
..

IMPALA-7818: Standardize use of Expr predicates

The Expr node provids two kinds of queries about node categories:
predicates and isMuble() functions. This change standardizes two
existing methods to use predicates instead.

First, the existing isLiteral() method is replaced by a new
IS_LITERAL predicate.

The key purpose is to clean up the confusing use of the existing
isNullLiteral() method, which actually is a check for either a null
literal or a cast of a null. To make this clearer, replaced this
method with a new IS_NULL_VALUE() predicate.

Added a new IS_NULL_LITERAL predicate for the case when a node must be
exactly the NULL literal, not a cast to NULL. Replaced instances of
foo instanceof NullLiteral with a use of the new IS_NULL_LITERAL
predicate. (This revealed bugs which will be addressed separately.)

Added an IS_NON_NULL_LITERAL to replace the two-method ideom used in
several places.

Tests: No functional change. Reran all tests to ensure nothing changed.

Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
29 files changed, 137 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/11887/4
--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11887


Change subject: IMPALA-7818: Standardize use of Expr predicates
..

IMPALA-7818: Standardize use of Expr predicates

The Expr node provids two kinds of queries about node categories:
predicates and isMuble() functions. This change standardizes two
existing methods to use predicates instead.

First, the existing isLiteral() method is replaced by a new
IS_LITERAL predicate.

The key purpose is to clean up the confusing use of the existing
isNullLiteral() method, which actually is a check for either a null
literal or a cast of a null. To make this clearer, replaced this
method with a new IS_NULL_VALUE() predicate.

Added a new IS_NULL_LITERAL predicate for the case when a node must be
exactly the NULL literal, not a cast to NULL. Replaced instances of
foo instanceof NullLiteral with a use of the new IS_NULL_LITERAL
predicate. (This revealed bugs which will be addressed separately.)

Added an IS_NON_NULL_LITERAL to replace the two-method ideom used in
several places.

Tests: No functional change. Reran all tests to ensure nothing changed.

Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
29 files changed, 135 insertions(+), 93 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/11887/3
--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11893 )

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..


Patch Set 1:

Dry-run tests passed: https://jenkins.impala.io/job/pre-review-test/220/


--
To view, visit http://gerrit.cloudera.org:8080/11893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
Gerrit-Change-Number: 11893
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 07 Nov 2018 17:36:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11893


Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..

IMPALA-7823: Clean up Java warnings, fix minor issues

Roll-up of trivial Java fixes: for warnings, formatting, toString()
methods. Done as a separate patch to keep others small and focused.

Testing: no functional changes. Ran full tests to check for
regressions.

Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/analysis/PlanHint.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/ShowGrantPrincipalStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeExprsRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
61 files changed, 57 insertions(+), 138 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11893/1
--
To view, visit http://gerrit.cloudera.org:8080/11893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: 

[Impala-ASF-CR] IMPALA-7808: Refactor Analyzer for easier debugging

2018-11-06 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11883 )

Change subject: IMPALA-7808: Refactor Analyzer for easier debugging
..


Patch Set 2:

(12 comments)

Thanks for the reviews. Addressed the comments.

If you are ready, I'll go ahead and shift the indent of the methods in the 
inner class, which will mark a zillion likes as changed. In its current form, 
you can more easily see the actual changes.

http://gerrit.cloudera.org:8080/#/c/11883/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11883/1//COMMIT_MSG@37
PS1, Line 37: Reran a
> pls move this testing section above the Change-Id
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@461
PS1, Line 461: r
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@461
PS1, Line 461: eturn;
> We generally do not use braces in one line "if (cond) stmt;".
Thanks. Every project has its quirks; I 'm still learning Impala's preferences.
Fixed.


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@176
PS1, Line 176: private final Analyzer analyzer_;
> these members should be suffixed with "_"
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@186
PS1, Line 186: Note tem
> note, you can always include such review-level comments as comments in gerr
Agreed. The comment is here to rather force the need to come back and shift 
indentation before the final +2. At that point, the comment will be removed.


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@219
PS1, Line 219: if (multiAggInfo_ != null && 
multiAggInfo_.hasAggregateExprs()) {
> here and at many other functions: in Impala we generally do not add an empt
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@389
PS1, Line 389: if (t
> did some other class need this protected?
Tried to minimize extra changes. It works to make this private, so went ahead 
and did so.


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@526
PS1, Line 526: c
> nit: extra space
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@617
PS1, Line 617: su
> why public?
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@659
PS1, Line 659:
> why public?
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@757
PS1, Line 757: }
> why public?
Done


http://gerrit.cloudera.org:8080/#/c/11883/1/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@894
PS1, Line 894:
> just to remind myelf... end of inner class.
Yes. When you are satisfied that nothing changed other than the inner class, 
I'll go ahead and indent the whole mess.



--
To view, visit http://gerrit.cloudera.org:8080/11883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
Gerrit-Change-Number: 11883
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 06 Nov 2018 20:44:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7808: Refactor Analyzer for easier debugging

2018-11-06 Thread Paul Rogers (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11883

to look at the new patch set (#3).

Change subject: IMPALA-7808: Refactor Analyzer for easier debugging
..

IMPALA-7808: Refactor Analyzer for easier debugging

Changes two blocks of code to make debugging easier. No functional
changes occur; changes are pure refactoring. A trivial change in
AnalyzerContext removes a nested conditional clause.

A larger change in SelectStmt takes the large analysis function and
breaks it into a series of smaller functions. The functions were large
because they shared state: variables created near the top are used much
later near the bottom.

To solve this, moved the code into an "algorithm" class whose only job
is to hold onto the temporary state so that the big function can be
broken into smaller pieces, with the temporary class fields used in
place of the former local variables.

For the most part, the existign code was simply split into functions
and indented.  One block of code had to be moved below the inner class
since it is not part of the analysis process.

Testing: No functional change, changes are purely structure.
Reran all tests, which passed.

Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 751 insertions(+), 688 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11883/3
--
To view, visit http://gerrit.cloudera.org:8080/11883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
Gerrit-Change-Number: 11883
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Hello Tim Armstrong, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11887

to look at the new patch set (#5).

Change subject: IMPALA-7818: Standardize use of Expr predicates
..

IMPALA-7818: Standardize use of Expr predicates

The Expr node provide two kinds of queries about node categories:
predicates and isMumble() functions. This change standardizes two
existing methods to use predicates instead.

First, the existing isLiteral() method is replaced by a new
IS_LITERAL predicate.

The key purpose is to clean up the confusing use of the existing
isNullLiteral() method, which actually is a check for either a null
literal or a cast of a null. To make this clearer, replaced this
method with a new IS_NULL_VALUE() predicate.

Added a new IS_NULL_LITERAL predicate for the case when a node must be
exactly the NULL literal, not a cast to NULL. Replaced instances of
foo instanceof NullLiteral with a use of the new IS_NULL_LITERAL
predicate. (This revealed bugs which will be addressed separately.)

Added an IS_NON_NULL_LITERAL to replace the two-method idiom used in
several places.

Tests: No functional change. Reran all tests to ensure nothing changed.

Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CastExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnDef.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
M fe/src/main/java/org/apache/impala/analysis/LiteralExpr.java
M fe/src/main/java/org/apache/impala/analysis/PartitionKeyValue.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSet.java
M fe/src/main/java/org/apache/impala/analysis/PartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/TupleIsNullPredicate.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
M fe/src/test/java/org/apache/impala/catalog/local/LocalCatalogTest.java
29 files changed, 138 insertions(+), 94 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/11887/5
--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7818: Standardize use of Expr predicates

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11887 )

Change subject: IMPALA-7818: Standardize use of Expr predicates
..


Patch Set 5:

(5 comments)

Thanks Vuk for the review. Addressed your comments.

http://gerrit.cloudera.org:8080/#/c/11887/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11887/4//COMMIT_MSG@10
PS4, Line 10: isMumble(
> sp?
Done


http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@1169
PS4, Line 1169: IS_NU
> drop here, and other places where its not needed.
Thanks for catching this.


http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@432
PS4, Line 432: IS_NU
> can drop
Done


http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java:

http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@132
PS4, Line 132: IS_NON_NULL_LITERAL
> double-checking if this translation preserves the previous expression? isNu
I agree, the original is confusing with "isNullLiteral()" meaning "null literal 
or cast." Let's work this through.

The first term in the original matches only literal nodes. The second term 
matched Null literal or a cast of a null. But, since the first term excluded 
the cast, we only care about the null literal check.

The new predicate allows all literals except the null literal. So, unless I'm 
missing something, they seem identical.


http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java
File fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java:

http://gerrit.cloudera.org:8080/#/c/11887/4/fe/src/main/java/org/apache/impala/rewrite/FoldConstantsRule.java@52
PS4, Line 52: IS_LITERAL.apply
> IS_LITERAL?
Yes. The change shown here is needed, but is better presented in a separate 
patch that includes new test cases.



--
To view, visit http://gerrit.cloudera.org:8080/11887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I09a089c0f2484246e5c6444b78990fa9260c036a
Gerrit-Change-Number: 11887
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Wed, 07 Nov 2018 21:17:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11893 )

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@96
PS1, Line 96:   // can be applied at the corresponding scan nodes.
> I had a chat with Bikram offline (his patch added these). He too thinks the
Thanks! Went ahead and removed them.



--
To view, visit http://gerrit.cloudera.org:8080/11893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
Gerrit-Change-Number: 11893
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 07 Nov 2018 21:20:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-07 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11893 )

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..


Patch Set 1:

(3 comments)

Thanks Bharath for the review. Addressed comments.

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
File fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java:

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java@86
PS1, Line 86: new LinkedHashMap();
> (multiple places) We can also use Maps.newLinkedHashMap() for brevity or pr
Not sure why this didn't work the first time. Now, Eclipse does allow the 
diamond form, so used that.

The Guava methods were added prior to the diamond syntax; Google suggests using 
the more modern syntax instead.


http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java
File fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java:

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java@24
PS1, Line 24: @SuppressWarnings("serial")
> Apply this in the CatalogException class too and get rid of  serialVersionU
Done


http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/11893/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@96
PS1, Line 96:   @SuppressWarnings("unused")
> Why can't we remove them? I'm not familiar enough with RuntimeFilter code.
I'm not familiar with the code either or I would have removed them. I did 
remove a few others that clearly were not doing any good.



--
To view, visit http://gerrit.cloudera.org:8080/11893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
Gerrit-Change-Number: 11893
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 07 Nov 2018 21:02:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-07 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11893

to look at the new patch set (#2).

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..

IMPALA-7823: Clean up Java warnings, fix minor issues

Roll-up of trivial Java fixes: for warnings, formatting, toString()
methods. Done as a separate patch to keep others small and focused.

Testing: no functional changes. Ran full tests to check for
regressions.

Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/analysis/PlanHint.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/ShowGrantPrincipalStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogException.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeExprsRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
62 files changed, 58 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11893/2
--
To view, visit http://gerrit.cloudera.org:8080/11893
To 

[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-11-08 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Bikramjeet Vig, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#13).

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..

IMPALA-7310: Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Bug: IMPALA-7310: All-null columns give wrong estimates in planner
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
9 files changed, 455 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/13
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 13
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7807: Analysis test fixture

2018-11-08 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/11881 )

Change subject: IMPALA-7807: Analysis test fixture
..

IMPALA-7807: Analysis test fixture

Refactors the existing ExprRewriteRulesTest to pull out the framework
functionality into a "test fixture" class which can be used for a
variety of tests. The fixture allows more variation, and access to
more intermediate state, than does the existing function-based
framework. For example, the fixture allows setting per-query
options, allows doing full or partial (no rewrite) analysis, provides
access to things like the Analyzer, and more.

The fixture pulls in code from FrontEndBase to handle the low-level
tasks of running a query, and from ExprRewriteRulesTest for the
rewrite-specific aspects. For now, the FrontEndBase was left unchanged.
ExprRewriteRulesTest is refactored to use the new fixture in order to
illustrate how it can be used.

The key value of this work is to allow greater detail in testing in
future change requests. No new test cases were added here in order to
keep the refactoring as simple and clean as possible.

Testing: since this is a test fixture, the refactored ExprRewriteRulesTest
implicitly tests the fixture code. No "production" (non-test) code was
changed in this patch.

Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
---
A fe/src/test/java/org/apache/impala/analysis/AnalysisFixture.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
3 files changed, 480 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11881/3
--
To view, visit http://gerrit.cloudera.org:8080/11881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
Gerrit-Change-Number: 11881
Gerrit-PatchSet: 3
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7805: Emit zero as "0" in toSql()

2018-11-08 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/11878 )

Change subject: IMPALA-7805: Emit zero as "0" in toSql()
..

IMPALA-7805: Emit zero as "0" in toSql()

It turns out that Impala has a somewhat Baroque way to represent the
value of a numeric 0.  NumericLiteral.toSql() uses the Java
BigDecimal class to convert a numeric value to a string for use in
explained plans and in verifying expression rewrites.

The default Java behavior is to consider scale when rendering numbers,
including 0. Thus, depending on precision and scale, you may get:

0
0.0
0.00
0.000
...
0E-38

However, mathematically, zero is zero. Plans attach no special meaning
to the extra decimal points or trailing zeros.

To make testing easier, changed the behavior to always emit "0" when the
value is zero, regardless of precision or scale.

Testing: Reran the planner tests and modified captured plans that had
the 0.0, 0.00 variations of zero.

Since this change affects only EXPLAIN output, it cannot affect the
operation of queries. If may impact other tests that compare EXPLAIN
output to a "golden" copy.

Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1
---
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
6 files changed, 28 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/11878/2
--
To view, visit http://gerrit.cloudera.org:8080/11878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1
Gerrit-Change-Number: 11878
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-11-08 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Bikramjeet Vig, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#14).

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..

IMPALA-7310: Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Bug: IMPALA-7310: All-null columns give wrong estimates in planner
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
11 files changed, 460 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/14
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 14
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-11-08 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Bikramjeet Vig, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#15).

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..

IMPALA-7310: Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

The change is limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Testing: Reran and adjusted PlannerTest. Added a new CardinalityTest
based on an adjustment to the planning API.

Added a new table, similar to the existing nulltable,
but which has multiple rows with stats calculated.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 458 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/15
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 15
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7842: Make query fragments available for unit testing

2018-11-13 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11920

to look at the new patch set (#2).

Change subject: IMPALA-7842: Make query fragments available for unit testing
..

IMPALA-7842: Make query fragments available for unit testing

Today, the FrontEnd class uses a functional model to generate a plan:
pass in SQL text to the createExecRequest() method and get back a Thrift
plan ready for serialization.

For unit testing, however, we need access to the intermediate plan
fragments produced by the planner. Inspecting only the Thrift version
results in loss of important details.

This fix introduces a new planner context (PlanCtx) class that passes
information into the planner, and can pass the plan fragments back out.
There is no functional change, just a refactoring of the existing code.

The changes also introduces a new test case that uses this feature to
verify plan cardinality.

Tests: The new test case demonstrates the functionality.

Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
---
M fe/src/main/java/org/apache/impala/service/Frontend.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
2 files changed, 219 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11920/2
--
To view, visit http://gerrit.cloudera.org:8080/11920
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
Gerrit-Change-Number: 11920
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7842: Make query fragments available for unit testing

2018-11-13 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11920 )

Change subject: IMPALA-7842: Make query fragments available for unit testing
..


Patch Set 2:

(2 comments)

Thanks, Phil, for the review. Addressed your comments.

http://gerrit.cloudera.org:8080/#/c/11920/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/11920/1/fe/src/main/java/org/apache/impala/service/Frontend.java@188
PS1, Line 188: protected final TQueryCtx queryCtx_;
 : protected final StringBuilder explainBuf_;
 : protected boolean capturePlan_;
 : protected List pla
> Are these intentionally public?
Made these protected, some final.


http://gerrit.cloudera.org:8080/#/c/11920/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1227
PS1, Line 1227:   StringBuilder explainString)
> This should be on line 1220?
There are actually three public entry points. Retained the older two to avoid 
changing more code than necessary. Can clean that up later.



--
To view, visit http://gerrit.cloudera.org:8080/11920
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
Gerrit-Change-Number: 11920
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 13 Nov 2018 21:41:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7841: Refactor QueryStmt, etc. for easier debugging

2018-11-13 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11915 )

Change subject: IMPALA-7841: Refactor QueryStmt, etc. for easier debugging
..


Patch Set 1:

Dry-run tests passed: https://jenkins.impala.io/job/pre-review-test/224/


--
To view, visit http://gerrit.cloudera.org:8080/11915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
Gerrit-Change-Number: 11915
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Tue, 13 Nov 2018 21:44:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging

2018-11-15 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11890 )

Change subject: IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging
..


Patch Set 2:

(4 comments)

Thanks for the reviews. Addressed the comments, including removing the new (but 
disabled) tests. Will include those later along with the fixes that they 
motivated.

http://gerrit.cloudera.org:8080/#/c/11890/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11890/1//COMMIT_MSG@10
PS1, Line 10: Th
> ?
Done


http://gerrit.cloudera.org:8080/#/c/11890/1//COMMIT_MSG@26
PS1, Line 26: Testing: this is a test, ran the test to ensure it still passes.
:
> Should we add these tests separately along with the jira fixes? Not sure wh
Sure.


http://gerrit.cloudera.org:8080/#/c/11890/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

http://gerrit.cloudera.org:8080/#/c/11890/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@438
PS1, Line 438: l FALSE, return implicit NULL ELSE.
 : RewritesOk("decode(1, 1 + 1, id, 1 + 2, 3)", rules, "NULL");
 : // Multiple TRUE, first one becomes ELSE.
> nit: format to fewer lines
Done


http://gerrit.cloudera.org:8080/#/c/11890/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java@484
PS1, Line 484: ("coalesce(1,
> nit: more meaningful name?
Done



--
To view, visit http://gerrit.cloudera.org:8080/11890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I99a28b170bc0132424041e51f61ebe5c848c4083
Gerrit-Change-Number: 11890
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 15 Nov 2018 23:41:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging

2018-11-15 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11890

to look at the new patch set (#2).

Change subject: IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging
..

IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging

Impala provides a single test class ExprRewriteRulesTest for all rewrite
rules. Each rule class has an associated test function. The
SimplifyConditionalRule class is complex, and the result test function
is quite large.

When doing work to modify a particular conditional rewrite, it became
clear that debugging would be much easier if each detailed rewrite had
its own test rather than using one huge test function. This ticket asks
to break up the big function. (JUnit does not care about small vs. large
functions.)

This patch splits the function separate from later work that will modify
the tests themselves.

Also unified test function naming, using the standard "javaStyle" rather
than the unusual "CppStyle" that was used.

Testing: this is a test, ran the test to ensure it still passes.

Change-Id: I99a28b170bc0132424041e51f61ebe5c848c4083
---
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
1 file changed, 76 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/11890/2
--
To view, visit http://gerrit.cloudera.org:8080/11890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99a28b170bc0132424041e51f61ebe5c848c4083
Gerrit-Change-Number: 11890
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-15 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11893 )

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11893/4/fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java
File fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java:

http://gerrit.cloudera.org:8080/#/c/11893/4/fe/src/main/java/org/apache/impala/analysis/CollectionStructType.java@28
PS4, Line 28: import com.google.common.base.Preconditions;
This turns out to be an excellent change. Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/11893
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
Gerrit-Change-Number: 11893
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Anonymous Coward (168)
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 15 Nov 2018 23:27:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7807: Analysis test fixture

2018-11-15 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11881

to look at the new patch set (#5).

Change subject: IMPALA-7807: Analysis test fixture
..

IMPALA-7807: Analysis test fixture

Refactors the existing ExprRewriteRulesTest to pull out the framework
functionality into a "test fixture" class which can be used for a
variety of tests. The fixture allows more variation, and access to
more intermediate state, than does the existing function-based
framework. For example, the fixture allows setting per-query
options, allows doing full or partial (no rewrite) analysis, provides
access to things like the Analyzer, and more.

The fixture pulls in code from FrontEndBase to handle the low-level
tasks of running a query, and from ExprRewriteRulesTest for the
rewrite-specific aspects. For now, the FrontEndBase base class was
left unchanged.  ExprRewriteRulesTest is refactored to use the new
fixture in order to illustrate usage of the test fixture.

The key value of this work is to allow greater detail when testing
future change requests.  In order to keep the refactoring as simple
and clean, no new test cases were added here

Testing: since this is a test fixture, the refactored ExprRewriteRulesTest
implicitly tests the fixture code. No "production" (non-test) code was
changed in this patch.

Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
---
A fe/src/test/java/org/apache/impala/analysis/AnalysisFixture.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/QueryFixture.java
A fe/src/test/java/org/apache/impala/analysis/RewriteFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
6 files changed, 503 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11881/5
--
To view, visit http://gerrit.cloudera.org:8080/11881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
Gerrit-Change-Number: 11881
Gerrit-PatchSet: 5
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7807: Analysis test fixture

2018-11-15 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Fredy Wijaya, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11881

to look at the new patch set (#4).

Change subject: IMPALA-7807: Analysis test fixture
..

IMPALA-7807: Analysis test fixture

Refactors the existing ExprRewriteRulesTest to pull out the framework
functionality into a "test fixture" class which can be used for a
variety of tests. The fixture allows more variation, and access to
more intermediate state, than does the existing function-based
framework. For example, the fixture allows setting per-query
options, allows doing full or partial (no rewrite) analysis, provides
access to things like the Analyzer, and more.

The fixture pulls in code from FrontEndBase to handle the low-level
tasks of running a query, and from ExprRewriteRulesTest for the
rewrite-specific aspects. For now, the FrontEndBase base class was
left unchanged.  ExprRewriteRulesTest is refactored to use the new
fixture in order to illustrate usage of the test fixture.

The key value of this work is to allow greater detail when testing
future change requests.  In order to keep the refactoring as simple
and clean, no new test cases were added here

Testing: since this is a test fixture, the refactored ExprRewriteRulesTest
implicitly tests the fixture code. No "production" (non-test) code was
changed in this patch.

Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
---
A fe/src/test/java/org/apache/impala/analysis/AnalysisFixture.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
A fe/src/test/java/org/apache/impala/analysis/QueryFixture.java
A fe/src/test/java/org/apache/impala/analysis/RewriteFixture.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
6 files changed, 500 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11881/4
--
To view, visit http://gerrit.cloudera.org:8080/11881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
Gerrit-Change-Number: 11881
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7807: Analysis test fixture

2018-11-05 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11881


Change subject: IMPALA-7807: Analysis test fixture
..

IMPALA-7807: Analysis test fixture

Refactors the existing ExprRewriteRulesTest to pull out the framework
functionality into a "test fixture" class which can be used for a
variety of tests. The fixture allows more variation, and access to
more intermediate state, than does the existing function-based
framework. For example, the fixture allows setting per-query
options, allows doing full or partial (no rewrite) analysis, provides
access to things like the Analyzer, and more.

The fixture pulls in code from FrontEndBase to handle the low-level
tasks of running a query, and from ExprRewriteRulesTest for the
rewrite-specific aspects. For now, the FrontEndBase was left unchanged.
ExprRewriteRulesTest is refactored to use the new fixture in order to
illustrate how it can be used.

The key value of this work is to allow greater detail in testing in
future change requests. No new test cases were added here in order to
keep the refactoring as simple and clean as possible.

Testing: since this is a test fixture, the refactored ExprRewriteRulesTest
implicitly tests the fixture code. No "production" (non-test) code was
changed in this patch.

Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
---
A fe/src/test/java/org/apache/impala/analysis/AnalysisFixture.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
3 files changed, 478 insertions(+), 89 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/11881/1
--
To view, visit http://gerrit.cloudera.org:8080/11881
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id69c99b284960f16394f61072d38dd00269bc10c
Gerrit-Change-Number: 11881
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7805: Emit zero as "0" in toSql()

2018-11-05 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11878


Change subject: IMPALA-7805: Emit zero as "0" in toSql()
..

IMPALA-7805: Emit zero as "0" in toSql()

It turns out that Impala has a somewhat Baroque way to represent the
value of a numeric 0.  NumericLiteral.toSql() uses the Java
BigDecimal class to convert a numeric value to a string for use in
explained plans and in verifying expression rewrites.

The default Java behavior is to consider scale when rendering numbers,
including 0. Thus, depending on precision and scale, you may get:

0
0.0
0.00
0.000
...
0E-38

However, mathematically, zero is zero. Plans attach no special meaning
to the extra decimal points or trailing zeros.

To make testing easier, changed the behavior to always emit "0" when the
value is zero, regardless of precision or scale.

Testing: Reran the planner tests and modified captured plans that had
the 0.0, 0.00 variations of zero.

Since this change affects only EXPLAIN output, it cannot affect the
operation of queries. If may impact other tests that compare EXPLAIN
output to a "golden" copy.

Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1
---
M fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
6 files changed, 28 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/11878/1
--
To view, visit http://gerrit.cloudera.org:8080/11878
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0b2f2f34fe5e6003de407301310ccf433841b9f1
Gerrit-Change-Number: 11878
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging

2018-11-06 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11890


Change subject: IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging
..

IMPALA-7821: Refactor ExprRewriteRulesTest for easier debugging

Impala provides a single test class ExprRewriteRulesTest for all rewrite
rules. Each rule class has an associated test function. Th
SimplifyConditionalRule class is complex, and the result test function
is quite large.

When doing work to modify a particular conditional rewrite, it became
clear that debugging would be much easier if each detailed rewrite had
its own test rather than using one huge test function. This ticket asks
to break up the big function. (JUnit does not care about small vs. large
functions.)

This patch splits the function separate from later work that will modify
the tests themselves.

Also unified test function naming, using the standard "javaStyle" rather
than the unusual "CppStyle" that was used.

Added a few new tests, including some that do not pass (commented out)
to motivate the need for the refactoring.

Testing: this is a test, ran the test to ensure it still passes.

Change-Id: I99a28b170bc0132424041e51f61ebe5c848c4083
---
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
1 file changed, 112 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/90/11890/1
--
To view, visit http://gerrit.cloudera.org:8080/11890
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I99a28b170bc0132424041e51f61ebe5c848c4083
Gerrit-Change-Number: 11890
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7808: Refactor Analyzer for easier debugging

2018-11-06 Thread Paul Rogers (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11883

to look at the new patch set (#4).

Change subject: IMPALA-7808: Refactor Analyzer for easier debugging
..

IMPALA-7808: Refactor Analyzer for easier debugging

Changes two blocks of code to make debugging easier. No functional
changes occur; changes are pure refactoring. A trivial change in
AnalyzerContext removes a nested conditional clause.

A larger change in SelectStmt takes the large analysis function and
breaks it into a series of smaller functions. The functions were large
because they shared state: variables created near the top are used much
later near the bottom.

To solve this, moved the code into an "algorithm" class whose only job
is to hold onto the temporary state so that the big function can be
broken into smaller pieces, with the temporary class fields used in
place of the former local variables.

For the most part, the existign code was simply split into functions
and indented.  One block of code had to be moved below the inner class
since it is not part of the analysis process.

Testing: No functional change, changes are purely structure.
Reran all tests, which passed.

Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 767 insertions(+), 688 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11883/4
--
To view, visit http://gerrit.cloudera.org:8080/11883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
Gerrit-Change-Number: 11883
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7808: Refactor Analyzer for easier debugging

2018-11-05 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11883


Change subject: IMPALA-7808: Refactor Analyzer for easier debugging
..

IMPALA-7808: Refactor Analyzer for easier debugging

Changes two blocks of code to make debugging easier. No functional
changes occur; changes are pure refactoring. A trivial change in
AnalyzerContext removes a nested conditional clause.

A larger change in SelectStmt takes the large analysis function and
breaks it into a series of smaller functions. The functions were large
because they shared state: variables created near the top are used much
later near the bottom.

To solve this, moved the code into an "algorithm" class whose only job
is to hold onto the temporary state so that the big function can be
broken into smaller pieces, with the temporary class fields used in
place of the former local variables.

This CR introduces the inner algorithm class, but leaves the original
code at the original indent level to make reviews easier. You should see
the same code, now broken into smaller functions, with a new "driver"
section that calls these new functions from where the code itself
previously resided.

Once a review is done of the code in this form, I can revise this CR
with the only change being to indent the methods one level, which can be
reviewed as a second step.

One block of code had to be moved below the inner class since it is not
part of the analysis process.

Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
Testing: Reran all tests, which passed.
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
2 files changed, 227 insertions(+), 149 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/11883/1
--
To view, visit http://gerrit.cloudera.org:8080/11883
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I576c80c4c7a974df226fc91d8903db275069ed52
Gerrit-Change-Number: 11883
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7823: Clean up Java warnings, fix minor issues

2018-11-08 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11893

to look at the new patch set (#3).

Change subject: IMPALA-7823: Clean up Java warnings, fix minor issues
..

IMPALA-7823: Clean up Java warnings, fix minor issues

Roll-up of trivial Java fixes: for warnings, formatting, toString()
methods. Done as a separate patch to keep others small and focused.

Testing: no functional changes. Ran full tests to check for
regressions.

Change-Id: I4b174d6448371c55e98487c6d4212c74c6b0c79e
---
M .gitignore
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/ColumnLineageGraph.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateFunctionStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateOrAlterViewStmtBase.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/DescribeTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FunctionName.java
M fe/src/main/java/org/apache/impala/analysis/GrantRevokePrivStmt.java
M fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/analysis/MultiAggregateInfo.java
M fe/src/main/java/org/apache/impala/analysis/ParquetHelper.java
M fe/src/main/java/org/apache/impala/analysis/PlanHint.java
M fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
M fe/src/main/java/org/apache/impala/analysis/RangePartition.java
M fe/src/main/java/org/apache/impala/analysis/ShowGrantPrincipalStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/catalog/AuthorizationException.java
M fe/src/main/java/org/apache/impala/catalog/CatalogDeltaLog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogException.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectImpl.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M 
fe/src/main/java/org/apache/impala/catalog/HdfsPartitionLocationCompressor.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeExprsRule.java
M fe/src/main/java/org/apache/impala/rewrite/RemoveRedundantStringCast.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/FeSupport.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationStmtTest.java
M fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/StmtMetadataLoaderTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
62 files changed, 56 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/11893/3
--
To view, visit http://gerrit.cloudera.org:8080/11893
To 

[Impala-ASF-CR] IMPALA-7310: Use NDV=1 for a Column with all nulls

2018-11-08 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11528 )

Change subject: IMPALA-7310: Use NDV=1 for a Column with all nulls
..


Patch Set 12:

Bikramjeet, please take a look at this fix to see how we can coordinate our 
efforts.


--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 12
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Thu, 08 Nov 2018 02:47:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7842: Make query fragments available for unit testing

2018-11-11 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11920


Change subject: IMPALA-7842: Make query fragments available for unit testing
..

IMPALA-7842: Make query fragments available for unit testing

Today, the FrontEnd class uses a functional model to generate a plan:
pass in SQL text to the createExecRequest() method and get back a Thrift
plan ready for serialization.

For unit testing, however, we need access to the intermediate plan
fragments produced by the planner. Inspecting only the Thrift version
results in loss of important details.

This fix introduces a new planner context (PlanCtx) class that passes
information into the planner, and can pass the plan fragments back out.
There is no functional change, just a refactoring of the existing code.

The changes also introduces a new test case that uses this feature to
verify plan cardinality.

Tests: The new test case demonstrates the functionality.

Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
---
M fe/src/main/java/org/apache/impala/service/Frontend.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
2 files changed, 207 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/11920/1
--
To view, visit http://gerrit.cloudera.org:8080/11920
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
Gerrit-Change-Number: 11920
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7842: Make query fragments available for unit testing

2018-11-11 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11920 )

Change subject: IMPALA-7842: Make query fragments available for unit testing
..


Patch Set 1:

Dry-run tests passed: https://jenkins.impala.io/job/pre-review-test/226/


--
To view, visit http://gerrit.cloudera.org:8080/11920
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4c615dbc1d14685a6966c5ca7538777cdc80b74d
Gerrit-Change-Number: 11920
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Sun, 11 Nov 2018 21:16:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7841: Refactor QueryStmt, etc. for easier debugging

2018-11-09 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11915


Change subject: IMPALA-7841: Refactor QueryStmt, etc. for easier debugging
..

IMPALA-7841: Refactor QueryStmt, etc. for easier debugging

Builds on IMPALA-7808 with several additional refactorings:

* IMPALA-7808 minimized code changes. This change cleans up the
  new functions, removing if's and merging the "aggregation"
  function with the body of the new analyze() function.
* Creates a new "QueryAnalyzer" class in the QueryStmt class.
* Changed the SelectStmt's SelectAnalyzer to extend the new
  QueryAnalyzer. This allows removing some "analyzer"
  parameters.
* Played the same trick for other classes that extend
  QueryStmt.

This is all refactoring: there is no functional change.

Testing: Reran existing tests to ensure that functionality
remained unchanged.

Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
8 files changed, 218 insertions(+), 202 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11915/1
--
To view, visit http://gerrit.cloudera.org:8080/11915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
Gerrit-Change-Number: 11915
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] Use NDV=1 for a Column with all nulls

2018-10-02 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11528 )

Change subject: Use NDV=1 for a Column with all nulls
..


Patch Set 10:

(3 comments)

Thanks for the continued reviews. Cleaned up a few more items. Answered some 
questions.

http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG@31
PS8, Line 31:
> what types of changes are these-- did they manage to change the structure o
Sorry. The note is trying to explain that 1) there is a planner change (that is 
the gist of the problem resolution), and 2) the change occurs only in the very 
narrow conditions spelled out in this note.

The line in question here explains why the adjustment is applied only to NDV of 
0-1 (and not, say, 0,-10). If we have NDV=10, and column is nullable, then we 
certainly might want to adjust NDV to 11. But, if we do, many tests fail due to 
changed cardinality, changed memory and so on. (The tests "fail" in the sense 
that the text plans have changed. The structure is the same, just the numbers 
differ.)

The original bug complained that, with NDV=0, the planner did misfire, did 
produce an incorrect plan. It did so because, in aggregation, we multiply 
cardinalities by 0, resulting in an overall cardinality of 0.

The new tests verify that, after the adjustments, cardinality estimates are no 
longer zeroed out in aggregate nodes, but instead bubble up. This will, 
presumably, fix the original problem. (The original bug did not give sufficient 
information to completely reproduce the particular query that failed.)


http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/NullTable/large_data.csv
File testdata/NullTable/large_data.csv:

http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/NullTable/large_data.csv@1
PS8, Line 1:
> Looks like theres several conventions under testdata... I see testdata/data
Created a new NullRows folder, moved this file there and renamed it to 
"data.csv".


http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/11528/8/testdata/datasets/functional/functional_schema_template.sql@1355
PS8, Line 1355: 
> I have no idea, but I counted BASE_TABLE_NAME and see it shows up 105 times
Removed the Kudu portion.



--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 10
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 02 Oct 2018 19:53:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Use NDV=1 for a Column with all nulls

2018-10-02 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#10).

Change subject: Use NDV=1 for a Column with all nulls
..

Use NDV=1 for a Column with all nulls

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Bug: IMPALA-7310: All-null columns give wrong estimates in planner
Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullRows/data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 452 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/10
-- 
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 10
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-10-02 Thread Paul Rogers (Code Review)
Paul Rogers has abandoned this change. ( http://gerrit.cloudera.org:8080/11544 )

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..


Abandoned

Duplicate of 11528
--
To view, visit http://gerrit.cloudera.org:8080/11544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I282951f20598839fad880995a032e016845083db
Gerrit-Change-Number: 11544
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-28 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11544


Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459

Fixes

Change-Id: I282951f20598839fad880995a032e016845083db
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 443 insertions(+), 23 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/11544/1
--
To view, visit http://gerrit.cloudera.org:8080/11544
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I282951f20598839fad880995a032e016845083db
Gerrit-Change-Number: 11544
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-28 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11528 )

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..


Patch Set 8:

(32 comments)

Thanks for the reviews. Here are my responses thus far. Most are of the form 
"did it." Vuk, I had a couple of questions on your comments.

Thanks,

- Paul

http://gerrit.cloudera.org:8080/#/c/11528/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11528/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-7310: All-null columns give wrong estimates in planner
> We tend to have a single line subject with a newline after it. That ends up
Other projects require the JIRA ticket number in the commit message. Do we? 
Such projects tend to favor the JIRA ticket title (or a cleaned up version) as 
the description. Do we? In these projects, the assumption is that the patch is 
to fix the listed ticket, so no need to include the word "Fix" in the patch 
title.

Happy to remove the ticket number and change the title to "Correct NDV 
estimates for all-null columns."

I pushed an update with the newline which shows up in my view of the patch.

Did that revision replace the one here on which you commented? Is there a 
better way to revise the commit message?


http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG@24
PS8, Line 24: * Table column (not an internal column such as COUNT(*))
> Base table?
Ack


http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG@27
PS8, Line 27: not provide a null count
> when does this happen when stats are computed?
See revised message. I saw multiple cases in the test DBs in which stats were 
computed, but null count was -1 (undefined), so I needed to handle that case in 
order to get things to work.


http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG@28
PS8, Line 28: NDV <= 1
> if its unknown whether there are nulls or not (from the prev. line), could 
Added to the commit message:

Research for this patch revealed that Impala treats NDVs in two distinct ways:  
   

* Stats (which use the NDV function) computes NDV as the number of distinct 
non-null values. (That is, the NDV of (0, null) is 1.
* The planner itself when working with constants, uses a definition of NDV that 
includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know that the NDV in 
stats excludes nulls (except for the BOOLEAN type), and we know that the column 
contains nulls, we can bump up the NDV to convert from the stats definition to 
the planner definition. But, to avoid regressions, we do so in a very narrow 
range of NDV values: only 0 and 1.


http://gerrit.cloudera.org:8080/#/c/11528/8//COMMIT_MSG@31
PS8, Line 31: Any larger adjustment
> confused... if we're adjusting for nulls, and we're treating a null as a si
Reworded:

Technically, the adjustment should apply to all NDV values. However, it turns 
out that if we do so, we end up with many failures in PlannerTest in those 
tests that work with TPC-H tables. The TPC-H tables have multiple columns 
marked as nullable but which actually have no nulls. Some of these columns also 
have a low NDV.  

By limiting the NDV adjustment to the narrow range, the TPC-H tests need not be 
updated.  Since end users could have a similar situation, the narrow range 
reduces the chance that this fix might impact such workloads.


http://gerrit.cloudera.org:8080/#/c/11528/2/.gitignore
File .gitignore:

http://gerrit.cloudera.org:8080/#/c/11528/2/.gitignore@48
PS2, Line 48:
> I'm not a stickler, but you could separate the .gitignore stuff into a sepa
Reverted all changes to this file.


http://gerrit.cloudera.org:8080/#/c/11528/8/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/11528/8/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@111
PS8, Line 111:   public ArrayList getGroupList() { return groupingExprs_; 
}
> where is this used in this change?
Was used in a test, but that test ended up not being include as part of this 
patch, so removed the line.


http://gerrit.cloudera.org:8080/#/c/11528/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java
File fe/src/main/java/org/apache/impala/analysis/SlotRef.java:

http://gerrit.cloudera.org:8080/#/c/11528/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@102
PS2, Line 102:   private void computeNdv() {
> line has trailing whitespace
Ack


http://gerrit.cloudera.org:8080/#/c/11528/2/fe/src/main/java/org/apache/impala/analysis/SlotRef.java@105
PS2, Line 105: // Potentially adjust NDV for nulls if the column is has 
stats,
> This used to execute even when hasStats() was false. What's numDistinctValu
Defaults to -1.

Put it back as 

[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-28 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#8).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
11 files changed, 455 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/8
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 8
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-09-28 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#7).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Table column (not an internal column such as COUNT(*))
* Column is nullable
* Column has stats
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

In this narrow case, we add one to NDV to account for nulls.
(Any larger adjustment throws off the TPC-H tests which have
multiple columns, marked as non-null, with low NDV, but which
actually include no nulls.)

The change minimized impact on PlannerTest, but still some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
11 files changed, 443 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/7
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7310: All-null columns give wrong estimates in planner

2018-10-01 Thread Paul Rogers (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins, Vuk Ercegovac,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11528

to look at the new patch set (#9).

Change subject: IMPALA-7310: All-null columns give wrong estimates in planner
..

IMPALA-7310: All-null columns give wrong estimates in planner

Modified the planner to handle low-value NDVs by adjusting them
upward by one to account for null values. Thus, even an all-null
column, which has an NDV of 0 in stats, will have an NDV of 1 in
the planner. (The planner already expects NDV to include nulls.)

Modified the front end to allow capturing the full plan for use in
a unit test. Added unit tests that verify estimated cardinality
for a plan as a way to verify that the fix solved the scenario
in IMPALA-7310.

Testing required a new table, similar to the existing nulltable,
but which has multiple rows and has stats calculated.

The change was limited to a very narrow range of cases:

* Base table column (not an internal column such as COUNT(*))
* Type is not BOOLEAN (turns out metadata does the needed
  NDV correction for BOOLEAN only.)
* Column has stats
* Column is nullable
* Column does not provide a null count, or null count > 0
* Reported NDV <= 1

Testing showed that, at least for the functional test tables,
we do have cases in which stats are computed, but the null
count is -1 (undefined), which is why null count had to
be considered. If we know the null count, and the null count
is zero, then no adjustment is needed, But, if we don't know
the null count, or it is positive, then adjustment may be
needed.

Research for this patch revealed that Impala treats NDVs in
two distinct ways:

* Stats (which use the NDV function) computes NDV as the number
  of distinct non-null values. (That is, the NDV of (0, null) is
  1.)
* The planner itself when working with constants, uses a definition
  of NDV that includes nulls. That is, the NDV of (0, null) is 2.

This fix attempts to bridge the two definitions, Since we know
that the NDV in stats excludes nulls (except for the BOOLEAN
type), and we know that the column contains nulls, we can bump
up the NDV to convert from the stats definition to the planner
definition. But, to avoid regressions, we do so in a very narrow
range of NDV values: only 0 and 1.

Technically, the adjustment should apply to all NDV values. However,
it turns out that if we do so, we end up with many failures in
PlannerTest in those tests that work with TPC-H tables.
The TPC-H tables have multiple columns marked as nullable but which
actually have no nulls. Some of these columns also have a low NDV.

By limiting the NDV adjustment to the narrow range, the TPC-H tests
need not be updated.  Since end users could have a similar situation,
the narrow range reduces the chance that this fix might impact such
workloads.

Although the change minimized impact on PlannerTest, some
memory numbers needed adjusting for a test in which one
column hit the criteria listed above and had its NDV adjusted.

Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
---
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/analysis/ExprNdvTest.java
A fe/src/test/java/org/apache/impala/planner/CardinalityTest.java
A testdata/NullTable/large_data.csv
M testdata/bin/compute-table-stats.sh
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
10 files changed, 459 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/28/11528/9
--
To view, visit http://gerrit.cloudera.org:8080/11528
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ife657a43c9cafc451bd12ddf857dcb7169e97459
Gerrit-Change-Number: 11528
Gerrit-PatchSet: 9
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7801: Remove toSql() from ParseNode interface.

2018-11-16 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11942 )

Change subject: IMPALA-7801: Remove toSql() from ParseNode interface.
..


Patch Set 2:

(6 comments)

Thanks much for the improvement. Some general comments sprinkled among the 
files.

http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@395
PS2, Line 395:   String sql = toSql(DEFAULT);
Let's think about the meaning of the options for expressions. Expr nodes won't 
have the state to provide un-rewritten SQL: a node is what it is, and does not 
know if it should be considered rewritten or not. The rewritten state is 
something that the outer non-expr node must provide.

Expr nodes can decide whether to include or exclude implicit casts.

This, then, raises and interesting issue. The ToSqlOptions enum is an enum: we 
get to pick one value. But, some of the options overlap: I may want to include 
implicit casts as one choice, and may want source or rewritten SQL on the other 
hand.

I wonder, can we make this simpler somehow?


http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@1300
PS2, Line 1300:   name, (printExpr) ? " '" + toSql(DEFAULT) + "'" : "", 
type_.toString()));
This comment applies in a zillion other places. Our general rule for error 
messages should be that we show the user's original SQL without rewrites or 
implicit casts. We have hundreds of such usages and we have to check that new 
messages use the correct form. Can we standardize these somehow?

One thought is to rename the option from DEFAULT to something like ORIGINAL or 
SOURCE. But, this means we still have to sprinkle this option across many error 
messages. This is, really, exposing implementation in too many places.

As it turns out, ParseNode is an interface, with many direct decedents. Does it 
make sense to introduce a new base class, AbstractParseNode, that has a 
userSql() (or sourceSql()) method that we use in error messages?

Recognizing that expressions are different than statements, maybe:

ParseNode
|- Expr (root of all expressions)
|- AbstractStmt (root of all statements)

The statement base would provide the source/rewritten options. Exprs only 
provide the implicit cast or not options.


http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/ParseNode.java
File fe/src/main/java/org/apache/impala/analysis/ParseNode.java:

http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/ParseNode.java@33
PS2, Line 33:   String toSql(ToSqlOptions options);
A common practice in Java is to include the enum definition in the interface 
file if the enum is used only by the interface (and its implementations).


http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@217
PS2, Line 217:   "OFFSET requires an ORDER BY clause: " + 
limitElement_.toSql(DEFAULT).trim());
I wonder if we can solve a mess here with exceptions also. Rather than building 
message (very inconsistently) in each message, maybe have a builder:

throw new AnalysisException
  .builder("OFFSET requires an ORDER BY clause")
  .expr(limitElement_)
  .build();

The builder would have methods for expressions (which will call the proper 
toSql version), for statements, for additional context, and so on.

This would avoid the issue of exposing the magic toSql(DEFAULT) incantation in 
zillions of places.


http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java@44
PS2, Line 44: import static org.apache.impala.analysis.ToSqlOptions.DEFAULT;
Two suggestions.

1. Rename this as ToSqlOption (or ToSqlFormat) since we can pass only one 
option.
2. Import the enum, not the static value. Reference it with the enum name: 
ToSqlFormat.DEFAULT (or, based on earlier suggestions, ToSqlFormat.SOURCE).


http://gerrit.cloudera.org:8080/#/c/11942/2/fe/src/main/java/org/apache/impala/planner/SortNode.java@214
PS2, Line 214: output.append(info_.getSortExprs().get(i).toSql(DEFAULT) 
+ " ");
This is an example of the point made elsewhere. It SEEMS we are asking the 
expression to give the DEFAULT (source) SQL. But, an expression can't make that 
decision. if we want the source SQL, we'd have to have cached it.

I'm working on a project to do just that in a more standardized form. 

[Impala-ASF-CR] IMPALA-7867 (Part 4): Collection cleanup in catalog

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12131


Change subject: IMPALA-7867 (Part 4): Collection cleanup in catalog
..

IMPALA-7867 (Part 4): Collection cleanup in catalog

Continues the collection clean work to:

* Use collection interfaces for variable and function argument
  declarations,
* Replace generic Guava newArrayList(), etc. calls with the direct use
  of the Java collection classes,
* Clean up unused imports and add override annotations.

This patch focuses on the catalog module and its tests.

Tests: this is purely a code change, no functional change. Reran
existing tests.

Change-Id: Ic83425201c90966aae4c280d94cf1b427b3d71d1
---
M fe/src/main/java/org/apache/impala/catalog/AggregateFunction.java
M fe/src/main/java/org/apache/impala/catalog/AuthorizationPolicy.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObjectCache.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/DiskIdMapper.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java
M fe/src/main/java/org/apache/impala/catalog/FeTable.java
M fe/src/main/java/org/apache/impala/catalog/Function.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsFileFormat.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsStorageDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/HiveStorageDescriptorFactory.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladTableUsageTracker.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/PartitionStatsUtil.java
M fe/src/main/java/org/apache/impala/catalog/PrimitiveType.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
M fe/src/main/java/org/apache/impala/catalog/StructType.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoadingMgr.java
M fe/src/main/java/org/apache/impala/catalog/TopicUpdateLog.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/catalog/View.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalCatalog.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalDb.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalHbaseTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
M fe/src/test/java/org/apache/impala/catalog/CatalogObjectToFromThriftTest.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsPartitionTest.java
M fe/src/test/java/org/apache/impala/catalog/HdfsStorageDescriptorTest.java
M fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
M fe/src/test/java/org/apache/impala/catalog/TestSchemaUtils.java
47 files changed, 331 insertions(+), 333 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/12131/2
--
To view, visit http://gerrit.cloudera.org:8080/12131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic83425201c90966aae4c280d94cf1b427b3d71d1
Gerrit-Change-Number: 12131
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..


Patch Set 9:

(39 comments)

This will be a great feature. Detailed review from the perspective of 1) 
synchronization and 2) diagnosing issues when they occur in the field.

This is a partial review, will review remaining files separately.

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@302
PS9, Line 302: if (BackendConfig.INSTANCE.getHMSPollingFrequencyInSeconds() 
<= 0) {
Good check. Better to define this as behavior: to disable the polling, set the 
interval to 0 (or, trivially <0). This would be a good fall-back safety-valve 
for support cases, tests, etc.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@313
PS9, Line 313:   } catch (TException e) {
Nice use of try-with-catch. No need for nested tries. Remove inner catch, move 
catch to outer try.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2317
PS9, Line 2317:   MetastoreEventsProcessor getMetastoreEventProcessor() {
Should this be public? If only protected, please explicitly label it protected.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java
File fe/src/main/java/org/apache/impala/catalog/Db.java:

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@112
PS9, Line 112: synchronized (thriftDb_) { thriftDb_.setMetastore_db(msDb); }
What is the synchronization model?

As written, we lock on the prior DB to set the new one. This means access is 
not synchronized. Better to add synchronized to the method itself, here and 
accessor.

But, more broadly, if we synchronize only at the DB level, we'll have all 
manner of race conditions. One client will grab the DB with one version the 
thrift object, then another will be slotted in underneath.

This probably needs more thought. Is the model documented in the write-up?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@159
PS9, Line 159: synchronized (thriftDb_) { return thriftDb_.deepCopy(); }
How often do we do this? Do we want this to block on an HMS call about 
parameters?

Maybe a better model is for the RPC calls to do this:

- Obtain a lock
- Get a pointer to the current thrift object
- Release the lock
- Make the RPC

Not that there is no need to copy the thrift object above: the logic here 
implies that the Thrift object is immutable once set. All we are doing is 
making sure we use the same object throughout the RPC.

Still, not convinced we have the right locking semantics.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/Db.java@164
PS9, Line 164: synchronized (thriftDb_) { return thriftDb_.getDb_name(); }
Now getting the name is synchronized on an HMS call. That can't be good for 
performance. The planner/analyzer gets the name all the time, so we really 
don't want to do this.

The name should be invariant or all heck breaks loose in the planner. So, if we 
swap out Thrift objects, pull the name outside the thrift object and cache it 
in a final field.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java
File fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java:

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@50
PS9, Line 50:  * in Hive metastore using the the public API 
get_next_notification These events could
Impala generally does not use Javadoc markup, sadly. But, if you want to use it 
(we really should), then a reference to a Java method should be: {@link 
Class#method}.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/main/java/org/apache/impala/catalog/MetastoreEventsProcessor.java@63
PS9, Line 63:  *   CREATE_TABLE, CREATE_DATABASE
Wrong form for HTML lists. I think you want a definition list:


  CREATE_TABLE, CREATE_DATABASE
  A new table/database is ... 
...

That sad, if Vuk was here, he'd have you rip all this out. So, a broader 
question is: do we want our Javadoc to follow C++ doc standard so the C++ folks 
can follow it, or Javadoc standard so the documentation looks correct when 
formatted? I prefer the Javadoc approach, so would encourage you to keep the 
(corrected) formatting.



[Impala-ASF-CR] IMPALA-7968, Part 1: JSON serialization framework

2018-12-27 Thread Paul Rogers (Code Review)
Hello Fredy Wijaya, Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12079

to look at the new patch set (#6).

Change subject: IMPALA-7968, Part 1: JSON serialization framework
..

IMPALA-7968, Part 1: JSON serialization framework

Provides a set of interfaces and classes to serialize the analyzer AST
to JSON. This part provides the serialization framework itself, later
parts will provide the serialization code and test framework.

The framework uses Jackson streaming API to JSON. Though the primary
purpose is to generate JSON for in PlannerTest text comparisons, the use
of Jackson ensures that the JSON could, if wanted, be exported to other
tools.

While Jackson provides low-level serialization, the classes here provide
a set of higher-level abstractions that make the AST serialization (not
in this patch) simpler than it would be using the low-level Jackson
tools.

This framework itself is independent of the AST and can also be used to
serialize the physical plan as well.

Testing: test cases exercise the mechanisms. The code is an island in
the patch; no other code uses it yet.

Change-Id: Ie3101c2708bf6cf4bec61af83a5db9b6d70ddd9c
---
A 
fe/src/main/java/org/apache/impala/common/serialize/AbstractTreeSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/ArraySerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/JacksonTreeSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/JsonSerializable.java
A fe/src/main/java/org/apache/impala/common/serialize/ObjectSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/SerializationError.java
A fe/src/main/java/org/apache/impala/common/serialize/ToJsonOptions.java
A fe/src/main/java/org/apache/impala/common/serialize/TreeSerializer.java
A 
fe/src/test/java/org/apache/impala/common/serialize/JacksonTreeSerializerTest.java
9 files changed, 1,062 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12079/6
--
To view, visit http://gerrit.cloudera.org:8080/12079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3101c2708bf6cf4bec61af83a5db9b6d70ddd9c
Gerrit-Change-Number: 12079
Gerrit-PatchSet: 6
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-7867 (Part 4): Collection cleanup in catalog

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12131 )

Change subject: IMPALA-7867 (Part 4): Collection cleanup in catalog
..


Patch Set 2:

Pre-review tests passed: https://jenkins.impala.io/job/pre-review-test/253/


--
To view, visit http://gerrit.cloudera.org:8080/12131
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic83425201c90966aae4c280d94cf1b427b3d71d1
Gerrit-Change-Number: 12131
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 27 Dec 2018 18:33:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7970 : Add support for metastore event based automatic invalidate

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12118 )

Change subject: IMPALA-7970 : Add support for metastore event based automatic 
invalidate
..


Patch Set 9:

(20 comments)

Remaining comments, mostly about the tests.

High level, seems the open questions are:

* That bit about replacing the guts of the DB object.
* Error handling and reporting
* Whether the event handler should attempt to partially load DBs and tables, or 
just do an invalidate.
* Invalidation of specific partitions rather than all of them.

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java
File 
fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@64
PS9, Line 64:   private static MetastoreEventsProcessor eventsProcessor;
Not thrilled with the convention, but the team likes an underscore after each 
field name.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@109
PS9, Line 109: assertNotNull(catalog.getDb(TEST_DB_NAME));
Per prior review comments, should not add the new DB.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@120
PS9, Line 120: createDatabase("database_to_be_dropped");
If we don't add the DB automatically, should reference it and assert it is in 
the cache.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@122
PS9, Line 122: assertTrue(catalog.getDb("database_to_be_dropped") != null);
assertNotNull(...)


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@125
PS9, Line 125: assertTrue("Database should not be found after processing 
drop_database event",
assertNull(...)

Oddly, later tests do use assertNull(). So, modify the code here, and the other 
cases where assertTrue is used to check for null/not null.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@136
PS9, Line 136:   @Test(expected = DatabaseNotFoundException.class)
This is confusing. If we expect an exception, then the exception terminates the 
execution. Logically, it should come from the last statement in the body. But, 
the last statement is a Unit assertion. So, something is strange.

Often easier to do this:

try {
  doSomethingThatShoudFail();
  fail();
} catch (ExpectedException e) {
  // Expected
}


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@139
PS9, Line 139: final String TBL_TO_BE_DROPPED = "tbl_to_be_dropped";
Looks like a constant. Move outside the method and make static.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@148
PS9, Line 148: loadTable(TBL_TO_BE_DROPPED);
Does this assert that the table exists in HMS, was loaded in the cache, and has 
the expected contents? Else, when we drop it, we can't tell if the drop code is 
broken if we don't know if the table was actually in the cache.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@155
PS9, Line 155: // throws DatabaseNotFoundException
If it throws, we won't get to the assertTrue.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@161
PS9, Line 161:   @Ignore("Disabled until we fix Hive bug to deserialize 
alter_database event messages")
Reference HIVE JIRA ticket number.

Also, should reference this bug in the event processor code.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@217
PS9, Line 217: createTable(TEST_TABLE_NAME_NONPARTITIONED, false);
Start by asserting that the table is not yet in the cache?


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@255
PS9, Line 255: List> partVals = new ArrayList<>(1);
Allocating just 1 member up front, then adding four items. Maybe just omit the 
initial size arg; not helping us much here.


http://gerrit.cloudera.org:8080/#/c/12118/9/fe/src/test/java/org/apache/impala/catalog/MetastoreEventsProcessorTest.java@277
PS9, Line 277: // change some partitions
There is a recent fix that does specialized calls to HDFS to check for updated 
vs. new partitions. It will be tricky to combine that test with this one, but 
we should ensure that the automated way of updating a partition uses the call 

[Impala-ASF-CR] IMPALA-5973: Provide query plan in JSON format

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11974 )

Change subject: IMPALA-5973: Provide query plan in JSON format
..


Patch Set 3:

Update on this one. Been working on a tricky planner bug. Ended up using the 
JSON serialization framework in #12079 to create a basic JSON serialization for 
the plan tree.

Suggestion: to continue with this patch, maybe start with the 12079 serializer 
and the skeletal work mentioned above. Expand it to all nodes.

The work I did was rough and ready: field names were chosen quickly and I 
serialized only what I needed. Thought is needed to figure out what to 
serialize and what to call it.

If we serialize too close to the code, then the JSON form of the plan changes 
when we make trivial code changes. If we are too abstract, we loose vital 
details (the current EXPLAIN text, for example, omits estimated cardinality, 
which is rather vital.)

So there would be huge value in working out the desired JSON structure, 
implementing serialization for all plan nodes, and wiring it up the the EXPLAIN 
implementation.


--
To view, visit http://gerrit.cloudera.org:8080/11974
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f07e2223be58b7323e1ea2fa44b62626cdf4944
Gerrit-Change-Number: 11974
Gerrit-PatchSet: 3
Gerrit-Owner: Pranay Singh (320)
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pranay Singh (320)
Gerrit-Comment-Date: Thu, 27 Dec 2018 21:21:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7841 (Part 1): Refactor SelectStmt for easier debugging

2018-12-27 Thread Paul Rogers (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11915

to look at the new patch set (#2).

Change subject: IMPALA-7841 (Part 1): Refactor SelectStmt for easier debugging
..

IMPALA-7841 (Part 1): Refactor SelectStmt for easier debugging

Builds on IMPALA-7808 with several additional refactorings:

* IMPALA-7808 minimized code changes. This change cleans up the
  new functions, removing if's and merging the "aggregation"
  function with the body of the new analyze() function.
* Removed an unneeded analyzer argument.

This is all refactoring: there is no functional change.

Testing: Reran existing tests to ensure that functionality
remained unchanged.

Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CollectionTableRef.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
7 files changed, 92 insertions(+), 99 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/11915/2
--
To view, visit http://gerrit.cloudera.org:8080/11915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
Gerrit-Change-Number: 11915
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 


[Impala-ASF-CR] IMPALA-7841 (Part 1): Refactor SelectStmt for easier debugging

2018-12-27 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11915 )

Change subject: IMPALA-7841 (Part 1): Refactor SelectStmt for easier debugging
..


Patch Set 2:

Reduced the scope of the change for easier review. Will push the QueryStmt and 
related changes as a separate patch after this one. This one now just moves 
some code around in SelectStmt and fixes one name and removes one unneeded 
argument.


--
To view, visit http://gerrit.cloudera.org:8080/11915
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4f4fe3d3a1ab3170e294714dab066d40d169eff1
Gerrit-Change-Number: 11915
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 27 Dec 2018 22:06:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7968, Part 1: JSON serialization framework

2018-12-27 Thread Paul Rogers (Code Review)
Hello Fredy Wijaya, Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12079

to look at the new patch set (#7).

Change subject: IMPALA-7968, Part 1: JSON serialization framework
..

IMPALA-7968, Part 1: JSON serialization framework

Provides a set of interfaces and classes to serialize the analyzer AST
to JSON. This part provides the serialization framework itself, later
parts will provide the serialization code and test framework.

The framework uses Jackson streaming API to JSON. Though the primary
purpose is to generate JSON for in PlannerTest text comparisons, the use
of Jackson ensures that the JSON could, if wanted, be exported to other
tools.

While Jackson provides low-level serialization, the classes here provide
a set of higher-level abstractions that make the AST serialization (not
in this patch) simpler than it would be using the low-level Jackson
tools.

This framework itself is independent of the AST and can also be used to
serialize the physical plan as well.

Testing: test cases exercise the mechanisms. The code is an island in
the patch; no other code uses it yet.

Change-Id: Ie3101c2708bf6cf4bec61af83a5db9b6d70ddd9c
---
A 
fe/src/main/java/org/apache/impala/common/serialize/AbstractTreeSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/ArraySerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/JsonSerializable.java
A fe/src/main/java/org/apache/impala/common/serialize/ObjectSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/SerializationError.java
A fe/src/main/java/org/apache/impala/common/serialize/ToJsonOptions.java
A fe/src/main/java/org/apache/impala/common/serialize/TreeJsonSerializer.java
A 
fe/src/main/java/org/apache/impala/common/serialize/TreeJsonStringSerializer.java
A fe/src/main/java/org/apache/impala/common/serialize/TreeSerializer.java
A 
fe/src/test/java/org/apache/impala/common/serialize/JacksonTreeSerializerTest.java
10 files changed, 1,081 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12079/7
--
To view, visit http://gerrit.cloudera.org:8080/12079
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie3101c2708bf6cf4bec61af83a5db9b6d70ddd9c
Gerrit-Change-Number: 12079
Gerrit-PatchSet: 7
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-8041, Part 1: Move rewrite rules into expr nodes

2019-01-06 Thread Paul Rogers (Code Review)
Paul Rogers has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12143


Change subject: IMPALA-8041, Part 1: Move rewrite rules into expr nodes
..

IMPALA-8041, Part 1: Move rewrite rules into expr nodes

This patch is part of the task to integrate expression rewrites into
the analysis process to avoid the need for two analysis passes.

The analyzer provides a rewrite engine which traverses the
expression tree and compares each node against a list of rewrite rules.

The project goal is to apply the rules as we analyze each node. The
simplest way to do so is for each node to hold the code for its own
rewrites, callable via a virtual method: rewrite().

This patch is a first step: the rewrite code moves from the various
rewrite rules into the expression node that is to be rewritten. In some
cases nodes have a single rule which is moved from the rewrite rule
directly into the rewrite() method.

In other cases, a node has multiple rules. The different rules for the
same node are moved into distinct methods. Though these methods are
all called by the main rewrite() method, nothing calls that rewrite()
method in this patch.

Finally, this patch introduces a "stub" expression analyzer. In a later
patch this class will drive the combined analysis/rewrite logic. For
now, it is mostly a placeholder.

The rewrite rules themselves are now just stubs retained to minimze
change. A future patch will retire them once the analysis/rewrite
integration is complete.

This patch is designed to have no functional change: code merely
moves from one location (the rewrite rules) to another (the expression
nodes). Some "shim" code is added, but the functionality is unchanged.

Tests: reran all FE tests to verify no change in behavior.

Change-Id: Ifda7082abea4f6448e6cfef0d9ef7b8565d73fce
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CaseExpr.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
A fe/src/main/java/org/apache/impala/analysis/ExprAnalyzer.java
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/common/TreeNode.java
M fe/src/main/java/org/apache/impala/rewrite/BetweenToCompoundRule.java
M fe/src/main/java/org/apache/impala/rewrite/EqualityDisjunctsToInRule.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeCountStarRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyConditionalsRule.java
M fe/src/main/java/org/apache/impala/rewrite/SimplifyDistinctFromRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
18 files changed, 676 insertions(+), 375 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/12143/1
--
To view, visit http://gerrit.cloudera.org:8080/12143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifda7082abea4f6448e6cfef0d9ef7b8565d73fce
Gerrit-Change-Number: 12143
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 


[Impala-ASF-CR] IMPALA-8041, Part 1: Move rewrite rules into expr nodes

2019-01-06 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12143 )

Change subject: IMPALA-8041, Part 1: Move rewrite rules into expr nodes
..


Patch Set 1:

Passed pre-commit tests: https://jenkins.impala.io/job/pre-review-test/273/


--
To view, visit http://gerrit.cloudera.org:8080/12143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifda7082abea4f6448e6cfef0d9ef7b8565d73fce
Gerrit-Change-Number: 12143
Gerrit-PatchSet: 1
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Mon, 07 Jan 2019 01:12:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-18 Thread Paul Rogers (Code Review)
Hello Bharath Vissapragada, Zoram Thanga, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12192

to look at the new patch set (#2).

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..

IMPALA-8058: Fallback for HBase key scan range estimation

Impala supports "pushing" of HBase key range predicates to HBase so that
Impala reads only rows within the target key range. The planner
estimates the cardinality of such scans by sampling the rows within the
range. However, we have seen cases where sampling returns rows for
unknown reasons. The planner then ends up without a good cardinality
estimate.  (Specifically, the code does a division by zero and produces
a huge estimate.  See the ticket for details.)

Impala appears to use the sampling strategy to compute cardinality
because HBase uses generally do not gather table stats. The resulting
estimates are often off by 2x or more. This is a problem in tests as it
causes cardinality numbers to vary greatly from the expected values.
Fortunately, tests do gather HMS stats. There may be cases where users
do as well. This fix exploints that fact.

This fix:

* Creates a fall-back strategy that uses table cardinality from HMS and
  the selectivity of the key predicates to estimate cardinality when the
  sampling approach fails.
* The fall-back strategy requires tracking the predicates used for HBase
  keys so that their selectivity can be applied during fall-back
  calculations.
* Moved HBase key calculation out of the SingleNodePlanner into the
  HBase scan node as suggested by a "TO DO" in the code. Doing so
  simplified the new code.
* In the spirit of IMPALA-7919, adds the key predicates to the HBase
  scan node in the EXPLAIN output.

Testing:

* Adds a query context option to disable the normal key sampling to
  force the use of the fall-back. Used for testing.
* Adds a new set of HBase test cases that use the new feature to check
  plans with the fall-back approach.
* Reran all existing tests.
* Compared cardinality numbers for the two modes: sampling and HMS using
  the cardinality features of IMPALA-8021. The two approaches provide
  different results, but this is mostly due to the missing selectivity
  estimates for inequality operators. (That's a fix for another time.)

Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
---
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/hbase-no-key-est.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
10 files changed, 512 insertions(+), 116 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/12192/2
--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 


[Impala-ASF-CR] IMPALA-8058: Fallback for HBase key scan range estimation

2019-01-18 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12192 )

Change subject: IMPALA-8058: Fallback for HBase key scan range estimation
..


Patch Set 2:

(3 comments)

Addressed code review comments. Rebased on latest master, which now shows 
cardinality estimates in the EXPLAIN plan, so updated the new test file 
accordingly.

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java
File fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/catalog/FeHBaseTable.java@409
PS1, Line 409:
> nit: add tablename, startKey and endKey too?
Done


http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
File fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java:

http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done


http://gerrit.cloudera.org:8080/#/c/12192/1/fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java@304
PS1, Line 304: // No useful estimate. Rely on HMS row count stats.
> Probably mention here that this doesn't work as expected if the stats are m
Done



--
To view, visit http://gerrit.cloudera.org:8080/12192
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic01147abcb6b184071ba28b55aedc3bc49b322ce
Gerrit-Change-Number: 12192
Gerrit-PatchSet: 2
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Sat, 19 Jan 2019 02:29:52 +
Gerrit-HasComments: Yes


  1   2   3   4   5   >