[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

I checked what patches were merged since the verification and there were three, 
but all were unrelated to this, so I'm not going to bother rerunning the GVM 
job.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 17:23:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Reviewed-on: http://gerrit.cloudera.org:8080/16123
Tested-by: Impala Public Jenkins 
Reviewed-by: Aman Sinha 
Reviewed-by: Tim Armstrong 
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
30 files changed, 5,117 insertions(+), 796 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Aman Sinha: Looks good to me, but someone else must approve
  Tim Armstrong: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 12
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 17:21:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-31 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11: Code-Review+1

(1 comment)

LGTM. Nice work !

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4
PS8, Line 4:  RESULTS
> Added some planner tests, will file a JIRA it works in some cases but in ge
Sounds good. Would be quite useful to have that optimization for joins.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 16:39:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 00:32:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6197/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6741/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:11:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277
PS9, Line 277:   JoinOperator joinOp = operand.getSetOperator() == 
SetOperator.EXCEPT ?
> nit: we could declare the variable on l 309 where it's assigned.
Done


http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331
PS9, Line 331: List initialOps = new 
ArrayList<>();
> It doesn't look like we do anything with this view? Is it mean to wrap eiSe
Yes good catch, it was something I refactored out as the union operands can be 
querystmts versus just tablerefs.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> For future reference, I have created a JIRA: IMPALA-10008
Ack


http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212
PS9, Line 212: select distinct id, year, month from functional.alltypestiny 
where year=2009 and month=1
> I guess the distinct is sorta serving as an execution hint here, right? Sin
Mostly to exercise the rewrite test case. Without the distincts for now we 
wouldn't be able to use an INNER join. In this case since id is kind of like a 
key, the distinct is redundant but we don't have a way of detecting that.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4
PS8, Line 4:  RESULTS
> Sorry, one more test suggestion that is based on a common pattern:  branch
Added some planner tests, will file a JIRA it works in some cases but in 
general we could remove the JOIN but instead it just creates and emptyset below 
the join. I could be wrong but seems like a generally useful optimization for 
join types in general.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 18:46:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Shant Hovsepian (Code Review)
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#11).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
30 files changed, 5,117 insertions(+), 796 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/11
--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-24 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(1 comment)

I haven't closely reviewed the un-nesting part but overall the patch LGTM. One 
more suggestion below.

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4
PS8, Line 4:  RESULTS
Sorry, one more test suggestion that is based on a common pattern:  branch 
elimination for FALSE predicates for Intersect, Except.  It works for Union 
currently.  I am not completely sure if it will work currently for 
intersect/except .. in which case we could create a follow-up.
There is a broader optimization of predicate pushdown from the outer query into 
the branches of a set operation .. but I think those types of testing can be 
deferred.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 24 Jul 2020 22:55:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-24 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(1 comment)

> Patch Set 8:
>
> (2 comments)

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> Yeah it does add overhead - with the regular equality predicates, we don't
For future reference, I have created a JIRA: IMPALA-10008



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 24 Jul 2020 22:35:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-24 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(4 comments)

Had a few minor comments but I'm nearly at a +1

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277
PS9, Line 277:   InlineViewRef opWrapperView = null;
nit: we could declare the variable on l 309 where it's assigned.


http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331
PS9, Line 331:   eiWrapperView.analyze(analyzer);
It doesn't look like we do anything with this view? Is it mean to wrap eiSelect 
in the set operand?


http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212
PS9, Line 212: select distinct id, year, month from functional.alltypestiny 
where year=2009 and month=1
I guess the distinct is sorta serving as an execution hint here, right? Since 
it would be equivalent with or without the distincts.


http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-query/queries/QueryTest/except.test
File testdata/workloads/functional-query/queries/QueryTest/except.test:

http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-query/queries/QueryTest/except.test@616
PS9, Line 616:
Can we add an explicit test for except behaviour with nulls, e.g.

This should return NULL only - there is no null in alltypes.tinyint_col.

  select tinyint_col from alltypesagg minus select tinyint_col from alltypes


This should return empty because there is a null present.

  select tinyint_col from alltypesagg minus select tinyint_col from alltypesagg 
where month = 1



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 24 Jul 2020 07:05:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16123/9//COMMIT_MSG
Commit Message:

PS9:
note to self: need to focus on tests


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: |  hash predicates: bigint_col IS NOT DISTINCT FROM 
functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM 
functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM 
functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM 
functional.alltypestiny.float_col, id IS NOT DISTINCT FROM 
functional.alltypestiny.id, int_col IS NOT DISTINCT FROM 
functional.alltypestiny.int_col, month IS NOT DISTINCT FROM 
functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM 
functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM 
functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM 
functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM 
functional.alltypestiny.year, string_col IS NOT DISTINCT FROM 
functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM 
functional.alltypestiny.date_string_col
> Actually, I was not referring to planning time but the execution time.  I h
Yeah it does add overhead - with the regular equality predicates, we don't 
insert or probe with rows with null join keys, so the null check is omitted. In 
general it would be helpful to have more nullability info since there are a lot 
of null checks in the compiled code (basically every SlotRef expr)



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 06:27:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> From what I've seen the biggest killer in these situations is with plan tim
Actually, I was not referring to planning time but the execution time.  I 
haven't done a measurement but I would imagine the cpu cost of IS NOT DISTINCT 
to be a bit more than the equality comparison because of the null == null check 
for each row and potentially many columns. Something to evaluate in the future.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:41:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6697/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:03:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(15 comments)

Thanks for all the test suggestions guys!

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544
PS8, Line 2544: // nonterminal making this issue unresolvable.  We rely on the 
left precedence of
> Not your change, but maybe drop a reference to IMPALA-4741 in here.
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546
PS8, Line 2546: // select_stmt (i.e., ORDER BY and LIMIT bind to the 
select_stmt by default, and not the
> Some of the wordings in this comment needs to be updated to remove referenc
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348
PS8, Line 348: public boolean setOperationNeedsRewrite = false;
> It is confusing that this is specifically intended to be set for Except, In
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014
PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from 
functional.alltypes " +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024
PS8, Line 3024:
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60
PS8, Line 60:
> Uncomment
Hah how'd that sneak in


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361
PS8, Line 361: # nested except, shouldn't be unnested, if it had been the 
results would be incorrect
> I didn't quite see what this comment was getting at.
Hah who knows what my state of mind was at that point. I tried to clean up the 
comment a bit. The intent was to contrast this plan with the one above, to 
emphasize except can't be unnested and the difference plan shape as a result.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> That's good that the codegen does some optimization for the hashing+equalit
>From what I've seen the biggest killer in these situations is with plan times 
>dealing with ExprSubstitutionMaps being linear time searches. That combined 
>with the way rewrites and analysis are done, we end getting into super 
>quadratic behavior and JVM GC issues that could easily be avoid with a hash 
>table for exprs.

In general though agree, I had thought it would be better to address this issue 
and DISTINCT placement in general as another rewrite phase.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test
File testdata/workloads/functional-query/queries/QueryTest/except.test:

PS8:
> Can we add a token query or two that use the MINUS and EXCEPT DISTINCT alte
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153
PS8, Line 153: (select 10 except select 11) union all select 10
> This is a repeat of the one just above.
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166
PS8, Line 166: select 10 union all select 11 union all select 11 except select 
10
> Would be good to have something like
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356
PS8, Line 356: b the
> absorb?
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

PS8:
> Can we add a token query or two that use the INTERSECT DISTINCT alternative
Done



[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Shant Hovsepian (Code Review)
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#9).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
29 files changed, 5,038 insertions(+), 796 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/9
--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: |  hash predicates: bigint_col IS NOT DISTINCT FROM 
functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM 
functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM 
functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM 
functional.alltypestiny.float_col, id IS NOT DISTINCT FROM 
functional.alltypestiny.id, int_col IS NOT DISTINCT FROM 
functional.alltypestiny.int_col, month IS NOT DISTINCT FROM 
functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM 
functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM 
functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM 
functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM 
functional.alltypestiny.year, string_col IS NOT DISTINCT FROM 
functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM 
functional.alltypestiny.date_string_col
> The hashing and equality checking does get unrolled by codegen so each fixe
That's good that the codegen does some optimization for the hashing+equality.  
Yeah, would be an interesting experiment to quantity the overhead for longer 
variable length values.  I am fine with treating that as a future 
enhancement/investigation. I can create a JIRA if not already there.

Somewhat related to this is the IS NOT DISTINCT FROM comparison.  Since this 
would involve at least an extra comparison for NULL,  it would be really nice 
if the nullability property from the SlotDescriptor could be leveraged and if 
it is guaranteed to be non-null (note in the SlotDescriptor that it could be 
marked non-null but still have NULL values if the entire tuple is null due to 
an  outer join for example) ... for such cases use regular equality comparison 
instead of IS NOT DISTINCT.  This could also be a future enhancement.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 18:19:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: |  hash predicates: bigint_col IS NOT DISTINCT FROM 
functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM 
functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM 
functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM 
functional.alltypestiny.float_col, id IS NOT DISTINCT FROM 
functional.alltypestiny.id, int_col IS NOT DISTINCT FROM 
functional.alltypestiny.int_col, month IS NOT DISTINCT FROM 
functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM 
functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM 
functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM 
functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM 
functional.alltypestiny.year, string_col IS NOT DISTINCT FROM 
functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM 
functional.alltypestiny.date_string_col
> I am a little worried about such cases where people inadvertently do SELECT
The hashing and equality checking does get unrolled by codegen so each 
fixed-width column probably doesn't add that much overhead per row - maybe a 
couple of CPU cycles per value (the fixed-length values of the equi-join 
predicates get materialised into a buffer and then that buffer is hashed).

Being clever here could make a different with strings, especially long strings 
could (stuff like l_comment). Would be an interesting experiment, at least. But 
yeah, from an execution standpoint I'm not concerned about this being 
pathogical, but there's probably an opportunity in general to optimise cases 
like this.

I guess this is matching the behaviour of distinct aggregations - with those 
we're already hashing the full row and adding it to the agg hash tables, so 
that actually might be the place where some optimisation like this would be 
more impactful.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 21 Jul 2020 17:22:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-21 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(7 comments)

Sorry for the delay..sending some comments based on a first pass.

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546
PS8, Line 2546: // (i.e., ORDER BY and LIMIT bind to the select_stmt by 
default, and not the union).
Some of the wordings in this comment needs to be updated to remove references 
to union and use setop instead.


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@a484
PS8, Line 484:
nit: Should we keep this info message ? You could add it conditionally if 
reAnalyze == true after both subquery and setop rewrites.


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348
PS8, Line 348: public boolean containsSetOperation = false;
It is confusing that this is specifically intended to be set for Except, 
Intersect but not for Union ? The name indicates all set operations.  If it is 
only meant for the 'transformable' set operations, should we use a more 
specific name ?


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@38
PS8, Line 38:  * TODO: This classes contains many methods and members that 
should be pushed down to
nit: classes --> class


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60
PS8, Line 60: /*
Uncomment


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: |  hash predicates: bigint_col IS NOT DISTINCT FROM 
functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM 
functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM 
functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM 
functional.alltypestiny.float_col, id IS NOT DISTINCT FROM 
functional.alltypestiny.id, int_col IS NOT DISTINCT FROM 
functional.alltypestiny.int_col, month IS NOT DISTINCT FROM 
functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM 
functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM 
functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM 
functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM 
functional.alltypestiny.year, string_col IS NOT DISTINCT FROM 
functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM 
functional.alltypestiny.date_string_col
I am a little worried about such cases where people inadvertently do SELECT *  
on both sides of the Intersect for wide tables which ends up joining on lots of 
columns.  Ideally, we could only include the high NDV columns in the 'hash 
predicates'. The other join conditions could be part of 'other join 
predicates'.  This will avoid the potentially high CPU cost of hashing on lots 
of columns. Any thoughts ?


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@2
PS8, Line 2:  QUERY
Thanks for adding these tests ! Few  suggestions for the intersect/except 
tests: (in case these are covered somewhere, feel free to ignore):
 (a) could we add a  test with column name verification.  The naming should be 
from the leftmost branch. I think the non-nested case is sufficient.
 (b) one test where the column has different but compatible types, so there 
should be a CAST inserted for the join condition in the left-semi or left-anti 
join
 (c) a test with order-by, limit at the end of the 

[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-20 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(15 comments)

It's obviously a big change but I'm getting comfortable with it. I need to look 
at the e2e tests again. I had some pretty minor comments at this point.

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544
PS8, Line 2544: // We rely on the left precedence of KW_ORDER, KW_BY, and 
KW_LIMIT,
Not your change, but maybe drop a reference to IMPALA-4741 in here.


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@47
PS8, Line 47: union
set operation?


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@166
PS8, Line 166: DISTINCT
should this be union distinct ops?


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@449
PS8, Line 449:   public void rewriteExprs(ExprRewriter rewriter) throws 
AnalysisException {
This is pre-existing, but can you mention that distinctAggInfo_ doesn't need to 
be rewritten because the exprs are always simple SlotRefs? I had to read 
through the code to convince myself that we weren't missing something here.


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@511
PS8, Line 511: parenthesis
nit: parentheses


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@816
PS8, Line 816: kmaps
maps?


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@201
PS8, Line 201: SelectStmt uSelect = null, eiSelect = null;
Can you briefly comment on uSelect/eiSelect, since they're pretty central? The 
method comment explains it at a high level pretty well. Something like

  After each iteration of the loop below, exactly one of uSelect and eiSelect 
is non-null.
  If the last operand processed was a union, uSelect is the current select 
statement that has unionStmt nested inside, which in turn contains preceding 
union operands. If the last operator processed was an except or intersect, 
eiSelect is the current select statement containing preceding except or 
intersect operands in the from clause.


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60
PS8, Line 60: /*
I understand the frustration :)

We should uncomment this though...


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361
PS8, Line 361: # except nested, doesn't get unnest compare to plan above
I didn't quite see what this comment was getting at.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test
File testdata/workloads/functional-query/queries/QueryTest/except.test:

PS8:
Can we add a token query or two that use the MINUS and EXCEPT DISTINCT 
alternatives just to prove that they work end-to-end?


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153
PS8, Line 153: (select 10 except select 11) union all select 10
This is a repeat of the one just above.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166
PS8, Line 166: select 10 union all select 11 except select 10
Would be good to have something like

  select 10 union all select 11 union all select 11 except select 10

to show that it will deduplicate the union,


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356
PS8, Line 356: abosrt
absorb?


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File 

[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6640/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 18 Jul 2020 03:15:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6638/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 7
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 18 Jul 2020 03:02:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(8 comments)

> Patch Set 5:
>
> (6 comments)

http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/cup/sql-parser.cup@2493
PS3, Line 2493:   queryStmt = 
SetOperationStmt.createUnionOrSetOperation(operands, null, null);
> nit: consider moving this into SetOperationStmt.create() or similar.
Done


http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@498
PS3, Line 498: // reset() and re-analyzed. For a CTAS statement, the types 
represent column types
> Maybe convert this and the log on l489 to debug-level. This seems a bit spa
Yeah the slipped in, honestly it provides no value. There is a trace guarded 
log later with the rewritten result.


http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java:

http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@185
PS2, Line 185:   protected List widestExprs_ = new ArrayList<>();
> This is a little clunky but seems basically fine, I think only a couple of
It's super clunky. Any caller of rewriteQueryStatement needs to check if the 
result is a setoperationstmt and has a rewritten version.  Would be nice to 
refactor sometime later.


http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@300
PS2, Line 300:   op.getQueryStmt().setDoTableMasking(doTableMasking);
> Comment that this is because all other DISTINCT set operations are rewritte
Cleanup up the code to be more obvious.


http://gerrit.cloudera.org:8080/#/c/16123/2/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@604
PS2, Line 604:   }
> This only applies if all children are union, right?
Made this more clear with some comments.


http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
File fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java:

http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java@213
PS3, Line 213:   for (SetOperand o : other.unionAllOperands_) {
> Doesn't this re-order the operands? Is that correct?
You got it, this is not correct, lost it in the copy/paste from unionstmt. Good 
catch!


http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16123/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1746
PS3, Line 1746:* Returns a normalized version of a binary equality 
predicate 'expr' where the lhs
> Do we need to add any preconditions here and above to reject non-union oper
Let's let the type system handle this, made it into a UnionStmt.


http://gerrit.cloudera.org:8080/#/c/16123/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test:

http://gerrit.cloudera.org:8080/#/c/16123/4/testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test@14052
PS4, Line 14052: |--34:EXCHANGE [HASH(c_last_name,c_first_name,d_date)]
> Good it's likely from the nested SELECT * FROM that comes with the rewrite,
I think this just has something to do with ANTI JOIN planning. Might need a 
separate JIRA.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 18 Jul 2020 02:52:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014
PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from 
functional.alltypes intersect " +
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024
PS8, Line 3024:
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 18 Jul 2020 02:50:21 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Shant Hovsepian (Code Review)
Hello David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#8).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
30 files changed, 4,906 insertions(+), 786 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/8
--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014
PS7, Line 3014: AnalyzesOk("select rank() over (order by int_col) from 
functional.alltypes intersect " +
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/16123/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024
PS7, Line 3024:
line has trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 7
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 18 Jul 2020 02:34:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-17 Thread Shant Hovsepian (Code Review)
Hello David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#7).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case of union
only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M testdata/workloads/tpch_nested/tpch_nested_core.csv
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
30 files changed, 4,910 insertions(+), 790 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/7
--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 7
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong