[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches

2020-07-22 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16196

to look at the new patch set (#6).

Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches
..

IMPALA-9882: Import KLL functionality from Apache DataSketches

First, I updated our existing snapshot of DataSketches to the
following commit:
c67d92faad3827932ca3b5d864222e64977f2c20
"Merge pull request #166 from gaborkaszab/const_cast"
This affects files originating from the kll/ and common/ directories of
the DataSketches repo.

Then I copied all the files needed for KLL into our snapshot
directory.

You can find the original Apache DataSketches files here:
https://github.com/apache/incubator-datasketches-cpp

However, this new snapshot broke the interface we used for
serializing hll_union objects because it drops serialize_compact(). As a
solution, I changed the serialization and merging phases of the union
operator so that they serialize the underlying hll_sketch instead of
hll_union itself.

Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
D be/src/thirdparty/datasketches/CommonUtil.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable-internal.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
A be/src/thirdparty/datasketches/bounds_binomial_proportions.hpp
A be/src/thirdparty/datasketches/common_defs.hpp
A be/src/thirdparty/datasketches/count_zeros.hpp
M be/src/thirdparty/datasketches/hll.hpp
A be/src/thirdparty/datasketches/kll_helper.hpp
A be/src/thirdparty/datasketches/kll_helper_impl.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
A be/src/thirdparty/datasketches/kll_sketch.hpp
A be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/memory_operations.hpp
A be/src/thirdparty/datasketches/serde.hpp
29 files changed, 3,280 insertions(+), 347 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/16196/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
Gerrit-Change-Number: 16196
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16123/9//COMMIT_MSG
Commit Message:

PS9:
note to self: need to focus on tests


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: |  hash predicates: bigint_col IS NOT DISTINCT FROM 
functional.alltypestiny.bigint_col, bool_col IS NOT DISTINCT FROM 
functional.alltypestiny.bool_col, double_col IS NOT DISTINCT FROM 
functional.alltypestiny.double_col, float_col IS NOT DISTINCT FROM 
functional.alltypestiny.float_col, id IS NOT DISTINCT FROM 
functional.alltypestiny.id, int_col IS NOT DISTINCT FROM 
functional.alltypestiny.int_col, month IS NOT DISTINCT FROM 
functional.alltypestiny.month, smallint_col IS NOT DISTINCT FROM 
functional.alltypestiny.smallint_col, timestamp_col IS NOT DISTINCT FROM 
functional.alltypestiny.timestamp_col, tinyint_col IS NOT DISTINCT FROM 
functional.alltypestiny.tinyint_col, year IS NOT DISTINCT FROM 
functional.alltypestiny.year, string_col IS NOT DISTINCT FROM 
functional.alltypestiny.string_col, date_string_col IS NOT DISTINCT FROM 
functional.alltypestiny.date_string_col
> Actually, I was not referring to planning time but the execution time.  I h
Yeah, it does add overhead. With the regular equality predicates, we don't 
insert or probe with rows with null join keys, so the null check is omitted. In 
general, it would be helpful to have more nullability info, since there are a lot 
of null checks in the compiled code (basically every SlotRef expr).
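
The difference described above can be sketched in a few lines. This is a minimal illustration, not Impala's join code: `left_semi_join` and its `null_safe` flag are hypothetical names, rows are plain tuples of join keys, and `None` stands in for SQL NULL.

```python
def left_semi_join(left, right, null_safe):
    """Return the left rows that have a match in right on all key columns."""
    if null_safe:
        # IS NOT DISTINCT FROM: NULL matches NULL, so every row must be
        # hashed and compared, including rows whose keys contain NULL.
        build = set(right)
        return [row for row in left if row in build]
    # Regular equality: a NULL key can never match, so rows containing NULL
    # are skipped on both the build side and the probe side.
    build = {row for row in right if None not in row}
    return [row for row in left if None not in row and row in build]
```

With null-safe matching, no row can be skipped up front, which is where the extra per-row, per-column check comes from.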



--
To view, visit http://gerrit.cloudera.org:8080/16123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 8
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 06:27:51 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> From what I've seen the biggest killer in these situations is with plan tim
Actually, I was referring to execution time, not planning time. I haven't 
measured it, but I would expect the CPU cost of IS NOT DISTINCT FROM to be a 
bit higher than that of the equality comparison because of the null == null 
check for each row and potentially many columns. Something to evaluate in the 
future.



--
To view, visit http://gerrit.cloudera.org:8080/16123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:41:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16212 )

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16212 )

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6170/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 3
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16212 )

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 3
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:37:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16212 )

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6698/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:05:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6697/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 05:03:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 9:

(15 comments)

Thanks for all the test suggestions guys!

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2544
PS8, Line 2544: // nonterminal making this issue unresolvable.  We rely on the left precedence of
> Not your change, but maybe drop a reference to IMPALA-4741 in here.
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/cup/sql-parser.cup@2546
PS8, Line 2546: // select_stmt (i.e., ORDER BY and LIMIT bind to the select_stmt by default, and not the
> Some of the wordings in this comment needs to be updated to remove referenc
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@348
PS8, Line 348: public boolean setOperationNeedsRewrite = false;
> It is confusing that this is specifically intended to be set for Except, In
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3014
PS8, Line 3014: AnalyzesOk("select rank() over (order by int_col) from functional.alltypes " +
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@3024
PS8, Line 3024:
> line has trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/16123/8/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@60
PS8, Line 60:
> Uncomment
Hah how'd that sneak in


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@361
PS8, Line 361: # nested except, shouldn't be unnested, if it had been the results would be incorrect
> I didn't quite see what this comment was getting at.
Hah, who knows what my state of mind was at that point. I tried to clean up the 
comment a bit. The intent was to contrast this plan with the one above, to 
emphasize that EXCEPT can't be unnested and that the plan shape differs as a 
result.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> That's good that the codegen does some optimization for the hashing+equalit
From what I've seen, the biggest killer in these situations is plan time: 
ExprSubstitutionMap lookups are linear-time searches. Combined with the way 
rewrites and analysis are done, we end up with super-quadratic behavior and 
JVM GC issues that could easily be avoided with a hash table for exprs.

In general, though, I agree. I had thought it would be better to address this 
issue and DISTINCT placement in general as another rewrite phase.
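
To illustrate the lookup cost mentioned above, here is a small sketch contrasting a list-backed substitution map with a hash-backed one. The class names are hypothetical and expressions are modeled as plain strings; this is not Impala's ExprSubstitutionMap, only the general idea.

```python
class LinearSubstitutionMap:
    """Parallel lists; every lookup scans all entries, so it is O(n)."""

    def __init__(self):
        self._lhs = []
        self._rhs = []

    def put(self, lhs, rhs):
        self._lhs.append(lhs)
        self._rhs.append(rhs)

    def get(self, expr):
        # Walk the entries in insertion order until one matches.
        for candidate, replacement in zip(self._lhs, self._rhs):
            if candidate == expr:
                return replacement
        return None


class HashedSubstitutionMap:
    """Dict-backed; lookups are O(1) on average if exprs are hashable."""

    def __init__(self):
        self._entries = {}

    def put(self, lhs, rhs):
        # Keep the first mapping for a key, matching the linear scan above.
        self._entries.setdefault(lhs, rhs)

    def get(self, expr):
        return self._entries.get(expr)
```

Substituting n expressions through a map of n entries costs on the order of n^2 comparisons with the first variant and roughly n hash lookups with the second, assuming expressions have a cheap and consistent hash.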


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test
File testdata/workloads/functional-query/queries/QueryTest/except.test:

PS8:
> Can we add a token query or two that use the MINUS and EXCEPT DISTINCT alte
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@153
PS8, Line 153: (select 10 except select 11) union all select 10
> This is a repeat of the one just above.
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@166
PS8, Line 166: select 10 union all select 11 union all select 11 except select 10
> Would be good to have something like
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/except.test@356
PS8, Line 356: b the
> absorb?
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

PS8:
> Can we add a token query or two that use the INTERSECT DISTINCT alternative
Done


http://gerrit.cloudera.org:8080/#/c/16123/8/

[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16212 )

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16212/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java:

http://gerrit.cloudera.org:8080/#/c/16212/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java@1392
PS1, Line 1392: Only subqueries that
> I think we should remove the bit about the invariant
Done



--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 04:37:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8125: Add query option to limit number of hdfs writer instances

2020-07-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16204 )

Change subject: IMPALA-8125: Add query option to limit number of hdfs writer instances
..


Patch Set 3: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16204/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16204/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@236
PS3, Line 236: to
nit: "to" seems misplaced.



--
To view, visit http://gerrit.cloudera.org:8080/16204

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17c8e61b9a32d908eec82c83618ff9caa41078a5
Gerrit-Change-Number: 16204
Gerrit-PatchSet: 3
Gerrit-Owner: Bikramjeet Vig 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 04:37:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9929: Subquery error should throw AnalysisException

2020-07-22 Thread Shant Hovsepian (Code Review)
Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16212

to look at the new patch set (#2).

Change subject: IMPALA-9929: Subquery error should throw AnalysisException
..

IMPALA-9929: Subquery error should throw AnalysisException

Unsupported subquery in the select list should throw an
AnalysisException.

Testing:
* Analyzer test to catch this case.

Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
---
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeSubqueriesTest.java
2 files changed, 8 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/16212/2
--
To view, visit http://gerrit.cloudera.org:8080/16212

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic299ea25fd6e505e364528891e737a9af5bcc338
Gerrit-Change-Number: 16212
Gerrit-PatchSet: 2
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-22 Thread Shant Hovsepian (Code Review)
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#9).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.
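
As a rough illustration of the two points above, the sketch below models DISTINCT set operations on Python lists, with INTERSECT written as a semi-join-style filter and EXCEPT as an anti-join-style filter, and then shows how the precedence choice changes a result. The function names are hypothetical, and the sketch is not the planner's actual rewrite.

```python
def distinct(rows):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def intersect_distinct(left, right):
    # Semi-join style: keep left rows that have a match on the right.
    build = set(right)
    return distinct(row for row in left if row in build)


def except_distinct(left, right):
    # Anti-join style: keep left rows that have no match on the right.
    build = set(right)
    return distinct(row for row in left if row not in build)


def union_distinct(left, right):
    return distinct(list(left) + list(right))


a, b, c = [1], [2], [2]

# a UNION b INTERSECT c with equal precedence, evaluated left to right.
same_precedence = intersect_distinct(union_distinct(a, b), c)

# The same query with INTERSECT binding tighter, as in the SQL standard.
standard_precedence = union_distinct(a, intersect_distinct(b, c))
```

Here same_precedence is [2] while standard_precedence is [1, 2], so a query that mixes the operators without parentheses can return different rows depending on which rule applies.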

A new class, SetOperationStmt, was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case for
union-only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect and test_except.
* Added new planner test testSetOperationStmt.

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
29 files changed, 5,038 insertions(+), 796 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/9
--
To view, visit http://gerrit.cloudera.org:8080/16123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 9
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6696/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 04:12:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Aman Sinha (Code Review)
Hello David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16219

to look at the new patch set (#5).

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..

IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. There are several qualifying checks that are done. The
optimization is done at the time of creating the top level
Sort in the single node planner.

Doing this pushdown can substantially improve performance
by applying the limit early.

Fixed a couple of additional related issues uncovered as a
result of limit pushdown:
 - Changed the analytic sort's partition-by expr sort
   semantic from NULLS FIRST to NULLS LAST to ensure
   correctness in the presence of limit.
 - The LIMIT on the analytic sort node was causing it to
   be treated as a merging point in the distributed planner.
   Fixed it by introducing an API, allowPartitioned(), in
   PlanNode.

Testing:
 - Ran PlannerTest and updated several EXPLAIN plans.
 - Added Planner tests for both positive and negative cases of
   limit pushdown.
 - Ran end-to-end TPC-DS queries. Specifically tested
   TPC-DS q67 for limit pushdown and result correctness.
 - TODO: Add targeted end-to-end tests

Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
24 files changed, 1,055 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/5
--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16230 )

Change subject: IMPALA-9987: Improve logging around HTTP connections
..


Patch Set 1:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6169/


--
To view, visit http://gerrit.cloudera.org:8080/16230

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f
Gerrit-Change-Number: 16230
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 03:31:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..

IMPALA-9979: part 1: factor out Top-N heap.

This extracts the implementation of the actual
priority queue from the rest of TopNNode's state,
so that we can, in the next patch, have multiple
heaps per node.
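
As a sketch of what a standalone Top-N heap looks like, the class below keeps only the `limit` smallest values it has seen, as an ascending ORDER BY ... LIMIT would. The class and method names are hypothetical, values are assumed to be numeric, and this is not the TopNNode code itself.

```python
import heapq


class TopNHeap:
    """Keeps the `limit` smallest values seen so far."""

    def __init__(self, limit):
        self.limit = limit
        # heapq is a min-heap, so values are stored negated to get a
        # max-heap: the root is then the largest value currently kept.
        self._heap = []

    def insert(self, value):
        if len(self._heap) < self.limit:
            heapq.heappush(self._heap, -value)
        elif self._heap and value < -self._heap[0]:
            # The new value beats the current worst one, so replace it.
            heapq.heapreplace(self._heap, -value)

    def sorted_rows(self):
        """Return the kept values in ascending order."""
        return sorted(-v for v in self._heap)
```

Because the heap carries its own state, a node could hold several independent instances, which is the kind of flexibility the commit message is aiming for.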

The codegen'd InsertBatch() function is unfortunately
a little sensitive to minor changes in code, because
of the weird way that it does an indirect call via
TupleRowComparator - see IMPALA-4065. I had to
tweak the code a little to find a variant that performed
similarly to the previous version - other variants had
small regressions.

Perf:
Single node TPC-H showed no perf change.

The time for the TOP-N node in this targeted query was
within the margin of error:

use tpch30_parquet;
set mt_dop=1;
select l_extendedprice from lineitem
order by 1 limit 100

Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Reviewed-on: http://gerrit.cloudera.org:8080/16223
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/codegen/impala-ir.h
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/util/tuple-row-compare.h
5 files changed, 163 insertions(+), 73 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16223
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 03:30:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Riza Suminto (Code Review)
Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 20:

Patch set 19 fails the same test, test_multiple_sort_run_bytes_limits.
It looks like the admission controller does not respect buffer_pool_limit as 
much as mem_limit.

Patch set 20 changes the test cases to use mem_limit instead of 
buffer_pool_limit, just as Tim initially suggested. Some of the 
sort_run_bytes_limit parameters were also adjusted to keep the assertions true.
Fang-Yu helped me verify that patch set 20 passes 
ubuntu-16.04-dockerised-tests by rerunning it in this Jenkins job:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/2814/


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 02:47:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6695/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 01:11:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16120
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 23 Jul 2020 00:55:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@418
PS4, Line 418: if (!(analyticWindow_.getLeftBoundary().getType() == 
AnalyticWindow.BoundaryType.UNBOUNDED_PRECEDING
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@419
PS4, Line 419: && analyticWindow_.getRightBoundary().getType() == 
AnalyticWindow.BoundaryType.CURRENT_ROW)) {
line too long (106 > 90)


http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/4/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@417
PS4, Line 417:   private PlanNode findDescendantAnalyticNode(PlanNode root, 
List intermediateNodes) {
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 23 Jul 2020 00:51:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Aman Sinha (Code Review)
Hello David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16219

to look at the new patch set (#4).

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..

IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. There are several qualifying checks that are done. The
optimization is done at the time of creating the top level
Sort in the single node planner.

Doing this pushdown can substantially improve performance
by applying the limit early.

Fixed a couple of additional related issues uncovered as a
result of limit pushdown:
 - Changed the analytic sort's partition-by expr sort
   semantic from NULLS FIRST to NULLS LAST to ensure
   correctness in the presence of limit.
 - The LIMIT on the analytic sort node was causing it to
   be treated as a merging point in the distributed planner.
   Fixed it by introducing an API allowPartitioned() in the
   PlanNode.

Testing:
 - Ran PlannerTest and updated several EXPLAIN plans.
 - Added Planner tests for both positive and negative cases of
   limit pushdown.
 - Ran end-to-end TPC-DS queries. Specifically tested
   TPC-DS q67 for limit pushdown and result correctness.
 - TODO: Add targeted end-to-end tests

Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
24 files changed, 1,047 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/4
--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16231 )

Change subject: IMPALA-9977: Remove duplicate Ranger audit log entries for 
ALTER events
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6694/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16231
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a
Gerrit-Change-Number: 16231
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:51:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events

2020-07-22 Thread Fang-Yu Rao (Code Review)
Fang-Yu Rao has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16231 )

Change subject: IMPALA-9977: Remove duplicate Ranger audit log entries for 
ALTER events
..

IMPALA-9977: Remove duplicate Ranger audit log entries for ALTER events

This JIRA could be considered as a follow-up to IMPALA-9625, where we
converted the name of a TAccessEvent to lowercase to avoid duplicate
audits in the Set used to maintain the collected TAccessEvent's so that
there will not be duplicate TAccessEvent's in the file specified by the
flag of "-audit_event_log_dir" when Impala is started.

However, the patch for IMPALA-9625 only considered the audits that are
exported to the specific file mentioned above but not the
PrivilegeRequest's that will be processed by Ranger which in turn would
produce the corresponding audit log entries. Therefore, the
fully-qualified table name that is provided when
Analyzer#registerPrivReq() is called in Analyzer#getTable() is not
necessarily in lowercase, resulting in duplicate AuthzAuditEvent's
stored in the corresponding RangerBufferAuditHandler because the
full table names returned from registerAuthAndAuditEvent() and
getTable() differ. Refer to IMPALA-9625 for more details.

To resolve the inconsistencies, this patch converts the arguments of
database and table names to lowercase when
PrivilegeRequestBuilder#onTable() is building the corresponding
PrivilegeRequest, which will later be added to the Set of
PrivilegeRequest's for Ranger to process.
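
A minimal sketch of why the case normalization removes the duplicates,
assuming the requests are held in a set keyed on (db, table, privilege).
The function below is a hypothetical stand-in for
PrivilegeRequestBuilder#onTable(), not the Java code in this patch:

```python
def build_privilege_request(db_name, table_name, privilege):
    """Hypothetical stand-in for building a table-level privilege request."""
    # Normalise case so that requests for the same table compare equal.
    return (db_name.lower(), table_name.lower(), privilege)


requests = set()
# The same table reached through two code paths with different casing.
requests.add(build_privilege_request("Functional", "AllTypes", "ALTER"))
requests.add(build_privilege_request("functional", "alltypes", "ALTER"))
```

With the lowercasing in place the set holds a single entry, so only one
audit event would be produced for the table.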

Testing:
- Added an FE test in RangerAuditLogTest.java to make sure no duplicate
  Ranger audit log entries are produced.
- Verified that the patch passes the exhaustive tests in the DEBUG
  build.

Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a
---
M fe/src/main/java/org/apache/impala/authorization/PrivilegeRequestBuilder.java
M 
fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
2 files changed, 18 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/16231/2
--
To view, visit http://gerrit.cloudera.org:8080/16231
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iab9b664ad5ee9722182007ee67d14bf47bd03d8a
Gerrit-Change-Number: 16231
Gerrit-PatchSet: 2
Gerrit-Owner: Fang-Yu Rao 
Gerrit-Reviewer: Fang-Yu Rao 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Norbert Luksa 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get num in flight queries calls

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16218 )

Change subject: IMPALA-9799: Add retries to TestFetchFirst 
get_num_in_flight_queries calls
..

IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls

The calls to get_num_in_flight_queries in TestFetchFirst are flaky
because they expect the number of in flight queries to drop to 0
immediately. This might not always be true, especially in ASAN builds
where Impala is generally slower.

This patch wraps the call to get_num_in_flight_queries in
ImpalaTestSuite.assert_eventually, which adds retries to the calls to
get_num_in_flight_queries.
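
The retry idea can be sketched as follows. This is a simplified,
hypothetical helper written for illustration; the signature of the real
ImpalaTestSuite.assert_eventually is not shown in this thread and may
differ:

```python
import time


def eventually(timeout_s, period_s, condition):
    """Polls `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.time() + timeout_s
    while not condition():
        # Fail the test once the deadline has passed.
        assert time.time() < deadline, "condition not met within timeout"
        time.sleep(period_s)
```

A test would then pass a callable such as
lambda: get_num_in_flight_queries() == 0 instead of asserting on the
first value it reads.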

Testing:
* Ran tests/hs2/test_fetch_first.py locally

Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Reviewed-on: http://gerrit.cloudera.org:8080/16218
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M tests/hs2/test_fetch_first.py
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Gerrit-Change-Number: 16218
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get num in flight queries calls

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16218 )

Change subject: IMPALA-9799: Add retries to TestFetchFirst 
get_num_in_flight_queries calls
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Gerrit-Change-Number: 16218
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:28:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16222 )

Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows 
are returned
..

IMPALA-9953: Shell should continue fetching even when 0 rows are returned

The Impala shell stops fetching rows if it receives a batch that
contains 0 rows. This is incorrect because a batch with 0 rows can be
returned if the fetch request hits a timeout. Instead, the shell should
rely on the value of has_rows / hasMoreRows to determine when to stop
issuing fetch requests.
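
A minimal sketch of the intended loop, assuming a hypothetical fetch()
callable that returns a (rows, has_more_rows) pair. It is not the code in
impala_shell.py:

```python
def fetch_all(fetch):
    """Drains a result set; `fetch` returns (rows, has_more_rows)."""
    results = []
    while True:
        rows, has_more_rows = fetch()
        # An empty batch is fine, e.g. after a fetch timeout.
        results.extend(rows)
        if not has_more_rows:
            return results
```

The loop terminates only on the has-more-rows flag, so an empty batch in
the middle of the stream no longer truncates the result.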

Tests:
* Added a regression test to test_shell_commandline.py
* Ran all shell tests

Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Reviewed-on: http://gerrit.cloudera.org:8080/16222
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M shell/impala_shell.py
M tests/shell/test_shell_commandline.py
2 files changed, 17 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Gerrit-Change-Number: 16222
Gerrit-PatchSet: 4
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16222 )

Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows 
are returned
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Gerrit-Change-Number: 16222
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:28:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6693/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:09:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 20:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6692/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 23:01:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-22 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16159 )

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..


Patch Set 4:

(13 comments)

Thanks for the review! Uploaded the new patch set after it passed the 
exhaustive test.

> I think it would be useful if we could have an exhaustive test (may be in a 
> separate jira) to make sure that we are not leaking partitions in statestore. 
> The test could add/drop partitions along with multiple add/invalidate/drop 
> table commands and make sure that the number of partition keys in the 
> statestore is as per our expectation.

Yeah, created IMPALA-9994 for this.

http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@712
PS3, Line 712:   if 
(!FeSupport.NativeAddPendingTopicItem(nativeCatalogServerPtr, v2Key,
> Its unclear to me that when we generate the minimalObject when delete flag
Sorry, this line was added in PS1 and should have been removed in PS2... I 
added a test for this in PS4.

Added these nice comments in the class comment of HdfsTable.


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@742
PS3, Line 742:   partObject.setId(obj.hdfs_partition.id);
 : } else if (obj.hdfs_partition.isSetPrev_id()) {
 :   Preconditions.checkState(
 :   obj.hdfs_partition.prev_id != 
HdfsPartition.INITIAL_PARTITION_ID - 1,
 :   "Invalid partition id");
 :
> This looks a bit hacky to me. Do you think it would be more readable by add
I think this way satisfies the meaning of invalidations better. LocalCatalog 
coordinators don't need to distinguish whether an invalidation is an "update" 
invalidation or a "delete" invalidation. On the other hand, catalogd sends 
minimal objects as invalidations because it knows the implementation of 
coordinators. I think it's OK to add the awareness of how coordinators use 
the partition ids.

BTW, the prev_id field is added in THdfsPartition but is only used for passing 
the previous partition id through here. I'll set its default value to -1 in 
the thrift definition.


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1295
PS3, Line 1295: topicUpdateEntry.getLastSentVersion(),
> wouldn't this line be called for both fullUpdate and a incremental update?
Sorry, I think I use "incremental updates" in many places and it introduces 
confusion. toThriftWithPartitionIds() is used when catalogd wants to send 
partition updates individually instead of carrying them inside the thrift 
table. I call this "incremental updates" but I think I should avoid the 
conflict with incremental catalog topic updates. Will update the javadoc.


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1321
PS3, Line 1321: // statestored restarts).
  : if (ctx.isFullUpdate()) hdfsTable.resetMaxSentPartitionId();
  :
> nit, perhaps this is more readable?
Done


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1329
PS3, Line 1329:
> Can you clarify why this is needed only in case of incremental updates? Wha
Sorry, I thought no one could make use of these in a full topic update. But 
it's only true for statestore and v1 coordinators.

When statestore restarts, its catalog topic map is empty. It will fetch a full 
topic update (fromVersion=0) from catalogd. But there are no old values to be 
reset in its catalog topic map.
When statestore restarts, v1 coordinators will receive a full topic update 
which will trigger them to reset the whole local cache. They don't need 
deletions in the new empty cache. Actually, partition deletions are always 
ignored by v1 coordinators since partition deletions are detected by absence 
of the id in the table's latest partition list.
However, v2 coordinators won't reset the cache so they can still use them to 
invalidate obsolete partition cache. Will remove this check.


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@90
PS3, Line 90: ly instead of
> I think it is worth documenting that even though this extends CatalogObject
Sure. Done.


http://gerrit.cloudera.org:8080/#/c/16159/3/fe/src/main/java/org/apache/impala/catal

[Impala-ASF-CR] IMPALA-3127: Support incremental metadata updates in partition level

2020-07-22 Thread Quanlong Huang (Code Review)
Hello Anurag Mantripragada, Vihang Karajgaonkar, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16159

to look at the new patch set (#4).

Change subject: IMPALA-3127: Support incremental metadata updates in partition 
level
..

IMPALA-3127: Support incremental metadata updates in partition level

Currently, partitions are tightly integrated into the HdfsTable objects.
Catalogd has to transmit the entire table metadata even when few
partitions change. This is a waste of resources and can lead to OOM in
transmitting large tables due to the 2GB JVM array limit.

This patch makes HdfsPartition extend CatalogObject so the catalogd can
send partitions as individual catalog objects. Consequently, table
objects in the catalog topic update can have minimal partition maps that
only contain the partition ids, which reduces the thrift object size for
large tables. The catalog object key of HdfsPartition consists of db
name, table name and partition name.

In "full" topic mode (catalog_topic_mode=full), catalogd only sends
changed partitions with their latest table states. The latest table
states are table objects with the minimal partition map. Legacy
coordinators use the partition list to pick up existing (unchanged)
partitions from the existing table object and new partitions in the
catalog update.

Currently, partition instances are immutable - all partition
modifications are implemented by deleting the old instance and adding a
new one with a new partition id. Since partition ids are generated by a
global counter, newer partition instances will have larger partition
ids. So catalogd maintains a watermark for each table as the max sent
partition id. Partition instances with ids larger than this are new
partitions that should be sent in the next catalog update. For the
deleted partition instances, they are kept in a set for each table until
the next catalog update. If there are no updates on the same partition
name, catalogd will send deletion on the partition.
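
The watermark scheme above can be sketched roughly as follows. The class
and field names are hypothetical, and the rule used for suppressing a
deletion (the same partition name appears among the newly sent
partitions) is one reading of the paragraph, not the Java code in
HdfsTable:

```python
class TableState:
    """Hypothetical per-table bookkeeping kept by the catalog server."""

    def __init__(self):
        self.partitions = {}    # partition id -> partition name
        self.dropped = set()    # names of instances dropped since last update
        self.max_sent_id = -1   # watermark: largest partition id already sent

    def collect_update(self):
        """Returns (ids of new partitions, names to delete) for one update."""
        new_ids = sorted(i for i in self.partitions if i > self.max_sent_id)
        resent_names = {self.partitions[i] for i in new_ids}
        # A dropped name that reappears under a new id is an update,
        # so no deletion is sent for it.
        deletions = sorted(self.dropped - resent_names)
        if new_ids:
            self.max_sent_id = new_ids[-1]
        self.dropped.clear()
        return new_ids, deletions
```

Because ids only grow, a single integer per table is enough to tell which
partition instances still have to be sent.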

For dropped or invalidated tables, catalogd will still send deletions on
their partitions. Although they are not used in coordinators
(coordinators delete the partitions when they delete the table
instances), they help in avoiding topic entry leak in the statestore
catalog topic.

In "minimal" topic mode (catalog_topic_mode=minimal), catalogd only
sends invalidations on tables and stale partition instances. Each
partition instance is identified by its partition id. LocalCatalog
coordinators use the partition invalidations to evict stale partitions
in time. For instance, let's say partition(year=2010) is updated in
catalogd. This is done by deleting the old partition instance
partition(id=0, year=2010) and adding a new partition instance
partition(id=1, year=2010). Catalogd will send invalidations on the
table and partition instance with id=0, but not the one with id=1. A
LocalCatalog coordinator will invalidate the partition instance(id=0) if
it's in the cache. If the partition instance(id=1) is cached, it's
already the latest version since partition instances are immutable. So
we don't need to invalidate it.
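
The year=2010 example can be sketched as below, assuming the coordinator
cache is a plain dict keyed by partition id. The function name is
hypothetical and this is not the CatalogdMetaProvider code:

```python
def apply_invalidations(cache, stale_ids):
    """Evicts the partition instances named in an invalidation batch."""
    for partition_id in stale_ids:
        # Ids that are not cached are simply ignored.
        cache.pop(partition_id, None)
    return cache


cache = {0: "partition(id=0, year=2010)", 1: "partition(id=1, year=2010)"}
apply_invalidations(cache, [0])
```

Only the instance with id=0 is evicted; the instance with id=1 stays
because it is immutable and therefore already current.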

Tests
 - Run exhaustive tests.
 - Run exhaustive test_ddl.py in LocalCatalog mode.
 - Add test in test_local_catalog.py to verify stale partitions are
   invalidated in LocalCatalog when partitions are updated.

Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
---
M be/src/catalog/catalog-util.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogObject.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ImpaladCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_local_catalog.py
13 files changed, 615 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16159/4
--
To view, visit http://gerrit.cloudera.org:8080/16159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia0abfb346903d6e7cdc603af91c2b8937d24d870
Gerrit-Change-Number: 16159
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anurag Mantripragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8547: get_json_object fails to get value for numeric key

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14905 )

Change subject: IMPALA-8547: get_json_object fails to get value for numeric key
..


Patch Set 3:

> This patch LGTM.
 >
 > Hive supports more general keys because it just splits the json path
 > by '.'
 > https://github.com/apache/hive/blob/ba0217ff17501fb849d8999e808d37579db7b4f1/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java#L147
 > So this is also workable in Hive:
 >
 > select get_json_object('{"hello world": 5}', '$.hello world');
 >
 > It can't work in Impala because "hello world" is not a legal
 > variable name.
 > I think if we want compatibility with Hive we can create a JIRA
 > to refactor the json path parsing logic.

I filed IMPALA-9993 as a follow up. I think this requires some more thought. 
That SQL statement is valid in Postgres, but not MySQL. It seems all databases 
have a slightly different way of handling JSON. The Hive / Impala syntax seems 
to be some combination of Postgres / MySQL behavior, which is a bit odd.
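
To make the quoted point concrete, here is a simplified sketch of a lookup that only splits the path on '.', which is why a key containing a space still resolves. It is an illustration under that assumption, not Hive's or Impala's implementation; the function name is hypothetical and arrays and wildcards are ignored.

```python
import json


def get_by_dot_path(doc, path):
    """Walk a JSON object by splitting the path on '.' (hypothetical helper)."""
    parts = path.split(".")
    if parts[0] != "$":
        return None  # only paths rooted at '$' are handled in this sketch
    node = json.loads(doc)
    for key in parts[1:]:
        # Every path component is treated as a plain object key, so spaces
        # and purely numeric keys need no special handling.
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


print(get_by_dot_path('{"hello world": 5}', "$.hello world"))
print(get_by_dot_path('{"1": {"a": 2}}', "$.1.a"))
```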


--
To view, visit http://gerrit.cloudera.org:8080/14905

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7df037ccf2c79da0ba86a46df1dd28ab0e9a45f4
Gerrit-Change-Number: 14905
Gerrit-PatchSet: 3
Gerrit-Owner: Eugene Zimichev 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:47:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Riza Suminto (Code Review)
Hello David Rorke, Tim Armstrong, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15963

to look at the new patch set (#20).

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..

IMPALA-6692: Trigger sort node run before hitting memory limit.

The sorter node works by adding row batches to a sort run. After all
batches are added to the current unsorted run, or the memory limit is
hit, the sorter immediately starts sorting the run. In the latter
case, the sorter spills the sorted run to disk after the sort
completes, creates a new unsorted run object, and continues adding the
next row batches, and so on.

This algorithm tries to fit as many rows as possible into memory
before it starts sorting. However, in the case of a partitioned sort
with a large number of row batches, fitting too many rows into memory
makes the sort slow and blocks the sorter node for a long time before
it can release some memory and continue accepting the next row batch
from the exchange node. One slow sorter node can block the exchange
node from sending row batches to other sorter nodes that are free.

This patch speeds up the decision to start the sort, without waiting
for it to hit the memory limit first, by capping the intermediate
quicksort run at a lower memory limit determined by the query option
'sort_run_bytes_limit'. If the total used reservation of the quicksort
exceeds sort_run_bytes_limit, the current unsorted_run_ is wrapped up,
sorted, and then spilled, so that the next sort run overlaps with the
spill of the previous sort run.

To reduce regressions in cases where the total input size of the sort
node might fit fully into available memory, sort_run_bytes_limit is
not enforced for the first sort run. However, the first run stays
limited by sort_run_bytes_limit if planner estimates hint that a spill
will inevitably happen.

We also add a new summary counter 'AddBatchTime' that reports how
much time is spent in Sorter::AddBatch. The max of 'AddBatchTime'
indicates the longest time spent in Sorter::AddBatch, presumably busy
doing an intermediate sort.
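
As a rough illustration of the run-capping rule described above (not the Sorter code itself), the sketch below groups incoming batch sizes into sort runs. The function name, the `spill_expected` flag, and the "reaches the limit" comparison are assumptions made for this example.

```python
def split_into_runs(batch_bytes, mem_limit, sort_run_bytes_limit=None,
                    spill_expected=False):
    """Group batch sizes into sort runs (hypothetical simplification)."""
    runs, current, used = [], [], 0
    for size in batch_bytes:
        current.append(size)
        used += size
        first_run = not runs
        limit = mem_limit
        # The cap is skipped for the first run unless a spill is expected.
        if sort_run_bytes_limit is not None and (spill_expected or not first_run):
            limit = min(mem_limit, sort_run_bytes_limit)
        if used >= limit:
            runs.append(current)  # wrap up, sort and spill this run
            current, used = [], 0
    if current:
        runs.append(current)
    return runs


batches = [100] * 10
print([len(r) for r in split_into_runs(batches, mem_limit=400)])
print([len(r) for r in split_into_runs(batches, 400, sort_run_bytes_limit=200)])
print([len(r) for r in split_into_runs(batches, 400, 200, spill_expected=True)])
```

Smaller runs mean each intermediate sort finishes sooner, which is the effect the 'AddBatchTime' counter is meant to expose.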

Testing:
- Add new e2e test TestQueryFullSort::test_multiple_sort_run_bytes_limits
- Run core tests
- Run data loading of the 3 largest TPC-DS fact tables at 300GB scale
  on a real cluster using 5 backends and a 4GB mem_limit.
  sort_run_bytes_limit is varied between unspecified (not limited) and
  512 MB. The performance results are summarized in the following table.

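+---------------+---------+--------------+-----------------------+-----------------------+
|               |         |     Avg      |       no limit        |     512 MB limit      |
| Insert table  |  #Rows  | SortDataSize +--------+--------------+--------+--------------+
|               |         |   per Node   | Query  |     Max      | Query  |     Max      |
|               |         |              |  Time  | AddBatchTime |  Time  | AddBatchTime |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| store_sales   | 864.00M |     15.29 GB | 30m18s |     53s311ms |    20m |      5s634ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| catalog_sales | 431.97M |     11.34 GB | 23m24s |     31s212ms | 15m27s |      3s603ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+
| web_sales     | 216.01M |      5.67 GB |  8m16s |     29s250ms |  6m41s |      3s856ms |
+---------------+---------+--------------+--------+--------------+--------+--------------+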

Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
---
M be/src/exec/sort-node.cc
M be/src/exec/sort-node.h
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/sorter.cc
M be/src/runtime/sorter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M tests/query_test/test_sort.py
15 files changed, 224 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/15963/20
--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 20
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16230 )

Change subject: IMPALA-9987: Improve logging around HTTP connections
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6169/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16230

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f
Gerrit-Change-Number: 16230
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:26:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections

2020-07-22 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16230 )

Change subject: IMPALA-9987: Improve logging around HTTP connections
..


Patch Set 1:

verify failed due to IMPALA-9923


--
To view, visit http://gerrit.cloudera.org:8080/16230

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f
Gerrit-Change-Number: 16230
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:25:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc@37
PS3, Line 37: priority_queue_.size() < heap_capacity()
> just thinking out loud, do you think generally the else part will be more c
The branch should be predictable at least - you're right that we'd want to 
optimise for the case when there are many rows.

Probably not worth investing too much into tuning until we do codegen of the 
comparator, cause that will completely change the performance profile anyway.
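
For readers following the branch under discussion, a minimal sketch of the bounded-heap pattern: the first branch fills the heap up to the limit, and the else branch handles every later row, which dominates when the input is much larger than the limit. This uses Python's heapq on integers as a stand-in and is not the TopNNode code.

```python
import heapq


def top_n_smallest(rows, limit):
    """Keep the `limit` smallest values using a bounded max-heap."""
    if limit <= 0:
        return []
    heap = []  # max-heap emulated by storing negated values
    for row in rows:
        if len(heap) < limit:
            # Heap not yet full: always insert.
            heapq.heappush(heap, -row)
        elif row < -heap[0]:
            # Common path for large inputs: replace the current worst row
            # only when the new row beats it.
            heapq.heapreplace(heap, -row)
    return sorted(-x for x in heap)


print(top_n_smallest([5, 1, 9, 3, 7, 2], 3))
```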



--
To view, visit http://gerrit.cloudera.org:8080/16223

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16223

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6168/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16223

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 22:18:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 1: factor out Top-N heap.

2020-07-22 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16223 )

Change subject: IMPALA-9979: part 1: factor out Top-N heap.
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc
File be/src/exec/topn-node-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16223/3/be/src/exec/topn-node-ir.cc@37
PS3, Line 37: priority_queue_.size() < heap_capacity()
just thinking out loud, do you think generally the else part will be more 
common? Like I would assume the limit to be a smallish value and the top-N node 
going through 1000s of rows. If yes, then do you think adding an IR_LIKELY for 
the else case will help performance even if in a small way?



--
To view, visit http://gerrit.cloudera.org:8080/16223

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1f585216b547af7a470e02f75458b1901dc44a31
Gerrit-Change-Number: 16223
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Jul 2020 21:51:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16230 )

Change subject: IMPALA-9987: Improve logging around HTTP connections
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6164/


--
To view, visit http://gerrit.cloudera.org:8080/16230

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f
Gerrit-Change-Number: 16230
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 20:59:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6167/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 19:41:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6691/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 19:40:46 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/


--
To view, visit http://gerrit.cloudera.org:8080/15963

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 19:34:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Grant Henke (Code Review)
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16120

to look at the new patch set (#4).

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..

IMPALA-9903: Reduce Kudu openTable calls per query

This patch reduces the number of Kudu openTable calls for the
lifetime of a query by storing the KuduTable object in the
Analyzer GlobalState and using it in the KuduScanNode.

It does not cache the KuduTable object longer than a single
query, does not impact DDL statements, and does not
introduce the need to invalidate metadata when interacting with
Kudu tables.

Reducing the number of openTable calls is important because each
call results in a GetTableSchema RPC to the remote leader Kudu
master. With very high rates of queries against Kudu tables this
can overload the master leading to degraded query performance.
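
A minimal sketch of the per-query caching idea, assuming a stand-in client that counts open calls. `FakeKuduClient` and `QueryGlobalState` are hypothetical names for this example, not the Analyzer or Kudu client API.

```python
class FakeKuduClient:
    """Stand-in client that counts how often a table is opened."""

    def __init__(self):
        self.open_calls = 0

    def open_table(self, name):
        self.open_calls += 1  # each call would be one RPC to the master
        return {"name": name}


class QueryGlobalState:
    """Per-query state: opened tables live only as long as the query."""

    def __init__(self, client):
        self._client = client
        self._tables = {}

    def get_kudu_table(self, name):
        if name not in self._tables:
            self._tables[name] = self._client.open_table(name)
        return self._tables[name]


client = FakeKuduClient()
query1 = QueryGlobalState(client)
for _ in range(3):          # e.g. three scan nodes over the same table
    query1.get_kudu_table("t")
print(client.open_calls)

query2 = QueryGlobalState(client)  # a new query starts with an empty cache
query2.get_kudu_table("t")
print(client.open_calls)
```

Since nothing outlives the query, no invalidation step is needed in this model.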

Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
3 files changed, 34 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/4
--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16215 )

Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6690/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Gerrit-Change-Number: 16215
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 22 Jul 2020 19:15:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 3:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6689/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:56:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators

2020-07-22 Thread Wenzhe Zhou (Code Review)
Wenzhe Zhou has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16215 )

Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators
..

IMPALA-5746: Cancel all queries scheduled by failed coordinators

Executors register for updates of cluster membership. When a
coordinator is absent from the active cluster membership list, the
executor cancels all running fragments of the queries scheduled by
that inactive coordinator, since the executor cannot send results back
to an inactive/failed coordinator. This lets executors quickly release
the resources allocated to the fragments being canceled.
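
The selection step can be pictured as a simple membership check. The sketch below is illustrative only; the function name and the shape of the inputs (query id mapped to coordinator address) are assumptions.

```python
def queries_to_cancel(running_queries, active_coordinators):
    """Return ids of queries whose coordinator left the membership list.

    running_queries: dict of query id -> coordinator address (hypothetical).
    active_coordinators: set of addresses in the latest membership update.
    """
    return sorted(
        query_id
        for query_id, coordinator in running_queries.items()
        if coordinator not in active_coordinators
    )


running = {"q1": "coord-a:22000", "q2": "coord-b:22000", "q3": "coord-a:22000"}
print(queries_to_cancel(running, {"coord-b:22000"}))
```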

Testing:
- Added new test case TestProcessFailures::test_kill_coordinator
  and ran the test case with the following command:
./bin/impala-py.test tests/custom_cluster/test_process_failures.py::TestProcessFailures::test_kill_coordinator \
  --exploration_strategy=exhaustive
- Passed the core tests.

Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
---
M be/src/runtime/coordinator-backend-state.cc
M be/src/runtime/exec-env.cc
M be/src/runtime/query-exec-mgr.cc
M be/src/runtime/query-exec-mgr.h
M be/src/runtime/query-state.cc
M be/src/runtime/query-state.h
M be/src/runtime/test-env.cc
M common/protobuf/control_service.proto
M tests/custom_cluster/test_process_failures.py
9 files changed, 183 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/15/16215/4
--
To view, visit http://gerrit.cloudera.org:8080/16215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Gerrit-Change-Number: 16215
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-5746: Cancel all queries scheduled by failed coordinators

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16215 )

Change subject: IMPALA-5746: Cancel all queries scheduled by failed coordinators
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc
File be/src/runtime/exec-env.cc:

http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/exec-env.cc@554
PS3, Line 554:   server->CancelQueriesOnFailedBackends(current_backend_set);
> I was thinking to reuse the backend set and save a loop with one callback f
Yeah +1 to what Thomas said.


http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc
File be/src/runtime/query-exec-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/16215/3/be/src/runtime/query-exec-mgr.cc@222
PS3, Line 222:   // TODO: create cancellation task queue and working thread to run cancellation tasks
 :   // on a separate thread. If the queue is full, ignore the cancellations since we'll
 :   // be able to process them on the next heartbeat instead.
 :
 :   for (auto& qs : to_cancel) {
 :     VLOG(1) << "CancelQueriesForFailedCoordinators(): cancel query " << qs->query_id();
 :     qs->Cancel();
 :     qs->is_coord_active_.Store(false);
 :     ReleaseQueryState(qs);
 :   }
> Will define a new thread pool owned by QueryExecMgr.
Yeah separate thread pool seems fine. Yeah, I'm fine with keeping this out of 
ImpalaServer.
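
One way to picture the TODO discussed above is a bounded queue drained by a worker thread, where a full queue simply drops the request because the next heartbeat will offer it again. This is a sketch under that assumption; `CancellationQueue` and its methods are hypothetical and not part of QueryExecMgr.

```python
import queue
import threading


class CancellationQueue:
    """Bounded queue of cancellation tasks handled by one worker thread."""

    def __init__(self, max_pending):
        self._tasks = queue.Queue(maxsize=max_pending)
        self.cancelled = []
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def offer(self, query_id):
        """Enqueue without blocking; return False if the queue is full."""
        try:
            self._tasks.put_nowait(query_id)
            return True
        except queue.Full:
            return False  # dropped; a later heartbeat can retry

    def _run(self):
        while True:
            query_id = self._tasks.get()
            self.cancelled.append(query_id)  # stand-in for the real cancel
            self._tasks.task_done()

    def drain(self):
        self._tasks.join()  # wait until all queued tasks are processed


q = CancellationQueue(max_pending=2)
accepted = [q.offer(query_id) for query_id in ("q1", "q2", "q3")]
q.start()
q.drain()
print(accepted, q.cancelled)
```

The worker is started only after the offers here so that the overflow case is deterministic in the example.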



--
To view, visit http://gerrit.cloudera.org:8080/16215

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I918fcc27649d5d2bbe8b6ef47fbd9810ae5f57bd
Gerrit-Change-Number: 16215
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:38:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16120/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16120/2//COMMIT_MSG@9
PS2, Line 9: This patch reduces the number of Kudu openTable calls for the
   : lifetime of a query by storing the KuduTable object in the
   : Analyzer GlobalState and using it in the
> I think it would be good to be more specific here. Looks like currently we 
Done


http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java
File fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java:

http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java@166
PS2, Line 166:   result.setSchema(resultSchema);
> These are methods that implement the show partitions DDL, so we don't need
Done


http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java
File fe/src/main/java/org/apache/impala/catalog/KuduTable.java:

http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@185
PS2, Line 185:   @Override
 :   public List getPrimaryKeyColumnNames() {
 : return ImmutableList.copyOf(primaryKeyColumnNames_);
 :   }
 :
> This would mean that once kuduTable_ is initialized, it never gets refreshe
Done


http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@298
PS2, Line 298: partitionBy_ = Utils.loadPartitionByParams(kuduTable);
> This probably should be kept as is otherwise we won't see a updated Kudu sc
Done


http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java:

http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java@56
PS2, Line 56:   /**
> Caching it in LocalTable makes sense since it's per-query anyway. So this p
If we are going the analyzer route I don't think this is needed right?


http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16120/2/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@135
PS2, Line 135:   // Get the KuduTable from the analyzer to retrieve the cached KuduTable
> I think this invocation should go via 'analyzer' to retrieve the per-query
Done



--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:39:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-22 Thread Grant Henke (Code Review)
Hello Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16120

to look at the new patch set (#3).

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..

IMPALA-9903: Reduce Kudu openTable calls per query

This patch reduces the number of Kudu openTable calls for the
lifetime of a query by storing the KuduTable object in the
Analyzer GlobalState and using it in the KuduScanNode.

It does not cache the KuduTable object longer than a single
query, does not impact DDL statements, and does not
introduce the need to invalidate metadata when interacting with
Kudu tables.

Reducing the number of openTable calls is important because each
call results in a GetTableSchema RPC to the remote leader Kudu
master. With very high rates of queries against Kudu tables this
can overload the master leading to degraded query performance.

Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
3 files changed, 34 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/3
--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16218 )

Change subject: IMPALA-9799: Add retries to TestFetchFirst 
get_num_in_flight_queries calls
..


Patch Set 2:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6166/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16218

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Gerrit-Change-Number: 16218
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:20:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has removed a vote on this change.

Change subject: IMPALA-9799: Add retries to TestFetchFirst 
get_num_in_flight_queries calls
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/16218

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Gerrit-Change-Number: 16218
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9799: Add retries to TestFetchFirst get_num_in_flight_queries calls

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16218 )

Change subject: IMPALA-9799: Add retries to TestFetchFirst 
get_num_in_flight_queries calls
..


Patch Set 2:

Failed due to IMPALA-9991.


-- 
To view, visit http://gerrit.cloudera.org:8080/16218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I349f861e8219e62311e8d4e0bfbd8f3618f0fa46
Gerrit-Change-Number: 16218
Gerrit-PatchSet: 2
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:19:45 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16222 )

Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows 
are returned
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6165/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Gerrit-Change-Number: 16222
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:14:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16222 )

Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows 
are returned
..


Patch Set 3:

A bunch of HBase tests failed due to connection timeouts to the region servers.


--
To view, visit http://gerrit.cloudera.org:8080/16222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Gerrit-Change-Number: 16222
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 18:14:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9953: Shell should continue fetching even when 0 rows are returned

2020-07-22 Thread Sahil Takiar (Code Review)
Sahil Takiar has removed a vote on this change.

Change subject: IMPALA-9953: Shell should continue fetching even when 0 rows 
are returned
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/16222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I5f8527aea9e433f8cf426435c0ba41355bbf9d88
Gerrit-Change-Number: 16222
Gerrit-PatchSet: 3
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)

2020-07-22 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16228 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified 
tables (complex types)
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16228/3/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1508
PS3, Line 1508:*   SELECT item FROM complextypestbl $a$1, $a$1.int_array;
I need to understand the current complex types support (independent of ACID) a 
little more, but my initial thought is that, depending on the query, this could 
introduce a lot of cross-joins, which would make ACID reads slower than regular 
reads.



--
To view, visit http://gerrit.cloudera.org:8080/16228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 16:47:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9987: Improve logging around HTTP connections

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16230 )

Change subject: IMPALA-9987: Improve logging around HTTP connections
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6164/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16230
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38a32b8746084ea44b098a6ccce4ce01947ae88f
Gerrit-Change-Number: 16230
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Tauber-Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 16:44:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified tables (complex types)

2020-07-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16228 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 2 Reading modified 
tables (complex types)
..


Patch Set 2: Code-Review+1

(5 comments)

Nice work! I read through the code part but haven't checked the tests. 
Looks fine to me, but someone with more frontend knowledge should also take a 
look.

http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@385
PS2, Line 385: reqires
nit: typo


http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1516
PS2, Line 1516: for (int i = 0; i < stmt.fromClause_.size(); ++i) {
  : TableRef tblRef = stmt.fromClause_.get(i);
nit: you could iterate over fromClause_.getTableRefs() with a for-each loop, 
which would let you get rid of L1517.
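
The suggestion above can be sketched as follows. This is only a minimal, self-contained illustration, not the actual StmtRewriter code: the TableRef and FromClause classes below are simplified, hypothetical stand-ins, and only the getTableRefs() name is taken from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ForEachSketch {
  // Hypothetical stand-in for Impala's TableRef; it only holds an alias.
  static class TableRef {
    final String alias;
    TableRef(String alias) { this.alias = alias; }
  }

  // Hypothetical stand-in for FromClause that exposes getTableRefs().
  static class FromClause {
    private final List<TableRef> tableRefs;
    FromClause(List<TableRef> tableRefs) { this.tableRefs = tableRefs; }
    int size() { return tableRefs.size(); }
    TableRef get(int i) { return tableRefs.get(i); }
    List<TableRef> getTableRefs() { return tableRefs; }
  }

  // Index-based loop, shaped like the lines quoted from the patch set.
  static List<String> aliasesIndexed(FromClause from) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < from.size(); ++i) {
      TableRef tblRef = from.get(i);
      out.add(tblRef.alias);
    }
    return out;
  }

  // For-each loop: no index variable and no separate get(i) line.
  static List<String> aliasesForEach(FromClause from) {
    List<String> out = new ArrayList<>();
    for (TableRef tblRef : from.getTableRefs()) {
      out.add(tblRef.alias);
    }
    return out;
  }

  public static void main(String[] args) {
    List<TableRef> refs = Arrays.asList(new TableRef("t"), new TableRef("a"));
    FromClause from = new FromClause(refs);
    // Both loops visit the same elements in the same order.
    System.out.println(aliasesIndexed(from).equals(aliasesForEach(from)));
  }
}
```

The for-each form only fits when the loop body does not need the element's position; if the index is used elsewhere in the body, the indexed loop remains the appropriate choice.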


http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1541
PS2, Line 1541: int tableRefIdx
Instead of the index you can use the CollectionTableRef itself as a param.

Update: I see you use 'tableRefIdx' for other purposes below so I guess my 
comment here doesn't make sense :)


http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1556
PS2, Line 1556: newCollPath.remove(0);
Could you add a comment explaining what is at position '0' here? (I guess in 
L1553 it's the DB name, but we removed it.)


http://gerrit.cloudera.org:8080/#/c/16228/2/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@1576
PS2, Line 1576: private TableRef newTableRef(Analyzer analyzer, 
List rawPath, String alias)
Shouldn't this function belong to TableRef as a static member function?



--
To view, visit http://gerrit.cloudera.org:8080/16228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b2c6cd3d87c452c5b96a913b14c90ada78d4c6f
Gerrit-Change-Number: 16228
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 15:14:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6692: Trigger sort node run before hitting memory limit.

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15963 )

Change subject: IMPALA-6692: Trigger sort node run before hitting memory limit.
..


Patch Set 19:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6163/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/15963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2a0ba7c4bae4f1d300d4d9d7f594f63ced06a240
Gerrit-Change-Number: 15963
Gerrit-PatchSet: 19
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 14:29:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16196 )

Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6688/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
Gerrit-Change-Number: 16196
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 22 Jul 2020 13:44:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9882: Import KLL functionality from Apache DataSketches

2020-07-22 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16196

to look at the new patch set (#5).

Change subject: IMPALA-9882: Import KLL functionality from Apache DataSketches
..

IMPALA-9882: Import KLL functionality from Apache DataSketches

First, I updated our existing snapshot of DataSketches to the
following commit:
dddc149209902f72b71109f1a098e58d6d4761ee
"Merge pull request #159 from apache/workflow_update"
This affects files originating from the hll/ and common/ directories of
the DataSketches repo.

Then I copied all the files needed for KLL into our snapshot
directory.

You can find the original Apache DataSketches files here:
https://github.com/apache/incubator-datasketches-cpp

This new snapshot, however, broke the interface we used for
serializing hll_union objects by dropping serialize_compact(). As a
solution, I had to change the serialization and merging phases of the
union operator so that they serialize the underlying hll_sketch
instead of hll_union itself.

Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/AuxHashMap-internal.hpp
D be/src/thirdparty/datasketches/CommonUtil.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable-internal.hpp
M be/src/thirdparty/datasketches/CompositeInterpolationXTable.hpp
M be/src/thirdparty/datasketches/CouponHashSet-internal.hpp
M be/src/thirdparty/datasketches/CouponList-internal.hpp
M be/src/thirdparty/datasketches/Hll4Array-internal.hpp
M be/src/thirdparty/datasketches/HllArray-internal.hpp
M be/src/thirdparty/datasketches/HllSketch-internal.hpp
M be/src/thirdparty/datasketches/HllSketchImplFactory.hpp
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/HllUtil.hpp
M be/src/thirdparty/datasketches/MurmurHash3.h
M be/src/thirdparty/datasketches/README.md
A be/src/thirdparty/datasketches/bounds_binomial_proportions.hpp
A be/src/thirdparty/datasketches/common_defs.hpp
A be/src/thirdparty/datasketches/count_zeros.hpp
M be/src/thirdparty/datasketches/hll.hpp
A be/src/thirdparty/datasketches/kll_helper.hpp
A be/src/thirdparty/datasketches/kll_helper_impl.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator.hpp
A be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
A be/src/thirdparty/datasketches/kll_sketch.hpp
A be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/memory_operations.hpp
A be/src/thirdparty/datasketches/serde.hpp
29 files changed, 3,280 insertions(+), 347 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/16196/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16196
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I848488d5145c808109bd50aecfbf3ef83f981943
Gerrit-Change-Number: 16196
Gerrit-PatchSet: 5
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6687/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 07:42:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@413
PS2, Line 413: if (!(analyticWindow_.getLeftBoundary().getType() == 
AnalyticWindow.BoundaryType.UNBOUNDED_PRECEDING
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@414
PS2, Line 414: && analyticWindow_.getRightBoundary().getType() == 
AnalyticWindow.BoundaryType.CURRENT_ROW)) {
line too long (106 > 90)
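
One way to address the two line-length warnings above is to pull the boundary types into local variables so that each line stays within 90 columns. The sketch below is only an illustration under assumptions: BoundaryType, Boundary and AnalyticWindow are simplified, hypothetical stand-ins for the Impala classes, and the helper name isUnboundedPrecedingToCurrentRow is invented for this example.

```java
public class LineWrapSketch {
  // Hypothetical stand-in for AnalyticWindow.BoundaryType.
  enum BoundaryType { UNBOUNDED_PRECEDING, CURRENT_ROW, UNBOUNDED_FOLLOWING }

  // Hypothetical stand-in for a window boundary.
  static class Boundary {
    private final BoundaryType type;
    Boundary(BoundaryType type) { this.type = type; }
    BoundaryType getType() { return type; }
  }

  // Hypothetical stand-in for AnalyticWindow with the two accessors
  // that appear in the flagged condition.
  static class AnalyticWindow {
    private final Boundary left;
    private final Boundary right;
    AnalyticWindow(Boundary left, Boundary right) {
      this.left = left;
      this.right = right;
    }
    Boundary getLeftBoundary() { return left; }
    Boundary getRightBoundary() { return right; }
  }

  // Inner predicate of the flagged condition, split over short lines.
  // The caller would negate the result, as the original 'if' does.
  static boolean isUnboundedPrecedingToCurrentRow(AnalyticWindow w) {
    BoundaryType leftType = w.getLeftBoundary().getType();
    BoundaryType rightType = w.getRightBoundary().getType();
    return leftType == BoundaryType.UNBOUNDED_PRECEDING
        && rightType == BoundaryType.CURRENT_ROW;
  }

  public static void main(String[] args) {
    AnalyticWindow w = new AnalyticWindow(
        new Boundary(BoundaryType.UNBOUNDED_PRECEDING),
        new Boundary(BoundaryType.CURRENT_ROW));
    System.out.println(isUnboundedPrecedingToCurrentRow(w));
  }
}
```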


http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@414
PS2, Line 414:   private PlanNode findDescendantAnalyticNode(PlanNode root, 
List intermediateNodes) {
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 22 Jul 2020 07:14:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

2020-07-22 Thread Aman Sinha (Code Review)
Hello David Rorke, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16219

to look at the new patch set (#2).

Change subject: IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator
..

IMPALA-9983 : [WIP] Pushdown limit to analytic sort operator

This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. Several qualifying checks are performed first. The
optimization is done at the time of creating the top level
Sort in the single node planner.

Doing this pushdown can substantially improve performance
by applying the limit early.

Fixed a couple of additional related issues uncovered by the
limit pushdown:
 - Changed the sort semantics of the analytic sort's
   partition-by exprs from NULLS FIRST to NULLS LAST to
   ensure correctness in the presence of a limit.
 - The LIMIT on the analytic sort node was causing it to
   be treated as a merging point in the distributed planner.
   Fixed it by introducing an API, allowPartitioned(), in
   PlanNode.

Testing:
 - Ran PlannerTest and updated several EXPLAIN plans
 - Ran end-to-end TPC-DS queries
 - Specifically tested TPC-DS q67 for limit pushdown and
   result correctness
 - Manually tested several negative cases where the
   pushdown should not be applied
 - TODO: Run more end-to-end tests
 - TODO: Add unit tests

Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
21 files changed, 445 insertions(+), 265 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/2
--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Tim Armstrong