[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/


--
To view, visit http://gerrit.cloudera.org:8080/16240
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 05:14:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator

2020-07-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@421
PS11, Line 421:   pbExprs.size() > sortExprs.size()) return false;
nit: we'd usually enclose the body with braces for multi-line statements.


http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@1064
PS11, Line 1064:   analyticNode.removeChild(sortNode);
nit: setChild(0, upperTopN) is slightly more concise than the remove/add pair


http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/SortNode.java@143
PS11, Line 143: partitioningExprs_ = partitioningExprs;
I think we also need to call computeStats() again so that the limit can be 
factored into the estimates. I think the other state set in init() doesn't need 
to change.


http://gerrit.cloudera.org:8080/#/c/16219/11/testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test
File 
testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test:

PS11:
We should move this so that it is invoked from a Python test class for the TPC-H 
workload, and also remove the tpch. prefixes from the tables - that way the test 
parameterisation on file formats will work correctly.

Currently it will be run redundantly on multiple functional file formats.


http://gerrit.cloudera.org:8080/#/c/16219/11/testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test@54
PS11, Line 54:  RESULTS
We spoke directly, but I'm leaving a comment to note that the result sets are 
empty.



--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 11
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 04:23:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16264 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
..


Patch Set 1:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6746/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16264

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495
Gerrit-Change-Number: 16264
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Jul 2020 02:30:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6745/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 11
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 02:22:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator

2020-07-30 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..


Patch Set 11:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@374
PS10, Line 374:* @param sortInfo The sort info from the outer sort node
> nit: remove empty @param annotations?
Added the descriptions.


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@422
PS10, Line 422:
> I think we should be able to use expression equality, since the partition a
Good point about using the substitutedPartitionExprs_.  When I used it, it was 
still not able to do the equivalence comparison.  After working through various 
expr mappings, I realized that it is a little late to use the sortInfo's 
sortExprs since they have already been substituted.  I added a new field in 
SortInfo to keep the original sort exprs and after substituting those, was able 
to do the comparison.


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@431
PS10, Line 431: !(pbExpr instanceof SlotRef && so
> I wonder if this check is necessary. If the sort order is descending, then
This check is needed because the partition-by exprs are always ASC order.


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@481
PS10, Line 481:  lhs).getDe
> Not sure the rational behind it.
I added a comment explaining this.


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@487
PS10, Line 487:get(0)).getFnC
> Not follow.
Same as above.  It has to do with containment within the limit values.


http://gerrit.cloudera.org:8080/#/c/16219/6/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/6/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@266
PS6, Line 266: return createSortInfo(input, sortExprs, isAsc, nullsFirst, 
TSortingOrder.LEXICAL);
> I will revert this particular change in the next patch since this is not ne
Done


http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@391
PS8, Line 391: // so limit pushdown is not applicable
> I think we can generate an analytic without a sort in some edge cases, e.g.
Yes, this was a bug.  Fixed it.


http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@427
PS8, Line 427: }
> We should maybe also avoid going into Subplans? I guess it doesn't really m
Added a check for Subplan and actually restricted the tree-walk to only single 
input operators.


http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@434
PS8, Line 434:   root.getChildren().size() > 1) {
> This looks like it just goes down the left branch of the plan tree - is tha
Yeah, I was thinking of the narrow use case where it goes left-deep on single 
input operators, but I agree this is confusing. I rewrote and simplified it to 
only allow a single child.


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@343
PS10, Line 343: offset == 0
> Thought that offset is non-negative in SQL.
Done


http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SortNode.java@143
PS10, Line 143: partitioningExprs_ = partitioningExprs;
> If this is a TopN sort, then the method should succeed if both the limit an
Yeah, in theory one can call convertToTopN on a node that is already TopN, but 
for this patch I would prefer to restrict it to a narrower use case.



--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284

[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

2020-07-30 Thread Yida Wu (Code Review)
Yida Wu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16264 )


Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
..

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features:
1) Local files as buffers for spilling to S3.
2) Asynchronous upload and synchronous fetching of remote files.
3) Synchronous deletion of remote files after the query ends.
4) Management of local buffer files.
5) Compatibility between spilling to local and to remote.
6) Any error from HDFS/S3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local
   files: LocalFileMode::BUFFER and LocalFileMode::FILE.
   LocalFileMode::BUFFER indicates that the local file is used as
   a buffer for remote operations. LocalFileMode::FILE indicates
   that the local file is used for spilling to local disk.
   Also, the startup option "remote_tmp_file_local_buff_mode" is
   added to control how pages are read from the remote file system.
   If set to true, the whole file is fetched to the local
   buffer on read. If set to false, only one page is read per
   read request.
2) Two disk queues have been added for the file operation jobs.
   Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper
   File operations on the remote disk, such as upload and fetch,
   should be done in these queues. The queues separate
   long-running operations from short ones and give more precise
   control over the number of threads working on these file
   operation jobs, since we may not want too many upload and
   fetch jobs running at the same time.
   RemoteOperRange is the new type that carries the file operation
   jobs. Previously, the request types were READ and WRITE.
   Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is
   destroyed.
4) Local buffer file management controls the total size
   of local buffer files and evicts files if needed. A remote tmp
   file has one of five statuses:
   IN_WRITING/DUMPED/IN_DUMPING/REMOTE/TO_DELETE. A local buffer
   file can be evicted only if it is in status REMOTE. An EVICT job
   is sent to the local disk queue when a file is chosen for eviction.
   Two modes determine the order in which files are chosen for
   eviction. The default is LIFO; the other is FIFO. The mode is set
   by the startup option "remote_tmp_files_avail_pool_lifo".
5) Spilling to local has higher priority than spilling to remote.
   If no local scratch space is available, temporary data will be
   spilled to remote.
   Remote scratch space uses the highest priority local scratch dir
   as its buffer. If no local scratch space or only one has been
   configured, a default local buffer should be used.
   The purpose of the design is to simplify the implementation in
   milestone 1 with fewer changes to the configuration.
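
For illustration only, the eviction rule in item 4 could be sketched as below.
This is not code from the patch: the class LocalBufferPool and its methods are
hypothetical names, the sketch is in Python rather than the backend's C++, and
only the status names and the LIFO/FIFO choice come from the description above.

```python
from collections import deque
from enum import Enum


class FileStatus(Enum):
    # Status names are taken from the commit message.
    IN_WRITING = 1
    DUMPED = 2
    IN_DUMPING = 3
    REMOTE = 4
    TO_DELETE = 5


class LocalBufferPool:
    """Hypothetical pool that bounds the total size of local buffer files."""

    def __init__(self, capacity_bytes, lifo=True):
        self.capacity_bytes = capacity_bytes
        self.lifo = lifo            # assumed to mirror remote_tmp_files_avail_pool_lifo
        self.used_bytes = 0
        self.status = {}            # file name -> FileStatus
        self.local_size = {}        # file name -> bytes held in the local buffer
        self.evictable = deque()    # files in status REMOTE, in upload order

    def add_file(self, name, size_bytes):
        """Register a new buffer file, evicting uploaded files while over budget.

        Returns the names of the evicted files. If nothing is evictable the
        sketch simply over-commits; a real implementation would wait or fail.
        """
        evicted = []
        while (self.used_bytes + size_bytes > self.capacity_bytes
               and self.evictable):
            # LIFO evicts the most recently uploaded file, FIFO the oldest.
            victim = self.evictable.pop() if self.lifo else self.evictable.popleft()
            self.used_bytes -= self.local_size.pop(victim)
            evicted.append(victim)
        self.status[name] = FileStatus.IN_WRITING
        self.local_size[name] = size_bytes
        self.used_bytes += size_bytes
        return evicted

    def mark_uploaded(self, name):
        """A file becomes evictable only once it reaches status REMOTE."""
        self.status[name] = FileStatus.REMOTE
        self.evictable.append(name)
```

With a 100-byte budget and two uploaded 40-byte files "a" and "b", adding a
third 40-byte file evicts "b" in LIFO mode and "a" in FIFO mode.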

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of
  remote scratch space if remote scratch dir exists.

TODO:
  - Testcases
  - Refine the naming of the remote scratch dir and files.
  - Upper and lower bounds of new options related to size.
  - More accurate error codes and error handling.
  - Preserve memory buffer for block buffers on file upload and fetch.
  - Job cancellation for the new disk queues.
  - Some metrics might need to be added.
  - Efficiency issue when mixing local and remote scratch space.

Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495
---
M be/src/runtime/hdfs-fs-cache.cc
M be/src/runtime/io/CMakeLists.txt
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
A be/src/runtime/io/file-writer.h
M be/src/runtime/io/hdfs-file-reader.cc
A be/src/runtime/io/hdfs-file-writer.cc
A be/src/runtime/io/hdfs-file-writer.h
M be/src/runtime/io/local-file-system.cc
M be/src/runtime/io/local-file-system.h
A be/src/runtime/io/local-file-writer.cc
A be/src/runtime/io/local-file-writer.h
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/runtime/io/request-ranges.h
M be/src/runtime/io/scan-range.cc
M be/src/runtime/tmp-file-mgr-internal.h
M be/src/runtime/tmp-file-mgr.cc
M be/src/runtime/tmp-file-mgr.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/metrics.json
22 files changed, 2,065 insertions(+), 211 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/16264/1
--
To view, visit http://gerrit.cloudera.org:8080/16264

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495
Gerrit-Change-Number: 16264
Gerrit-PatchSet: 1
Gerrit-Owner: Yida Wu 


[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@401
PS11, Line 401: List sortExprs = Expr.substituteList(origSortExprs, 
getOutputSmap(), analyzer, false);
line too long (96 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 11
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 02:01:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator

2020-07-30 Thread Aman Sinha (Code Review)
Hello Qifan Chen, Shant Hovsepian, David Rorke, Tim Armstrong, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16219

to look at the new patch set (#11).

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..

IMPALA-9983 : Pushdown limit to analytic sort operator

This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. There are several qualifying checks that are done. The
optimization is done at the time of creating the top level
Sort in the single node planner. When the pushdown is
applicable, the analytic sort is converted to a TopN sort.
Further, this is split into 2 TopN sorts separated by a
hash partition exchange. This ensures that the limit is
applied as early as possible before hash partitioning.
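
As a rough illustration of why splitting into two TopN sorts preserves the
result, the sketch below applies a limit in each fragment before the exchange
and again afterwards. It is not the planner code: the function names are
hypothetical, the exchange is modelled as plain concatenation to a single
consumer, and the qualifying checks mentioned above are ignored.

```python
import heapq


def top_n(rows, limit, key):
    """Return the `limit` smallest rows by `key`, in sorted order."""
    return heapq.nsmallest(limit, rows, key=key)


def two_phase_top_n(fragments, limit, key):
    """Apply a lower TopN per fragment, then an upper TopN after the exchange."""
    # Lower TopN: each fragment sends at most `limit` rows across the exchange.
    partial = [top_n(rows, limit, key) for rows in fragments]
    # Exchange, modelled here as concatenating the per-fragment outputs.
    exchanged = [row for rows in partial for row in rows]
    # Upper TopN: apply the same limit to the merged rows.
    return top_n(exchanged, limit, key)
```

Any row in the global top N is also in the top N of its own fragment, so the
lower TopN never drops a row that the upper TopN would need.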

Fixed a couple of additional related issues uncovered as a
result of limit pushdown:
 - Changed the analytic sort's partition-by expr sort
   semantic from NULLS FIRST to NULLS LAST to ensure
   correctness in the presence of limit.
 - The LIMIT on the analytic sort node was causing it to
   be treated as a merging point in the distributed planner.
   Fixed it by introducing an api allowPartitioned() in the
   PlanNode.

Testing:
 - Ran PlannerTest and updated several EXPLAIN plans.
 - Added Planner tests for both positive and negative cases of
   limit pushdown.
 - Ran end-to-end TPC-DS queries. Specifically tested
   TPC-DS q67 for limit pushdown and result correctness.
 - Added targeted end-to-end tests (TODO: capture results)

Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
---
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/analysis/SortInfo.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test
M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A 
testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test
M tests/query_test/test_queries.py
27 files changed, 1,297 insertions(+), 278 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/11
--
To view, visit http://gerrit.cloudera.org:8080/16219

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 11
Gerrit-Owner: Aman Sinha 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6200/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 31 Jul 2020 01:19:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16120 )

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6744/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Fri, 31 Jul 2020 01:17:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16263 )

Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and 
libstdc++ binaries
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6743/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16263

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
Gerrit-Change-Number: 16263
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Jul 2020 01:08:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query

2020-07-30 Thread Grant Henke (Code Review)
Hello Qifan Chen, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16120

to look at the new patch set (#7).

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..

IMPALA-9903: Reduce Kudu openTable calls per query

This patch reduces the number of Kudu openTable calls for the
lifetime of a query by storing the KuduTable object in the
Analyzer GlobalState and using it in the KuduScanNode.

It does not cache the KuduTable object longer than a single
query, does not impact DDL statements, and does not
introduce the need to invalidate metadata when interacting with
Kudu tables.

Additionally, this patch adjusts the backend scanner to use the
KuduTable instance from the KuduScanner instead of using
openTable to get a new instance.

Reducing the number of openTable calls is important because each
call results in a GetTableSchema RPC to the remote leader Kudu
master. With very high rates of queries against Kudu tables this
can overload the master leading to degraded query performance.

In manual testing this patch reduced the Kudu GetTableSchema
RPC calls to the master from 5 per query to 1 per query.
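
The per-query caching idea can be sketched as follows. This is only an
illustration under stated assumptions: FakeKuduClient and QueryState are
hypothetical stand-ins, not the Kudu client API or Impala's Analyzer
GlobalState, and the sketch is in Python rather than Java.

```python
class FakeKuduClient:
    """Stand-in client that counts how often a table is opened."""

    def __init__(self):
        self.open_table_calls = 0

    def open_table(self, name):
        # In the real system each call costs a GetTableSchema RPC to the master.
        self.open_table_calls += 1
        return {"name": name}


class QueryState:
    """Hypothetical per-query state; it is discarded when the query ends."""

    def __init__(self, client):
        self._client = client
        self._tables = {}  # table name -> opened table object

    def get_table(self, name):
        # Open the table once per query and reuse the object afterwards.
        if name not in self._tables:
            self._tables[name] = self._client.open_table(name)
        return self._tables[name]
```

Because the cache lives only as long as one QueryState, a new query opens the
table again, so nothing stale carries over between queries.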

Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
---
M be/src/exec/kudu-scan-node-base.cc
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-scanner.cc
M bin/impala-config.sh
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
8 files changed, 86 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/7
--
To view, visit http://gerrit.cloudera.org:8080/16120

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63
Gerrit-Change-Number: 16120
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16263 )

Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and 
libstdc++ binaries
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6199/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16263

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
Gerrit-Change-Number: 16263
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 31 Jul 2020 00:41:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries

2020-07-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16263 )


Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and 
libstdc++ binaries
..

IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries

Strip debug symbols from libkudu_client.so and libstdc++.so. The same
technique used to strip debug symbols from impalad binaries is used.

This decreases the Docker image sizes by about 100 MB.

Test:
* Ran Dockerized tests

Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
---
M docker/setup_build_context.py
1 file changed, 16 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16263/1
--
To view, visit http://gerrit.cloudera.org:8080/16263

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294
Gerrit-Change-Number: 16263
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar 


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16123

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 31 Jul 2020 00:32:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16240 )

Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large 
read/write pages in GroupingAggregator
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16240

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775
Gerrit-Change-Number: 16240
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 23:39:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16242 )

Change subject: IMPALA-9979: part 2: partitioned top-n
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6742/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16242

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Gerrit-Change-Number: 16242
Gerrit-PatchSet: 12
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Comment-Date: Thu, 30 Jul 2020 23:24:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n

2020-07-30 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Shant Hovsepian, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16242

to look at the new patch set (#12).

Change subject: IMPALA-9979: part 2: partitioned top-n
..

IMPALA-9979: part 2: partitioned top-n

The planner now identifies predicates that can be converted into
limits in a partitioned or unpartitioned top-n with the following
method:
* Push down predicates that reference analytic tuple into inline view.
  These will be evaluated after the analytic plan for the inline
  SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could
  be converted to limits.
* If they can be applied to the last sort group of the analytic
  plan, and the windows are all compatible, then the lowest
  limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add
  logic to merge SELECT nodes to avoid generating duplicates
  from inside and outside the inline view.
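
As a rough illustration of the limit-selection step described above (this is
not the planner's Java code), the following Python sketch takes upper-bound
predicates on a rank()/row_number() column and picks the lowest implied limit.
The function names and the (operator, value) predicate encoding are
hypothetical; the default threshold of 1000 and the "0 disables" behaviour are
taken from the description of ANALYTIC_RANK_PUSHDOWN_THRESHOLD in this message.

```python
# Hypothetical sketch: choose a top-n limit from upper-bound predicates on an
# analytic rank column. Names and encoding are illustrative assumptions.

def implied_limit(op, value):
    """Return the row limit implied by `rank_col <op> value`, or None."""
    if op == "<=":
        return value
    if op == "<":
        return value - 1
    # Lower bounds and other operators are not converted into limits.
    return None


def choose_limit(predicates, threshold=1000):
    """Pick the lowest implied limit.

    Returns None when no limit applies, meaning the predicates would stay
    in a SELECT node instead of being pushed into the top-n.
    """
    if threshold == 0:  # optimization disabled
        return None
    limits = [implied_limit(op, value) for op, value in predicates]
    limits = [limit for limit in limits if limit is not None]
    if not limits:
        return None
    lowest = min(limits)
    return lowest if lowest <= threshold else None
```

For example, choose_limit([("<=", 10), ("<", 5)]) selects 4, while a single
predicate with a bound of 5000 exceeds the default threshold and yields None.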

The optimization can be disabled by setting
ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is
only enabled for limits of 1000 or less, because the
in-memory Top-N may perform significantly worse than
a full sort for large heaps. We could probably optimize
this more with better tuning so that it can gracefully
fall back to doing the full sort at runtime.

rank() and row_number() are handled. rank() needs support in
the TopN node to include ties for the last place, which is
also added in this patch.
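
To make the tie-handling requirement concrete, here is a minimal Python sketch
of a top-n that keeps ties for the last place, which is what a predicate such
as rank() <= N needs. It is an in-memory illustration only; the function name
is hypothetical and it does not reflect how the TopN node is implemented.

```python
def top_n_with_ties(rows, key, limit):
    """Return the rows whose rank() by `key` is <= limit.

    Unlike a plain LIMIT, rows that tie with the row in last place are kept.
    """
    ordered = sorted(rows, key=key)
    if limit <= 0 or not ordered:
        return []
    if len(ordered) <= limit:
        return ordered
    # Every row whose key is <= the key of the limit-th row has rank <= limit.
    cutoff = key(ordered[limit - 1])
    return [row for row in ordered if key(row) <= cutoff]
```

With the input [5, 1, 3, 3, 7] and a limit of 2, row_number() semantics would
keep two rows, whereas this sketch keeps 1, 3 and 3 because both 3s share
rank 2.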

If predicates are trivially false, we generate empty nodes.

The logic to choose between TopNNode and SortNode based
on TOPN_BYTES_LIMIT is moved from SingleNodePlanner to
SortNode so it can be reused.

The top-n node in the backend is augmented to handle both
the partitioning (for which we use a std::map and a
comparator based on the partition exprs) and the tie-handling
semantics required by rank() predicates. The partitioned
top-n node has a soft limit of 64MB on the size of the
in-memory heaps and can spill with use of an embedded Sorter.
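
The per-partition heap idea can be sketched in Python as follows. This is an
assumption-laden illustration, not the backend code: it uses a dict instead of
a std::map with a partition-expr comparator, assumes numeric ordering keys,
implements row_number() semantics only (no ties), and omits the 64MB soft
limit and spilling entirely.

```python
import heapq
from collections import defaultdict


def partitioned_top_n(rows, partition_key, order_key, limit):
    """Keep the `limit` smallest rows (by order_key) for each partition."""
    # partition -> bounded max-heap, emulated by negating the ordering key.
    heaps = defaultdict(list)
    for seq, row in enumerate(rows):
        heap = heaps[partition_key(row)]
        # -seq breaks ties so that earlier rows win and rows are never compared.
        entry = (-order_key(row), -seq, row)
        if len(heap) < limit:
            heapq.heappush(heap, entry)
        else:
            # Push the new entry, then evict the current largest key.
            heapq.heappushpop(heap, entry)
    # Emit each partition's rows in ascending order of the ordering key.
    return {
        part: [row for _, _, row in sorted(heap, reverse=True)]
        for part, heap in heaps.items()
    }
```

Each heap never holds more than `limit` entries, so memory grows with the
number of partitions rather than with the number of input rows.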

We currently use the partitioned top-n node to implement
rank() pushdown in all cases because of the tie-handling
support. We also cannot use the merging exchange for
rank() because the limit does not handle ties in the same way,
so we need to generate an unordered exchange with a partitioned
top-n node on top of the exchange.

Limitations:
There are several possible extensions to this that we did not do:
* dense_rank() is not supported because it would require additional
  backend support - IMPALA-10014.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged,
  only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic
  expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015
* We evict all in memory partitions when under memory pressure -
  this could be improved - IMPALA-10023.
* The top-n node rebuilds an in-memory heap per partition
  during the output phase. This required less code but adds
  some avoidable overhead - see IMPALA-10025.

Tests:
* Planner tests - added tests that exercise the interesting code
  paths added in planning.
  - Predicate ordering in SELECT nodes changed in a couple of cases
    because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
 - Unpartitioned Top-N end-to-end tests
 - Basic partitioning and duplicate handling tests on functional
 - Similar basic tests on larger inputs from TPC-DS and with
   larger partition counts.
 - I inspected the results and also ran the same tests with
   analytic_rank_pushdown_threshold=0 to confirm that the
   results were the same as with the full sort.
 - Fallback to spilling sort.

Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/slot-ref.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/tuple-row-compare.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
M 

[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66

2020-07-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16257 )

Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66
..

IMPALA-9903: Bump Kudu version to 5ad5d3d66

This patch bumps Kudu to commit 5ad5d3d66 to pull in
KuduScanner.GetKuduTable which will be used in
https://gerrit.cloudera.org/#/c/16120/

Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd
Reviewed-on: http://gerrit.cloudera.org:8080/16257
Reviewed-by: Joe McDonnell 
Tested-by: Joe McDonnell 
---
M buildall.sh
1 file changed, 1 insertion(+), 1 deletion(-)

Approvals:
  Joe McDonnell: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd
Gerrit-Change-Number: 16257
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66

2020-07-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16257 )

Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66
..


Patch Set 1: Verified+1

This passed a build of all components on supported platforms.


--
To view, visit http://gerrit.cloudera.org:8080/16257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd
Gerrit-Change-Number: 16257
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 20:51:21 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66

2020-07-30 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16257 )

Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd
Gerrit-Change-Number: 16257
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 20:50:41 +
Gerrit-HasComments: No


[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66

2020-07-30 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16257 )

Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66
..


Patch Set 1:

This patch successfully built on Jenkins (without publishing).


--
To view, visit http://gerrit.cloudera.org:8080/16257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd
Gerrit-Change-Number: 16257
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 20:41:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16235 )

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..


Patch Set 8: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:34:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6197/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6741/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:11:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9478: Profiles should indicate if custom UDFs are being used

2020-07-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16188 )

Change subject: IMPALA-9478: Profiles should indicate if custom UDFs are being 
used
..


Patch Set 5:

(2 comments)

> Did you run exhaustive tests? Would be good to do that just to be sure 
> nothing else needs to be updated.

Ran exhaustive tests, everything passed.

http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@222
PS5, Line 222: if (fn_ != null && !fnName_.isBuiltin()) {
> We do have the ToSqlOptions that could maybe control this to hide it in err
yeah using ToSqlOptions makes sense, I briefly looked into it but it didn't 
seem that straightforward because ToSql is called in so many places.


http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@491
PS5, Line 491: "User Defined Functions (UDFs)"
> I wonder if it worth the effort to make the key more explicit:
Yeah that would be nice because it makes it consistent with the info from 
toSql, unfortunately the info about whether it is a native vs. java udf is only 
available in the fn_ instance variable, which isn't set until the end of the 
function.

it's probably do-able, but maybe not worth the effort since the same info is in 
the explain plan already.



--
To view, visit http://gerrit.cloudera.org:8080/16188
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I79122e6cc74fd5a62c76962289a1615fbac2f345
Gerrit-Change-Number: 16188
Gerrit-PatchSet: 5
Gerrit-Owner: Sahil Takiar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 19:04:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Shant Hovsepian (Code Review)
Shant Hovsepian has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16123 )

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java:

http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277
PS9, Line 277:   JoinOperator joinOp = operand.getSetOperator() == 
SetOperator.EXCEPT ?
> nit: we could declare the variable on l 309 where it's assigned.
Done


http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331
PS9, Line 331: List initialOps = new 
ArrayList<>();
> It doesn't look like we do anything with this view? Is it mean to wrap eiSe
Yes good catch, it was something I refactored out as the union operands can be 
querystmts versus just tablerefs.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470
PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN]
> For future reference, I have created a JIRA: IMPALA-10008
Ack


http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test:

http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212
PS9, Line 212: select distinct id, year, month from functional.alltypestiny 
where year=2009 and month=1
> I guess the distinct is sorta serving as an execution hint here, right? Sin
Mostly to exercise the rewrite test case. Without the distincts for now we 
wouldn't be able to use an INNER join. In this case since id is kind of like a 
key, the distinct is redundant but we don't have a way of detecting that.


http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test
File testdata/workloads/functional-query/queries/QueryTest/intersect.test:

http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4
PS8, Line 4:  RESULTS
> Sorry, one more test suggestion that is based on a common pattern:  branch
Added some planner tests and will file a JIRA. It works in some cases, but in 
general we could remove the JOIN; instead it just creates an empty set below 
the join. I could be wrong, but this seems like a generally useful optimization 
for join types in general.



--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 18:46:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

2020-07-30 Thread Shant Hovsepian (Code Review)
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16123

to look at the new patch set (#11).

Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
..

IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]

INTERSECT and EXCEPT set operations are implemented as rewrites to
joins. Currently only the DISTINCT qualified operators are implemented,
not ALL qualified. The operator MINUS is supported as an alias for
EXCEPT.
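
A minimal Python sketch of the rewrite idea, under stated assumptions: rows are
modelled as tuples, INTERSECT DISTINCT is expressed as a left semi join over
de-duplicated input, and EXCEPT DISTINCT as a left anti join. The semi join
matches the plans quoted in the review comments; the anti join for EXCEPT and
all function names here are illustrative assumptions rather than the patch's
actual code.

```python
def distinct(rows):
    """Remove duplicate rows while preserving first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def left_semi_join(left, right):
    """Keep left rows that have at least one equal row on the right."""
    right_rows = set(right)
    return [row for row in left if row in right_rows]


def left_anti_join(left, right):
    """Keep left rows that have no equal row on the right."""
    right_rows = set(right)
    return [row for row in left if row not in right_rows]


def intersect_distinct(left, right):
    return left_semi_join(distinct(left), right)


def except_distinct(left, right):
    return left_anti_join(distinct(left), right)
```

Because Python treats None as equal to None inside tuples, this sketch also
matches NULL values against each other, as set operations do in SQL.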

We mimic Oracle and Hive's non-standard implementation which treats all
operators with the same precedence, as opposed to the SQL Standard of
giving INTERSECT higher precedence.
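
The difference between the two precedence rules can be seen with a small
Python sketch over sets. The helper names are hypothetical, and only DISTINCT
semantics are modelled.

```python
def apply_op(op, left, right):
    """Apply one DISTINCT set operator to two Python sets."""
    if op == "UNION":
        return left | right
    if op == "INTERSECT":
        return left & right
    if op == "EXCEPT":
        return left - right
    raise ValueError(f"unknown set operator: {op}")


def eval_equal_precedence(first, rest):
    """Evaluate strictly left to right, treating all operators alike."""
    result = set(first)
    for op, operand in rest:
        result = apply_op(op, result, set(operand))
    return result


def eval_standard_precedence(first, rest):
    """Evaluate with INTERSECT binding tighter than UNION and EXCEPT."""
    operands = [set(first)]
    ops = []
    for op, operand in rest:
        if op == "INTERSECT":
            # Fold INTERSECT into the preceding operand immediately.
            operands[-1] = operands[-1] & set(operand)
        else:
            ops.append(op)
            operands.append(set(operand))
    result = operands[0]
    for op, operand in zip(ops, operands[1:]):
        result = apply_op(op, result, operand)
    return result
```

For A = {1, 2}, B = {2, 3} and C = {3, 4}, the expression
A UNION B INTERSECT C yields {3} under equal precedence and {1, 2, 3} under
the SQL Standard rule.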

A new class SetOperationStmt was created to encompass the previous
UnionStmt behavior. UnionStmt is preserved as a special case for
union-only operands to ensure compatibility with previous union planning
behavior.

Tests:
* Added parser and analyzer tests.
* Ensured no test failures or plan changes for union tests.
* Added TPC-DS queries 14,38,87 to functional and planner tests.
* Added functional tests test_intersect and test_except
* New planner testSetOperationStmt

Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
---
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java
M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/empty.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test
A testdata/workloads/functional-query/queries/QueryTest/except.test
A testdata/workloads/functional-query/queries/QueryTest/intersect.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q14-1.test
A testdata/workloads/tpcds/queries/tpcds-q14-2.test
A testdata/workloads/tpcds/queries/tpcds-q38.test
A testdata/workloads/tpcds/queries/tpcds-q87.test
M tests/query_test/test_queries.py
M tests/query_test/test_tpcds_queries.py
M tests/util/parse_util.py
30 files changed, 5,117 insertions(+), 796 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/11
--
To view, visit http://gerrit.cloudera.org:8080/16123
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f
Gerrit-Change-Number: 16123
Gerrit-PatchSet: 11
Gerrit-Owner: Shant Hovsepian 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs

2020-07-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16237 )

Change subject: IMPALA-10006: handle non-writable /opt/impala/logs
..

IMPALA-10006: handle non-writable /opt/impala/logs

The shutdown script should not abort if it can't write
a log - it should continue to try and shut down impala.

The entrypoint script should abort with an explicit
error if the log directory isn't writable by the
current user.
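
The two behaviours can be contrasted with a short Python sketch. The actual
change is to shell scripts, so this is only an illustration of the policy; the
function names and the shutdown.log file name are hypothetical.

```python
import os


def log_dir_writable(log_dir):
    """Return True if log_dir exists and the current user can create files in it."""
    return os.path.isdir(log_dir) and os.access(log_dir, os.W_OK | os.X_OK)


def entrypoint_check(log_dir):
    """Entrypoint policy: fail fast with an explicit error message."""
    if not log_dir_writable(log_dir):
        raise SystemExit(
            f"ERROR: log directory {log_dir} is not writable by the current user"
        )


def shutdown_log(log_dir, message):
    """Shutdown policy: logging is best effort and never aborts the shutdown."""
    try:
        with open(os.path.join(log_dir, "shutdown.log"), "a") as log_file:
            log_file.write(message + "\n")
        return True
    except OSError:
        # Could not write the log; carry on with the shutdown regardless.
        return False
```

The entrypoint refuses to start with a clear message, while the shutdown path
only reports whether the log line was written and lets the caller proceed.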

Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Reviewed-on: http://gerrit.cloudera.org:8080/16237
Tested-by: Impala Public Jenkins 
Reviewed-by: Attila Jeges 
Reviewed-by: Andrew Sherman 
---
M bin/graceful_shutdown_backends.sh
M docker/daemon_entrypoint.sh
2 files changed, 10 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Verified
  Attila Jeges: Looks good to me, but someone else must approve
  Andrew Sherman: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Gerrit-Change-Number: 16237
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans

2020-07-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16098 )

Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad 
plans
..


Patch Set 25:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16098/25/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

http://gerrit.cloudera.org:8080/#/c/16098/25/tests/metadata/test_explain.py@132
PS25, Line 132: # Set the number of rows at the table level to -1.
  : self.execute_query(
  :   "alter table %s set tblproperties('numRows'='-1')" % 
mixed_tbl)
just curious why this is necessary?



--
To view, visit http://gerrit.cloudera.org:8080/16098
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576
Gerrit-Change-Number: 16098
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Qifan Chen 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 17:38:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10007: Impala development environment does not support Ubuntu 20.04

2020-07-30 Thread Sahil Takiar (Code Review)
Sahil Takiar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16241 )

Change subject: IMPALA-10007: Impala development environment does not support 
Ubuntu 20.04
..


Patch Set 6: Code-Review+2

(1 comment)

Were you able to figure out Aman's comment here: 
https://gerrit.cloudera.org/#/c/16238/5/bin/bootstrap_toolchain.py@95

http://gerrit.cloudera.org:8080/#/c/16241/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16241/6//COMMIT_MSG@15
PS6, Line 15: The work addresses the current limitation in Impala development
: environment in that Ubuntu 20.04 is not supported. The fix modifies
: bootstrap_system.sh and bootstrap_toolchain.py to specifically
: allow the bootstrapping of the Ubuntu 18.04 Impala development
: environment on a machine running Ubuntu 20.04. Limited use shows
: that the environment is useful and stable, similar to the one
: running on Ubuntu 18.04.
you can delete this



--
To view, visit http://gerrit.cloudera.org:8080/16241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947
Gerrit-Change-Number: 16241
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Thu, 30 Jul 2020 17:35:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs

2020-07-30 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16237 )

Change subject: IMPALA-10006: handle non-writable /opt/impala/logs
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Gerrit-Change-Number: 16237
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 17:17:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs

2020-07-30 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16237 )

Change subject: IMPALA-10006: handle non-writable /opt/impala/logs
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/16237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756
Gerrit-Change-Number: 16237
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 16:34:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate

2020-07-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16227 )

Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate
..


Patch Set 3:

I think we should make some more effort to try to repro this, since it's not 
obvious that it is unrelated to the change. Of the tests that were running, 
test_nested_types and test_spilling look the most plausibly related, so I'll 
loop those on one of my machines.


--
To view, visit http://gerrit.cloudera.org:8080/16227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e
Gerrit-Change-Number: 16227
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 15:54:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate

2020-07-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16227 )

Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate
..


Patch Set 3:

TBH that was an educated guess based on the lack of a stack trace - if it was 
in interpreted code there is typically a stack - 

Stack: [0x7f0e8d31b000,0x7f0e8db1c000],  sp=0x7f0e8db192a8,  free 
space=8184k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libc.so.6+0x14e224]


--
To view, visit http://gerrit.cloudera.org:8080/16227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e
Gerrit-Change-Number: 16227
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 15:25:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function

2020-07-30 Thread Adam Tamas (Code Review)
Adam Tamas has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16259 )

Change subject: IMPALA-9963: Implement ds_kll_n() function
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16259/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16259/1//COMMIT_MSG@9
PS1, Line 9: s
I think this should be in singular.



--
To view, visit http://gerrit.cloudera.org:8080/16259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781
Gerrit-Change-Number: 16259
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Adam Tamas 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 15:13:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16235 )

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6740/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 13:43:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16259 )

Change subject: IMPALA-9963: Implement ds_kll_n() function
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6739/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781
Gerrit-Change-Number: 16259
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 13:34:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16235

to look at the new patch set (#8).

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..

IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

ds_kll_sketch() is an aggregate function that receives a float
parameter (e.g. a float column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set wrapped into
STRING type. This sketch can be saved into a table or view and later
used for quantile approximations. ds_kll_quantile() receives two
parameters: a STRING parameter that contains a serialized KLL sketch
and a DOUBLE that represents the rank of the quantile in the range of
[0,1]. E.g. rank=0.1 means the approximate value in the sketch where
10% of the sketched items are less than or equal to this value.
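
To illustrate the rank semantics described above, here is a minimal
Python sketch that computes the quantile exactly on a sorted list. It is
not the KLL algorithm and does not use the DataSketches library; the
function name quantile_at_rank is hypothetical. A KLL sketch would
return an approximation of the value computed here.

```python
import math


def quantile_at_rank(values, rank):
    """Return the smallest item v such that at least a `rank` fraction
    of the items are less than or equal to v (exact, not approximate)."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    ordered = sorted(values)
    # Position of the item that sits at the requested normalized rank.
    index = max(0, math.ceil(rank * len(ordered)) - 1)
    return ordered[index]


print(quantile_at_rank([4.0, 1.0, 3.0, 2.0], 0.5))  # prints 2.0
```

With rank=0.5 on four items, two of the four items are less than or
equal to the returned value, which matches the definition above.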

Testing:
  - Added automated tests on small data sets to check the basic
    functionality of sketching and getting a quantile approximation.
  - Tested on TPCH25_parquet.lineitem to check that sketching and
    approximating work at a bigger scale as well, where serialize/merge
    phases are also required. At this scale the error of the
    quantile approximation stays within 1-1.5%.

Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
12 files changed, 333 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16235/8
--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 8
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function

2020-07-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16259 )


Change subject: IMPALA-9963: Implement ds_kll_n() function
..

IMPALA-9963: Implement ds_kll_n() function

This function receives a serialized Apache DataSketches KLL sketch
and returns how many input values were fed into this sketch.

Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
4 files changed, 55 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16259/1
--
To view, visit http://gerrit.cloudera.org:8080/16259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781
Gerrit-Change-Number: 16259
Gerrit-PatchSet: 1
Gerrit-Owner: Gabor Kaszab 


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16235 )

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/6738/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 10:51:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16235 )

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16235/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16235/6//COMMIT_MSG@9
PS6, Line 9: ds_kll_sketch() is an aggregate function that receives a float
> nit: wrap at 72 chars
Done


http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/aggregate-functions-ir.cc@1618
PS6, Line 1618: rin
> nit: could add "using std::string" + same for stringstream. This is already
Done


http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions-ir.cc@50
PS6, Line 50: LogSketchDeserializationError(ctx);
> Do you know if the datasketches code uses exceptions? I am wondering if the
Good point! In fact we are safe here: we can get an invalid_argument 
exception if rank is not in [0,1], but I check that above. Some other 
exceptions are thrown if the internal state of the kll_sketch is off. That 
should not be possible, but it still doesn't hurt to add a try-catch around 
this call.
Additionally, DeserializeDsSketch() covers invalid_argument errors, but I 
might add another catch block to be on the safe side.
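
The pattern discussed in this reply (validate the argument first, then
still guard the library call) can be sketched in Python as follows. This
is only an illustration of the idea, not the C++ code in the patch:
safe_quantile and get_quantile are hypothetical names, and returning
None stands in for returning NULL, which is an assumption.

```python
def safe_quantile(get_quantile, rank):
    """Return the quantile for `rank`, or None if the rank is invalid
    or the underlying call raises."""
    # Validate up front so the expected failure never reaches the library.
    if not 0.0 <= rank <= 1.0:
        return None
    try:
        return get_quantile(rank)
    except Exception:
        # Assumption: any unexpected library error maps to "no result"
        # instead of propagating to the caller.
        return None
```

The up-front check handles the expected failure cheaply, while the
except clause only exists for states that should not occur.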


http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions.h@33
PS6, Line 33: distinc
> typo: distinct
Done



--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 Jul 2020 10:31:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

2020-07-30 Thread Gabor Kaszab (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16235

to look at the new patch set (#7).

Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() 
functions
..

IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions

ds_kll_sketch() is an aggregate function that receives a float
parameter (e.g. a float column of a table) and returns a serialized
Apache DataSketches KLL sketch of the input data set wrapped into
STRING type. This sketch can be saved into a table or view and later
used for quantile approximations. ds_kll_quantile() receives two
parameters: a STRING parameter that contains a serialized KLL sketch
and a DOUBLE that represents the rank of the quantile in the range of
[0,1]. E.g. rank=0.1 means the approximate value in the sketch where
10% of the sketched items are less than or equal to this value.

Testing:
  - Added automated tests on small data sets to check the basic
    functionality of sketching and getting a quantile approximation.
  - Tested on TPCH25_parquet.lineitem to check that sketching and
    approximating work at a bigger scale as well, where serialize/merge
    phases are also required. At this scale the error of the
    quantile approximation stays within 1-1.5%.

Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-common.h
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/data/README
A testdata/data/kll_sketches_from_hive.parquet
A testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test
M tests/query_test/test_datasketches.py
12 files changed, 333 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16235/7
--
To view, visit http://gerrit.cloudera.org:8080/16235
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52
Gerrit-Change-Number: 16235
Gerrit-PatchSet: 7
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate

2020-07-30 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16227 )

Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate
..


Patch Set 3:

Tim, how can you see that from the crash dump?
Probably it is somehow flaky because now it passed without modifications.


--
To view, visit http://gerrit.cloudera.org:8080/16227
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e
Gerrit-Change-Number: 16227
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 30 Jul 2020 09:15:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] WIP: IMPALA-9979: part 2: partitioned top-n

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16242 )

Change subject: WIP: IMPALA-9979: part 2: partitioned top-n
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6737/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16242
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
Gerrit-Change-Number: 16242
Gerrit-PatchSet: 11
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Shant Hovsepian 
Gerrit-Comment-Date: Thu, 30 Jul 2020 06:58:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10024: isBlackListedDb() should do a case-insensitive check

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16254 )

Change subject: IMPALA-10024: isBlackListedDb() should do a case-insensitive 
check
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41
Gerrit-Change-Number: 16254
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 30 Jul 2020 06:53:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10024: isBlackListedDb() should do a case-insensitive check

2020-07-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16254 )

Change subject: IMPALA-10024: isBlackListedDb() should do a case-insensitive 
check
..

IMPALA-10024: isBlackListedDb() should do a case-insensitive check

The util method CatalogServiceCatalog#isBlackListedDb() expects the
input dbName to be in lower-case, which is error-prone.
Specifically, this can cause issues when a Metastore event has a
dbName in a different case than the one configured in
--blacklisted_dbs. In such cases the EventsProcessor does not ignore
the event and can go into an error state.

The fix modifies the isBlackListedDb method to do a case-insensitive
comparison. The isBlacklistedTable method is not affected by this issue
since TableName has a built-in mechanism to ignore the case.
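
The idea of the fix can be sketched in Python as below. This is not the
Java code from the patch: parse_blacklisted_dbs and is_blacklisted_db
are hypothetical names, and the comma-separated flag format is an
assumption made for the illustration.

```python
def parse_blacklisted_dbs(flag_value):
    """Normalize a comma-separated list of db names to lower-case."""
    # Assumption: the flag value is a comma-separated list of names.
    return {name.strip().lower() for name in flag_value.split(",")
            if name.strip()}


def is_blacklisted_db(db_name, blacklisted):
    """Case-insensitive membership check against the normalized set."""
    # Lower-casing here means callers no longer have to remember to do it.
    return db_name.lower() in blacklisted


blacklisted = parse_blacklisted_dbs("Sys, information_schema")
print(is_blacklisted_db("SYS", blacklisted))  # prints True
```

Normalizing both the configured names and the lookup key removes the
implicit lower-case precondition on the caller.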

Testing Done:
1. Modified test_event_processing.py such that the generated event
has a different case than what is configured in --blacklisted_dbs.
The updated test works after the patch.
2. Ran existing tests for events processor.

Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41
Reviewed-on: http://gerrit.cloudera.org:8080/16254
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M tests/custom_cluster/test_event_processing.py
3 files changed, 28 insertions(+), 4 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/16254
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41
Gerrit-Change-Number: 16254
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] WIP: IMPALA-9979: part 2: partitioned top-n

2020-07-30 Thread Tim Armstrong (Code Review)
Hello Aman Sinha, Shant Hovsepian, David Rorke,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16242

to look at the new patch set (#11).

Change subject: WIP: IMPALA-9979: part 2: partitioned top-n
..

WIP: IMPALA-9979: part 2: partitioned top-n

The planner now identifies predicates that can be converted into
limits in a partitioned or unpartitioned top-n with the following
method:
* Push down predicates that reference analytic tuple into inline view.
  These will be evaluated after the analytic plan for the inline
  SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could
  be converted to limits.
* If they can be applied to the last sort group of the analytic
  plan, and the windows are all compatible, then the lowest
  limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add
  logic to merge SELECT nodes to avoid generating duplicates
  from inside and outside the inline view.

The optimization can be disabled by setting
ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is
only enabled for limits of 1000 or less, because the
in-memory Top-N may perform significantly worse than
a full sort for large heaps. We could probably optimize
this more with better tuning so that it can gracefully
fall back to doing the full sort at runtime.
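
As a rough illustration of the predicate-to-limit step and the threshold
described above, here is a minimal Python sketch. It is not the planner
code: pushdown_limit is a hypothetical name, predicates are modelled as
(operator, constant) pairs on the rank column, and the mapping of each
operator to a limit is an assumption. Trivially false predicates, for
which the patch generates empty nodes, are not modelled.

```python
def pushdown_limit(predicates, threshold=1000):
    """Return the limit to push into the top-n, or None if no pushdown.

    `predicates` is a list of (op, value) pairs such as ("<=", 10).
    """
    limits = []
    for op, value in predicates:
        if op in ("<=", "="):
            # Assumption: rank <= k and rank = k both need the first k ranks.
            limits.append(value)
        elif op == "<":
            limits.append(value - 1)
    if not limits:
        return None
    limit = min(limits)  # only the lowest limit is chosen
    if limit < 1:
        return None  # trivially false predicate, not handled in this sketch
    # A threshold of 0 disables the optimization entirely.
    if threshold <= 0 or limit > threshold:
        return None
    return limit


print(pushdown_limit([("<=", 10), ("<", 5)]))  # prints 4
```

A limit above the threshold falls back to the regular plan, which
mirrors the concern about large in-memory heaps.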

rank() and row_number() are handled. rank() needs support in
the TopN node to include ties for the last place, which is
also added in this patch.

If predicates are trivially false, we generate empty nodes.

The logic to choose between TopNNode and SortNode based
on TOPN_BYTES_LIMIT is moved from SingleNodePlanner to
SortNode so it can be reused.

The top-n node in the backend is augmented to handle both
the partitioning (for which we use a std::map and a
comparator based on the partition exprs) and the tie-handling
semantics required by rank() predicates. The partitioned
top-n node has a soft limit of 64MB on the size of the
in-memory heaps and can spill with use of an embedded Sorter.
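
The partitioning and tie-handling semantics can be sketched in Python as
follows. This is a simplified model, not the backend implementation: it
sorts each partition fully instead of maintaining bounded heaps, it
never spills, and partitioned_top_n_with_ties is a hypothetical name.

```python
from collections import defaultdict


def partitioned_top_n_with_ties(rows, partition_key, sort_key, limit):
    """Per partition, keep every row whose rank() is <= limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = {}
    for key, part in partitions.items():
        ordered = sorted(part, key=sort_key)
        if len(ordered) > limit:
            # Rows tied with the row in last place share its rank, so
            # they must be returned even if that exceeds `limit` rows.
            cutoff = sort_key(ordered[limit - 1])
            ordered = [r for r in ordered if sort_key(r) <= cutoff]
        result[key] = ordered
    return result


rows = [("a", 1), ("a", 2), ("a", 2), ("a", 3), ("b", 5)]
top = partitioned_top_n_with_ties(rows, lambda r: r[0], lambda r: r[1], 2)
print(top["a"])  # prints [('a', 1), ('a', 2), ('a', 2)]
```

Partition "a" returns three rows for a limit of 2 because two rows tie
for second place, which is the behaviour a plain row-count limit cannot
express.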

We currently use the partitioned top-n node to implement
rank() pushdown in all cases because of the tie-handling
support. We also cannot use the merging exchange for
rank() because the limit does not handle ties in the same way,
so we need to generate an unordered exchange with a partitioned
top-n node on top of the exchange.

Limitations:
There are several possible extensions to this that we did not do:
* dense_rank() is not supported because it would require additional
  backend support - IMPALA-10014.
* Only one predicate per analytic is pushed.
* Redundant rank()/row_number() predicates are not merged,
  only the lowest is chosen.
* Lower bounds are not converted into OFFSET.
* The analytic operator cannot be eliminated even if the analytic
  expression was only used in the predicate.
* This doesn't push predicates into UNION - IMPALA-10013
* Always false predicates don't result in empty plan - IMPALA-10015
* We evict all in-memory partitions when under memory pressure -
  this could be improved - IMPALA-10023.
* The top-n node rebuilds an in-memory heap per partition
  during the output phase. This required less code but adds
  some avoidable overhead - see IMPALA-10025.

Tests:
* Planner tests - added tests that exercise the interesting code
  paths added in planning.
  - Predicate ordering in SELECT nodes changed in a couple of cases
because some predicates were pushed into the inline views.
* Modified SORT targeted perf tests to avoid conversion to Top-N
* Added targeted perf test for partitioned top-n.
* End-to-end tests
 - Unpartitioned Top-N end-to-end tests
 - Basic partitioning and duplicate handling tests on functional
 - Similar basic tests on larger inputs from TPC-DS and with
   larger partition counts.

TODO:
  - Spilling because of large partitions
  - In-memory heap evictions

This results in heap evictions -
select * from (
  select d_date, i_item_id, ss_list_price,
rank() over (partition by d_date, ss_store_sk order by ss_list_price 
desc) rnk
  from store_sales ss
  join item i on ss_item_sk = i_item_sk
  join date_dim d on ss_sold_date_sk = d_date_sk
  where ss_list_price is not null) v
where rnk = 500
order by d_date limit 50;

Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/exec-node.cc
M be/src/exec/topn-node-ir.cc
M be/src/exec/topn-node.cc
M be/src/exec/topn-node.h
M be/src/exprs/slot-ref.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/tuple-row-compare.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java
M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M