[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 05:14:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 )

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..

Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@421
PS11, Line 421: pbExprs.size() > sortExprs.size()) return false;
nit: we'd usually enclose the body with braces for multi-line statements.

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@1064
PS11, Line 1064: analyticNode.removeChild(sortNode);
nit: setChild(0, upperTopN) is slightly more concise than the remove/add pair.

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/SortNode.java
File fe/src/main/java/org/apache/impala/planner/SortNode.java:

http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/SortNode.java@143
PS11, Line 143: partitioningExprs_ = partitioningExprs;
I think we also need to call computeStats() again so that the limit can be factored into the estimates. I think the other state set in init() doesn't need to change.

http://gerrit.cloudera.org:8080/#/c/16219/11/testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test
File testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test:

PS11:
We should move this so that it is invoked from a Python test class for the TPC-H workload, and also remove the tpch. prefixes from the tables. That way the test parameterisation on file formats will work correctly. Currently it will be run redundantly on multiple functional file formats.

http://gerrit.cloudera.org:8080/#/c/16219/11/testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test@54
PS11, Line 54: RESULTS
Spoke directly, but just leaving a comment to mention that the result sets are empty.

--
To view, visit http://gerrit.cloudera.org:8080/16219
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Gerrit-Change-Number: 16219
Gerrit-PatchSet: 11
Gerrit-Owner: Aman Sinha
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: David Rorke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Shant Hovsepian
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Fri, 31 Jul 2020 04:23:35 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16264 ) Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1 .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6746/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16264 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 Gerrit-Change-Number: 16264 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 02:30:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6745/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 11 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 02:22:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 11: (11 comments) http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@374 PS10, Line 374:* @param sortInfo The sort info from the outer sort node > nit: remove empty @param annotations? Added the descriptions. http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@422 PS10, Line 422: > I think we should be able to use expression equality, since the partition a Good point about using the substitutedPartitionExprs_. When I used it, it was still not able to do the equivalence comparison. After working through various expr mappings, I realized that it is a little late to use the sortInfo's sortExprs since they have already been substituted. I added a new field in SortInfo to keep the original sort exprs and after substituting those, was able to do the comparison. http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@431 PS10, Line 431: !(pbExpr instanceof SlotRef && so > I wonder if this check is necessary. If the sort order is descending, then This check is needed because the partition-by exprs are always ASC order. http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@481 PS10, Line 481: lhs).getDe > Not sure the rational behind it. I added a comment explaining this. http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@487 PS10, Line 487:get(0)).getFnC > Not follow. Same as above. It has to do with containment within the limit values. 
http://gerrit.cloudera.org:8080/#/c/16219/6/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java File fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java: http://gerrit.cloudera.org:8080/#/c/16219/6/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java@266 PS6, Line 266: return createSortInfo(input, sortExprs, isAsc, nullsFirst, TSortingOrder.LEXICAL); > I will revert this particular change in the next patch since this is not ne Done http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@391 PS8, Line 391: // so limit pushdown is not applicable > I think we can generate an analytic without a sort in some edge cases, e.g. Yes, this was a bug. Fixed it. http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@427 PS8, Line 427: } > We should maybe also avoid going into Subplans? I guess it doesn't really m Added a check for Subplan and actually restricted the tree-walk to only single input operators. http://gerrit.cloudera.org:8080/#/c/16219/8/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@434 PS8, Line 434: root.getChildren().size() > 1) { > This looks like it just goes down the left branch of the plan tree - is tha Yeah, I was thinking of the narrow use case where it goes left deep on single input operators..but yeah this is confusing. I rewrote it and simplified to only allow single child. 
http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@343 PS10, Line 343: offset == 0 > Thought that offset is non-negative in SQL. Done http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SortNode.java File fe/src/main/java/org/apache/impala/planner/SortNode.java: http://gerrit.cloudera.org:8080/#/c/16219/10/fe/src/main/java/org/apache/impala/planner/SortNode.java@143 PS10, Line 143: partitioningExprs_ = partitioningExprs; > If this is a TopN sort, then the method should succeed if both the limit an Yeah, in theory one can call convertToTopN on a node that is already TopN ,but for this patch I would prefer to restrict it to a narrower use case. -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
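The reply above says the tree walk in SingleNodePlanner was simplified to descend only through single-input operators and to stop at Subplans. A minimal sketch of that kind of walk follows. The `PlanNode` class, the `kind` strings, and `find_analytic_below` are hypothetical names for illustration only; the actual Java code and the exact set of operators it allows are not shown in this thread.

```python
class PlanNode:
    """Hypothetical plan node: an operator kind plus its child nodes."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)


def find_analytic_below(root):
    """Walk down from root through single-child operators only.

    Returns the first ANALYTIC node found, or None if the walk reaches a
    SUBPLAN, a leaf, or any operator with more than one input.
    """
    node = root
    while True:
        if node.kind == "ANALYTIC":
            return node
        if node.kind == "SUBPLAN" or len(node.children) != 1:
            return None
        node = node.children[0]
```

Under these assumptions a chain such as SELECT over ANALYTIC over SORT over SCAN yields the analytic node, while any join or union above the analytic node ends the walk without a match.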
[Impala-ASF-CR] WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
Yida Wu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16264 )

Change subject: WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1
..

WIP: IMPALA-9867: Add Support for Spilling to S3: Milestone 1

Major Features:
1) Local files as buffers for spilling to S3.
2) Async Upload and Sync Fetching of remote files.
3) Sync remote files deletion after query ends.
4) Local buffer files management.
5) Compatibility of spilling to local and remote.
6) All the errors from hdfs/s3 should terminate the query.

Implementation Details:
1) A new enum type is added to specify the function of local files: LocalFileMode::BUFFER and LocalFileMode::FILE. LocalFileMode::BUFFER indicates that the local file is used as a buffer for remote operations. LocalFileMode::FILE indicates that the local file is used for spilling to local. Also, the startup option "remote_tmp_file_local_buff_mode" is added to specify how pages are read from the remote. If set to true, the whole file is fetched to the local buffer during reading. If set to false, only one page is read per read.
2) Two disk queues have been added to do the file operation jobs. Queue names: RemoteS3DiskFileOper/RemoteDfsDiskFileOper. File operations on the remote disk, like upload and fetch, should be done in these queues. The purpose of the queues is to separate long-running operations from short ones, and also to control more precisely the number of threads working on these file operation jobs, since we may not want too many upload and fetch jobs running at the same time. RemoteOperRange is the new type that carries the file operation jobs. Previously, the request types were READ and WRITE. Now FETCH/UPLOAD/EVICT have been added.
3) The tmp files are deleted when the tmp file group is destructed.
4) The local buffer files management controls the total size of local buffer files and evicts files if needed. A remote tmp file has five statuses: IN_WRITING/DUMPED/IN_DUMPING/REMOTE/TO_DELETE. A local buffer file can be evicted only if it is in status REMOTE. An EVICT job is sent to the local disk queue once a file is chosen for eviction. There are two modes for the order in which files are chosen for eviction. The default is LIFO, the other is FIFO. The mode is set by the startup option "remote_tmp_files_avail_pool_lifo".
5) Spilling to local has higher priority than spilling to remote. If no local scratch space is available, temporary data will be spilled to remote. Remote scratch space uses the highest priority local scratch dir as its buffer. If no local scratch space or only one has been configured, a default local buffer should be used. The purpose of the design is to simplify the implementation in milestone 1 with fewer changes to the configuration.

Limitations:
* Only one remote scratch dir is supported.
* The highest priority local scratch dir is used for the buffer of remote scratch space if remote scratch dir exists.

TODO:
- Testcases
- Refine the naming of the remote scratch dir and files.
- Upper and lower bounds of new options related to size.
- More accurate error codes and error handling.
- Preserve memory buffer for block buffers on file upload and fetch.
- Job cancellation for new disk queues.
- Some metrics might need to be added.
- Efficiency issue when mixing local and remote scratch space.

Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 --- M be/src/runtime/hdfs-fs-cache.cc M be/src/runtime/io/CMakeLists.txt M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/disk-io-mgr.h A be/src/runtime/io/file-writer.h M be/src/runtime/io/hdfs-file-reader.cc A be/src/runtime/io/hdfs-file-writer.cc A be/src/runtime/io/hdfs-file-writer.h M be/src/runtime/io/local-file-system.cc M be/src/runtime/io/local-file-system.h A be/src/runtime/io/local-file-writer.cc A be/src/runtime/io/local-file-writer.h M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/runtime/io/request-ranges.h M be/src/runtime/io/scan-range.cc M be/src/runtime/tmp-file-mgr-internal.h M be/src/runtime/tmp-file-mgr.cc M be/src/runtime/tmp-file-mgr.h M be/src/util/hdfs-util.cc M be/src/util/hdfs-util.h M common/thrift/metrics.json 22 files changed, 2,065 insertions(+), 211 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/64/16264/1 -- To view, visit http://gerrit.cloudera.org:8080/16264 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ia5aa4036b4c72656b4297f9fbe42e21d2796a495 Gerrit-Change-Number: 16264 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu
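The local buffer file management described in item 4 of the commit message can be sketched as follows. This is only a toy model under stated assumptions: `LocalBufferPool`, `add_file`, `mark_uploaded`, and `evict_until_fits` are hypothetical names, "LIFO/FIFO" is interpreted here as the order in which files became evictable, and nothing below reflects the actual TmpFileMgr code in the patch. The status names are the five listed in the message.

```python
from collections import deque
from enum import Enum, auto


class TmpFileStatus(Enum):
    # The five statuses named in the commit message.
    IN_WRITING = auto()
    DUMPED = auto()
    IN_DUMPING = auto()
    REMOTE = auto()
    TO_DELETE = auto()


class LocalBufferPool:
    """Hypothetical model of the local buffer file pool (not Impala code)."""

    def __init__(self, capacity_bytes, lifo=True):
        self.capacity_bytes = capacity_bytes
        self.lifo = lifo          # plays the role of the LIFO/FIFO startup option
        self.used_bytes = 0
        self.status = {}          # file name -> TmpFileStatus
        self.size = {}            # file name -> bytes held locally
        self.evictable = deque()  # files in status REMOTE, in arrival order

    def add_file(self, name, size_bytes):
        self.status[name] = TmpFileStatus.IN_WRITING
        self.size[name] = size_bytes
        self.used_bytes += size_bytes

    def mark_uploaded(self, name):
        # Only once the upload has finished may the local copy be evicted.
        self.status[name] = TmpFileStatus.REMOTE
        self.evictable.append(name)

    def evict_until_fits(self):
        """Evict REMOTE files until usage fits the capacity; return the victims."""
        victims = []
        while self.used_bytes > self.capacity_bytes and self.evictable:
            name = self.evictable.pop() if self.lifo else self.evictable.popleft()
            self.used_bytes -= self.size[name]
            victims.append(name)
        return victims
```

Files that are still being written or uploaded never enter `evictable`, which mirrors the rule that only files in status REMOTE can be evicted.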
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16219 ) Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java: http://gerrit.cloudera.org:8080/#/c/16219/11/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java@401 PS11, Line 401: List sortExprs = Expr.substituteList(origSortExprs, getOutputSmap(), analyzer, false); line too long (96 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 11 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 02:01:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9983 : Pushdown limit to analytic sort operator
Hello Qifan Chen, Shant Hovsepian, David Rorke, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16219

to look at the new patch set (#11).

Change subject: IMPALA-9983 : Pushdown limit to analytic sort operator
..

IMPALA-9983 : Pushdown limit to analytic sort operator

This patch pushes the LIMIT from a top level Sort down to the Sort below an Analytic operator when it is safe to do so. There are several qualifying checks that are done. The optimization is done at the time of creating the top level Sort in the single node planner. When the pushdown is applicable, the analytic sort is converted to a TopN sort. Further, this is split into 2 TopN sorts separated by a hash partition exchange. This ensures that the limit is applied as early as possible before hash partitioning.

Fixed a couple of additional related issues uncovered as a result of limit pushdown:
- Changed the analytic sort's partition-by expr sort semantic from NULLS FIRST to NULLS LAST to ensure correctness in the presence of limit.
- The LIMIT on the analytic sort node was causing it to be treated as a merging point in the distributed planner. Fixed it by introducing an API allowPartitioned() in the PlanNode.

Testing:
- Ran PlannerTest and updated several EXPLAIN plans.
- Added Planner tests for both positive and negative cases of limit pushdown.
- Ran end-to-end TPC-DS queries. Specifically tested TPC-DS q67 for limit pushdown and result correctness.
- Added targeted end-to-end tests (TODO: capture results) Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 --- M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/SortInfo.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanNode.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/SortNode.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns-mt-dop.test M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test M testdata/workloads/functional-planner/queries/PlannerTest/convert-to-cnf.test M testdata/workloads/functional-planner/queries/PlannerTest/inline-view.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test A testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-analytic.test M testdata/workloads/functional-planner/queries/PlannerTest/max-row-size.test M testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test M testdata/workloads/functional-planner/queries/PlannerTest/nested-collections.test M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-propagation.test M testdata/workloads/functional-planner/queries/PlannerTest/semi-join-distinct.test M testdata/workloads/functional-planner/queries/PlannerTest/sort-expr-materialization.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A 
testdata/workloads/functional-query/queries/QueryTest/limit-pushdown-analytic.test M tests/query_test/test_queries.py 27 files changed, 1,297 insertions(+), 278 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/16219/11 -- To view, visit http://gerrit.cloudera.org:8080/16219 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284 Gerrit-Change-Number: 16219 Gerrit-PatchSet: 11 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
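The commit message above describes splitting the analytic sort into two TopN sorts separated by a hash partition exchange, so the limit is applied before rows are repartitioned. The sketch below illustrates only why that split does not lose rows: any row in the global top N is also in the top N of its own fragment and of its own hash partition. The function names, the row layout (partition key first, so tuple order starts with the partition-by column), and the use of `hash()` are assumptions for illustration. The patch's qualifying checks and the analytic evaluation itself are not modelled.

```python
import heapq


def top_n(rows, limit):
    """Return the `limit` smallest rows in sorted order."""
    return heapq.nsmallest(limit, rows)


def two_phase_top_n(fragments, limit, num_partitions, partition_key):
    """Lower TopN per fragment, hash exchange, then upper TopN per partition."""
    # Lower TopN: each fragment trims its input before sending anything.
    lowered = [top_n(rows, limit) for rows in fragments]

    # Hash partition exchange: route each surviving row by its partition key.
    partitions = [[] for _ in range(num_partitions)]
    for rows in lowered:
        for row in rows:
            partitions[hash(partition_key(row)) % num_partitions].append(row)

    # Upper TopN: each receiving partition applies the limit again.
    return [top_n(rows, limit) for rows in partitions]
```

A final sort with the same limit over the concatenated partition outputs returns the same rows as sorting the full input, while each fragment sends at most `limit` rows across the exchange.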
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6200/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 31 Jul 2020 01:19:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16120 ) Change subject: IMPALA-9903: Reduce Kudu openTable calls per query .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6744/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Fri, 31 Jul 2020 01:17:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16263 ) Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6743/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294 Gerrit-Change-Number: 16263 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 01:08:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9903: Reduce Kudu openTable calls per query
Hello Qifan Chen, Vihang Karajgaonkar, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16120

to look at the new patch set (#7).

Change subject: IMPALA-9903: Reduce Kudu openTable calls per query
..

IMPALA-9903: Reduce Kudu openTable calls per query

This patch reduces the number of Kudu openTable calls for the lifetime of a query by storing the KuduTable object in the Analyzer GlobalState and using it in the KuduScanNode. It does not cache the KuduTable object longer than a single query, does not impact DDL statements, and does not introduce the need to invalidate metadata when interacting with Kudu tables.

Additionally, this patch adjusts the backend scanner to use the KuduTable instance from the KuduScanner instead of using openTable to get a new instance.

Reducing the number of openTable calls is important because each call results in a GetTableSchema RPC to the remote leader Kudu master. With very high rates of queries against Kudu tables, this can overload the master, leading to degraded query performance.

In manual testing this patch reduced the Kudu GetTableSchema RPC calls to the master from 5 per query to 1 per query.

Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 --- M be/src/exec/kudu-scan-node-base.cc M be/src/exec/kudu-scan-node-base.h M be/src/exec/kudu-scanner.cc M bin/impala-config.sh M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/catalog/FeKuduTable.java M fe/src/main/java/org/apache/impala/catalog/local/LocalKuduTable.java M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java 8 files changed, 86 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16120/7 -- To view, visit http://gerrit.cloudera.org:8080/16120 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iec12a5be9b30e19a123142af5453a91bd4300b63 Gerrit-Change-Number: 16120 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
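The caching idea in the commit message above (open each Kudu table once per query and reuse the handle, without keeping it beyond the query) can be sketched as below. `FakeKuduClient`, `QueryState`, and `get_kudu_table` are hypothetical stand-ins for illustration only. They are not the Kudu client API or Impala's Analyzer GlobalState, and the fake client simply counts calls where the real one would issue a GetTableSchema RPC.

```python
class FakeKuduClient:
    """Stand-in client that counts open_table calls (one RPC each in reality)."""

    def __init__(self):
        self.open_calls = 0

    def open_table(self, name):
        self.open_calls += 1
        return {"name": name}


class QueryState:
    """Hypothetical per-query state; it is discarded when the query finishes."""

    def __init__(self, client):
        self._client = client
        self._tables = {}  # table name -> handle, valid for this query only

    def get_kudu_table(self, name):
        # Open the table on first use, then reuse the handle within the query.
        if name not in self._tables:
            self._tables[name] = self._client.open_table(name)
        return self._tables[name]
```

Because the cache lives on the per-query object, a new query always opens the table again, so no cross-query invalidation is needed in this model.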
[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16263 ) Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries .. Patch Set 1: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6199/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294 Gerrit-Change-Number: 16263 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 31 Jul 2020 00:41:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10029: Strip debug symbols from libkudu client and libstdc++ binaries
Sahil Takiar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16263 Change subject: IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries .. IMPALA-10029: Strip debug symbols from libkudu_client and libstdc++ binaries Strip debug symbols from libkudu_client.so and libstdc++.so. The same technique used to strip debug symbols from impalad binaries is used. This decreases the Docker image sizes by about 100 MB. Test: * Ran Dockerized tests Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294 --- M docker/setup_build_context.py 1 file changed, 16 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/16263/1 -- To view, visit http://gerrit.cloudera.org:8080/16263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I61fdf47041bd96248ecb48ae57dde143de2da294 Gerrit-Change-Number: 16263 Gerrit-PatchSet: 1 Gerrit-Owner: Sahil Takiar
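The message above does not say which tool the build context script uses to strip symbols. As a rough sketch, and assuming GNU binutils `objcopy` with its `--strip-debug` option, a helper could build the command that writes a stripped copy of a library into the Docker build context. `strip_debug_command` and the example paths are hypothetical and do not come from docker/setup_build_context.py.

```python
import posixpath


def strip_debug_command(src_path, dest_dir):
    """Build an objcopy command that writes a debug-stripped copy of src_path.

    Assumption: GNU objcopy is available. The command is only constructed
    here, not executed.
    """
    dest_path = posixpath.join(dest_dir, posixpath.basename(src_path))
    return ["objcopy", "--strip-debug", src_path, dest_path]
```

The returned list can be passed to `subprocess.check_call` by a caller that wants to run it. The original file is left untouched because the stripped output goes to a separate destination path.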
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 31 Jul 2020 00:32:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16240 ) Change subject: IMPALA-9955,IMPALA-9957: Fix not enough reservation for large read/write pages in GroupingAggregator .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6198/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16240 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3d9c3a2e7f0da60071b920dec979729e86459775 Gerrit-Change-Number: 16240 Gerrit-PatchSet: 4 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 23:39:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: IMPALA-9979: part 2: partitioned top-n .. Patch Set 12: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6742/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 12 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Comment-Date: Thu, 30 Jul 2020 23:24:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9979: part 2: partitioned top-n
Hello Aman Sinha, Shant Hovsepian, David Rorke, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16242

to look at the new patch set (#12).

Change subject: IMPALA-9979: part 2: partitioned top-n
..

IMPALA-9979: part 2: partitioned top-n

The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method:
* Push down predicates that reference analytic tuple into inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated.
* Identify predicates that reference the analytic tuple and could be converted to limits.
* If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N.
* Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view.

The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps. We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime.

rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch.

If predicates are trivially false, we generate empty nodes.

The logic to choose between TopNNode and SortNode based on TOPN_BYTES_LIMIT is moved from SingleNodePlanner to SortNode so it can be reused.

The top-n node in the backend is augmented to handle both the partitioning (for which we use a std::map and a comparator based on the partition exprs) and the tie-handling semantics required by rank() predicates.

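The tie-handling semantics mentioned above can be illustrated with a small sketch: for a predicate such as rank() <= N, each partition keeps its first N rows in sort order plus any rows that tie with the N-th row. This is a plain in-memory model with hypothetical names (`partitioned_top_n_with_ties`, `partition_key`, `order_key`). It does not reflect the heap-based C++ implementation, its 64MB soft limit, or spilling.

```python
from collections import defaultdict


def partitioned_top_n_with_ties(rows, partition_key, order_key, limit):
    """Keep, per partition, every row whose rank() would be <= limit."""
    if limit <= 0:
        return {}

    # Group rows by partition, like a map keyed on the partition exprs.
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = {}
    for key, part in partitions.items():
        part.sort(key=order_key)
        if len(part) <= limit:
            result[key] = part
            continue
        # Rows that tie with the row in last place share its rank, so keep them.
        cutoff = order_key(part[limit - 1])
        result[key] = [r for r in part if order_key(r) <= cutoff]
    return result
```

With row_number() instead of rank(), the same loop would simply keep `part[:limit]`, since row numbers never tie.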
The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. We currently use the partitioned top-n node to implement rank() pushdown in all cases because of the tie-handling support. We also cannot use the merging exchange for rank() because the limit does not handle ties in the same way, so we need to generate an unordered exchange with a partitioned top-n node on top of the exchange. Limitations: There are several possible extensions to this that we did not do: * dense_rank() is not supported because it would require additional backend support - IMPALA-10014. * Only one predicate per analytic is pushed. * Redundant rank()/row_number() predicates are not merged, only the lowest is chosen. * Lower bounds are not converted into OFFSET. * The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate. * This doesn't push predicates into UNION - IMPALA-10013 * Always false predicates don't result in empty plan - IMPALA-10015 * We evict all in memory partitions when under memory pressure - this could be improved - IMPALA-10023. * The top-n node rebuilds an in-memory heap per partition during the output phase. This required less code but adds some avoidable overhead - see IMPALA-10025. Tests: * Planner tests - added tests that exercise the interesting code paths added in planning. - Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views. * Modified SORT targeted perf tests to avoid conversion to Top-N * Added targeted perf test for partitioned top-n. * End-to-end tests - Unpartitioned Top-N end-to-end tests - Basic partitioning and duplicate handling tests on functional - Similar basic tests on larger inputs from TPC-DS and with larger partition counts. 
- I inspected the results and also ran the same tests with analytic_rank_pushdown_threshold=0 to confirm that the results were the same as with the full sort. - Fallback to spilling sort. Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/exec-node.cc M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/exprs/slot-ref.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/tuple-row-compare.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SlotRef.java M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java M fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java M
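The row_number() and rank() limit semantics described in the IMPALA-9979 commit message above can be illustrated with a small Python sketch. This is not the backend implementation: the function name `partitioned_top_n` and its parameters are hypothetical, and each partition is fully sorted here for clarity, whereas the commit message describes bounded in-memory heaps keyed by a std::map with a spilling fallback.

```python
from collections import defaultdict


def partitioned_top_n(rows, partition_key, sort_key, limit, include_ties=False):
    """Keep the first `limit` rows of each partition in sort order.

    With include_ties=False this mimics a `row_number() <= limit` predicate.
    With include_ties=True it mimics `rank() <= limit`: rows that tie with
    the last kept row are also returned.
    (Hypothetical helper for illustration; not Impala code.)
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = {}
    for key, part in partitions.items():
        part.sort(key=sort_key)  # stable sort, so input order breaks ties
        kept = part[:limit]
        if include_ties and kept:
            last_value = sort_key(kept[-1])
            # Extend past the limit while rows tie with the last kept row.
            for row in part[limit:]:
                if sort_key(row) != last_value:
                    break
                kept.append(row)
        result[key] = kept
    return result


rows = [("a", 1), ("a", 2), ("a", 2), ("a", 3), ("b", 5), ("b", 5)]
by_partition = lambda r: r[0]
by_value = lambda r: r[1]
print(partitioned_top_n(rows, by_partition, by_value, 2))
print(partitioned_top_n(rows, by_partition, by_value, 2, include_ties=True))
```

The tie-handling branch is the reason a plain limit cannot stand in for a rank() predicate: partition "a" returns three rows for `rank() <= 2` but only two for `row_number() <= 2`.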
[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66
Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16257 ) Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66 .. IMPALA-9903: Bump Kudu version to 5ad5d3d66 This patch bumps Kudu to commit 5ad5d3d66 to pull in KuduScanner.GetKuduTable which will be used in https://gerrit.cloudera.org/#/c/16120/ Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd Reviewed-on: http://gerrit.cloudera.org:8080/16257 Reviewed-by: Joe McDonnell Tested-by: Joe McDonnell --- M buildall.sh 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: Joe McDonnell: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/16257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd Gerrit-Change-Number: 16257 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong
[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16257 ) Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66 .. Patch Set 1: Verified+1 This passed a build of all components on supported platforms. -- To view, visit http://gerrit.cloudera.org:8080/16257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd Gerrit-Change-Number: 16257 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 20:51:21 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/16257 ) Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd Gerrit-Change-Number: 16257 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 20:50:41 + Gerrit-HasComments: No
[native-toolchain-CR] IMPALA-9903: Bump Kudu version to 5ad5d3d66
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16257 ) Change subject: IMPALA-9903: Bump Kudu version to 5ad5d3d66 .. Patch Set 1: This patch successfully built on Jenkins (without publishing). -- To view, visit http://gerrit.cloudera.org:8080/16257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38ddb7ecc5049fab7987ceb4726c0cc8c14a6cbd Gerrit-Change-Number: 16257 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 20:41:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/16235 ) Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 19:34:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6197/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 19:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6741/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 19:11:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9478: Profiles should indicate if custom UDFs are being used
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16188 ) Change subject: IMPALA-9478: Profiles should indicate if custom UDFs are being used .. Patch Set 5: (2 comments) > Did you run exhaustive tests? Would be good to do that just to be sure > nothing else needs to be updated. Ran exhaustive tests, everything passed. http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@222 PS5, Line 222: if (fn_ != null && !fnName_.isBuiltin()) { > We do have the ToSqlOptions that could maybe control this to hide it in err yeah using ToSqlOptions makes sense, I briefly looked into it but it didn't seem that straightforward because ToSql is called in so many places. http://gerrit.cloudera.org:8080/#/c/16188/5/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java@491 PS5, Line 491: "User Defined Functions (UDFs)" > I wonder if it worth the effort to make the key more explicit: Yeah that would be nice because it makes it consistent with the info from toSql, unfortunately the info about whether it is a native vs. java udf is only available in the fn_ instance variable, which isn't set until the end of the function. it's probably do-able, but maybe not worth the effort since the same info is in the explain plan already. 
-- To view, visit http://gerrit.cloudera.org:8080/16188 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I79122e6cc74fd5a62c76962289a1615fbac2f345 Gerrit-Change-Number: 16188 Gerrit-PatchSet: 5 Gerrit-Owner: Sahil Takiar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 19:04:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Shant Hovsepian has posted comments on this change. ( http://gerrit.cloudera.org:8080/16123 ) Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java File fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java: http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@277 PS9, Line 277: JoinOperator joinOp = operand.getSetOperator() == SetOperator.EXCEPT ? > nit: we could declare the variable on l 309 where it's assigned. Done http://gerrit.cloudera.org:8080/#/c/16123/9/fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java@331 PS9, Line 331: List initialOps = new ArrayList<>(); > It doesn't look like we do anything with this view? Is it mean to wrap eiSe Yes good catch, it was something I refactored out as the union operands can be querystmts versus just tablerefs. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@470 PS8, Line 470: 10:HASH JOIN [LEFT SEMI JOIN] > For future reference, I have created a JIRA: IMPALA-10008 Ack http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test File testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test: http://gerrit.cloudera.org:8080/#/c/16123/9/testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test@212 PS9, Line 212: select distinct id, year, month from functional.alltypestiny where year=2009 and month=1 > I guess the distinct is sorta serving as an execution hint here, right? 
Sin Mostly to exercise the rewrite test case. Without the distincts for now we wouldn't be able to use an INNER join. In this case, since id is kind of like a key, the distinct is redundant, but we don't have a way of detecting that. http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test File testdata/workloads/functional-query/queries/QueryTest/intersect.test: http://gerrit.cloudera.org:8080/#/c/16123/8/testdata/workloads/functional-query/queries/QueryTest/intersect.test@4 PS8, Line 4: RESULTS > Sorry, one more test suggestion that is based on a common pattern: branch Added some planner tests and will file a JIRA. It works in some cases, but in general we could remove the JOIN; instead it just creates an empty set below the join. I could be wrong, but this seems like a useful optimization for join types in general. -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 18:46:43 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT]
Hello Aman Sinha, David Rorke, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16123 to look at the new patch set (#11). Change subject: IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] .. IMPALA-9943,IMPALA-4974: INTERSECT/EXCEPT [DISTINCT] INTERSECT and EXCEPT set operations are implemented as rewrites to joins. Currently only the DISTINCT qualified operators are implemented, not ALL qualified. The operator MINUS is supported as an alias for EXCEPT. We mimic Oracle and Hive's non-standard implementation which treats all operators with the same precedence, as opposed to the SQL Standard of giving INTERSECT higher precedence. A new class SetOperationStmt was created to encompass the previous UnionStmt behavior. UnionStmt is preserved as a special case of union only operands to ensure compatibility with previous union planning behavior. Tests: * Added parser and analyzer tests. * Ensured no test failures or plan changes for union tests. * Added TPC-DS queries 14,38,87 to functional and planner tests. 
* Added functional tests test_intersect test_except * New planner testSetOperationStmt Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f --- M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java A fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java M fe/src/main/java/org/apache/impala/analysis/UnionStmt.java M fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java M fe/src/main/java/org/apache/impala/planner/UnionNode.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/empty.test A testdata/workloads/functional-planner/queries/PlannerTest/setoperation-rewrite.test M testdata/workloads/functional-planner/queries/PlannerTest/tpcds-all.test A testdata/workloads/functional-query/queries/QueryTest/except.test A testdata/workloads/functional-query/queries/QueryTest/intersect.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-1.test A testdata/workloads/tpcds/queries/tpcds-decimal_v2-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q14-1.test A testdata/workloads/tpcds/queries/tpcds-q14-2.test A testdata/workloads/tpcds/queries/tpcds-q38.test A testdata/workloads/tpcds/queries/tpcds-q87.test M tests/query_test/test_queries.py M tests/query_test/test_tpcds_queries.py M tests/util/parse_util.py 30 files changed, 5,117 insertions(+), 796 deletions(-) git pull 
ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/16123/11 -- To view, visit http://gerrit.cloudera.org:8080/16123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5be46f824217218146ad48b30767af0fc7edbc0f Gerrit-Change-Number: 16123 Gerrit-PatchSet: 11 Gerrit-Owner: Shant Hovsepian Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong
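The two ideas in the IMPALA-9943 commit message above (DISTINCT set operations expressed as joins, and all set operators sharing one precedence) can be sketched in Python. This is a rough model under stated assumptions, not the StmtRewriter logic: rows are plain tuples, the helper names are hypothetical, and the semi/anti join is modelled as a membership test over whole rows.

```python
def distinct(rows):
    """Remove duplicate rows while preserving first-seen order."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


def union_distinct(left, right):
    return distinct(list(left) + list(right))


def intersect_distinct(left, right):
    # Modelled as a left semi join of the distinct left rows against right.
    right_rows = set(right)
    return [row for row in distinct(left) if row in right_rows]


def except_distinct(left, right):
    # Modelled as a left anti join; MINUS would be an alias for this.
    right_rows = set(right)
    return [row for row in distinct(left) if row not in right_rows]


def evaluate_left_to_right(first, steps):
    """Apply (operator, operand) pairs in order, all with equal precedence."""
    result = first
    for operator, operand in steps:
        result = operator(result, operand)
    return result


a, b, c = [(1,)], [(2,)], [(2,)]

# a UNION b INTERSECT c with equal precedence: (a UNION b) INTERSECT c
same_precedence = evaluate_left_to_right(
    a, [(union_distinct, b), (intersect_distinct, c)])

# SQL-standard grouping: a UNION (b INTERSECT c)
standard = union_distinct(a, intersect_distinct(b, c))

print(same_precedence)
print(standard)
```

The two groupings give different answers for this input, so a query that mixes UNION and INTERSECT without parentheses can return different rows depending on which precedence rule the engine follows.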
[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16237 ) Change subject: IMPALA-10006: handle non-writable /opt/impala/logs .. IMPALA-10006: handle non-writable /opt/impala/logs The shutdown script should not abort if it can't write a log - it should continue to try and shut down impala. The entrypoint script should abort with an explicit error if the log directory isn't writable by the current user. Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756 Reviewed-on: http://gerrit.cloudera.org:8080/16237 Tested-by: Impala Public Jenkins Reviewed-by: Attila Jeges Reviewed-by: Andrew Sherman --- M bin/graceful_shutdown_backends.sh M docker/daemon_entrypoint.sh 2 files changed, 10 insertions(+), 1 deletion(-) Approvals: Impala Public Jenkins: Verified Attila Jeges: Looks good to me, but someone else must approve Andrew Sherman: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/16237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756 Gerrit-Change-Number: 16237 Gerrit-PatchSet: 3 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong
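The two behaviours described in the IMPALA-10006 commit message above (fail fast at startup, best-effort logging at shutdown) can be contrasted in a short Python sketch. The actual change is to shell scripts whose contents are not shown in this thread, so the function names and the error message here are hypothetical.

```python
import os


def require_writable_log_dir(log_dir):
    """Entrypoint-style check: abort with an explicit error if unusable."""
    if not (os.path.isdir(log_dir) and os.access(log_dir, os.W_OK)):
        raise SystemExit(
            "Log directory %s is not writable by the current user" % log_dir)


def best_effort_log(log_path, message):
    """Shutdown-style logging: report failure but never raise.

    Returns True if the message was written, False otherwise, so the
    caller can carry on with the shutdown either way.
    """
    try:
        with open(log_path, "a") as log_file:
            log_file.write(message + "\n")
        return True
    except OSError:
        return False
```

The design point is that a logging failure is fatal only where nothing useful has started yet; during shutdown, the shutdown itself takes priority over the log.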
[Impala-ASF-CR] IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16098 ) Change subject: IMPALA-9744: Treat corrupt table stats as missing to avoid bad plans .. Patch Set 25: (1 comment) http://gerrit.cloudera.org:8080/#/c/16098/25/tests/metadata/test_explain.py File tests/metadata/test_explain.py: http://gerrit.cloudera.org:8080/#/c/16098/25/tests/metadata/test_explain.py@132 PS25, Line 132: # Set the number of rows at the table level to -1. : self.execute_query( : "alter table %s set tblproperties('numRows'='-1')" % mixed_tbl) just curious why this is necessary? -- To view, visit http://gerrit.cloudera.org:8080/16098 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9f4c64616ff7c0b6d5a48f2b5331325feeff3576 Gerrit-Change-Number: 16098 Gerrit-PatchSet: 25 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 17:38:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10007: Impala development environment does not support Ubuntu 20.04
Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16241 ) Change subject: IMPALA-10007: Impala development environment does not support Ubuntu 20.04 .. Patch Set 6: Code-Review+2 (1 comment) Were you able to figure out Aman's comment here: https://gerrit.cloudera.org/#/c/16238/5/bin/bootstrap_toolchain.py@95 http://gerrit.cloudera.org:8080/#/c/16241/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16241/6//COMMIT_MSG@15 PS6, Line 15: The work addresses the current limitation in Impala development : environment in that Ubuntu 20.04 is not supportd. The fix modifies : bootstrap_system.sh and bootstrap_toolchain.py to specifically : allow the bootstrapping of the Ubuntu 18.04 Impala development : environment on a machine running Ubuntu 20.04. Limited use shows : that the environment is useful and stable, similar to the one : running on Ubuntu 18.04. you can delete this -- To view, visit http://gerrit.cloudera.org:8080/16241 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7db302b4f1d57ec9aa2100d7589d5e814db75947 Gerrit-Change-Number: 16241 Gerrit-PatchSet: 6 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Thu, 30 Jul 2020 17:35:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/16237 ) Change subject: IMPALA-10006: handle non-writable /opt/impala/logs .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/16237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756 Gerrit-Change-Number: 16237 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 17:17:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10006: handle non-writable /opt/impala/logs
Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/16237 ) Change subject: IMPALA-10006: handle non-writable /opt/impala/logs .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/16237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If32d6eef75422b51f8877478bbfb1a709c02f756 Gerrit-Change-Number: 16237 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 16:34:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16227 ) Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate .. Patch Set 3: I think we should make some more effort to try to repro this, since it's not obvious that it is unrelated to the change. Of the tests that were running, test_nested_types and test_spilling look the most plausibly related, so I'll loop those on one of my machines. -- To view, visit http://gerrit.cloudera.org:8080/16227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e Gerrit-Change-Number: 16227 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 15:54:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16227 ) Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate .. Patch Set 3: TBH that was an educated guess based on the lack of a stack trace - if it was in interpreted code there is typically a stack - Stack: [0x7f0e8d31b000,0x7f0e8db1c000], sp=0x7f0e8db192a8, free space=8184k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [libc.so.6+0x14e224] -- To view, visit http://gerrit.cloudera.org:8080/16227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e Gerrit-Change-Number: 16227 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 15:25:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function
Adam Tamas has posted comments on this change. ( http://gerrit.cloudera.org:8080/16259 ) Change subject: IMPALA-9963: Implement ds_kll_n() function .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/16259/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16259/1//COMMIT_MSG@9 PS1, Line 9: s I think this should be in singular. -- To view, visit http://gerrit.cloudera.org:8080/16259 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 Gerrit-Change-Number: 16259 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Adam Tamas Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 15:13:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16235 ) Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6740/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 13:43:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16259 ) Change subject: IMPALA-9963: Implement ds_kll_n() function .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6739/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16259 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 Gerrit-Change-Number: 16259 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 13:34:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16235 to look at the new patch set (#8). Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions ds_kll_sketch() is an aggregate function that receives a float parameter (e.g. a float column of a table) and returns a serialized Apache DataSketches KLL sketch of the input data set wrapped into STRING type. This sketch can be saved into a table or view and later used for quantile approximations. ds_kll_quantile() receives two parameters: a STRING parameter that contains a serialized KLL sketch and a DOUBLE that represents the rank of the quantile in the range of [0,1]. E.g. rank=0.1 means the approximate value in the sketch where 10% of the sketched items are less than or equals to this value. Testing: - Added automated tests on small data sets to check the basic functionality of sketching and getting a quantile approximate. - Tested on TPCH25_parquet.lineitem to check that sketching and approximating works on bigger scale as well where serialize/merge phases are also required. 
On this scale the error range of the quantile approximation is within 1-1.5% Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 12 files changed, 333 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16235/8 -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 8 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins
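The rank semantics described in the IMPALA-9959 commit message above can be made concrete with an exact, non-sketch computation in Python. This is only a reference for what the approximation is aiming at: it does not use Apache DataSketches or the KLL algorithm, the name `exact_quantile` is hypothetical, and the precise tie and boundary rules of the real function may differ from this reading of "less than or equal to".

```python
import math


def exact_quantile(values, rank):
    """Return the smallest value v such that at least `rank` of the items
    are less than or equal to v. Exact computation over the full input,
    shown for comparison with a sketch-based approximation.
    """
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in the range [0, 1]")
    if not values:
        return None
    ordered = sorted(values)
    # Position of the item that covers the requested fraction of the input.
    index = max(0, math.ceil(rank * len(ordered)) - 1)
    return ordered[index]


data = list(range(1, 11))  # 1..10
print(exact_quantile(data, 0.1))
print(exact_quantile(data, 0.5))
print(exact_quantile(data, 1.0))
```

An approximate sketch trades this exactness for a small, mergeable summary, which is why the commit message reports an error range rather than exact agreement.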
[Impala-ASF-CR] IMPALA-9963: Implement ds_kll_n() function
Gabor Kaszab has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16259 ) Change subject: IMPALA-9963: Implement ds_kll_n() function .. IMPALA-9963: Implement ds_kll_n() function This function receives a serialized Apache DataSketches KLL sketch and returns how many input values were fed into this sketch. Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 --- M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test 4 files changed, 55 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16259/1 -- To view, visit http://gerrit.cloudera.org:8080/16259 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I166e87a468e68e888ac15fca7429ac2552dbb781 Gerrit-Change-Number: 16259 Gerrit-PatchSet: 1 Gerrit-Owner: Gabor Kaszab
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16235 ) Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. Patch Set 7: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/6738/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 10:51:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16235 ) Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/16235/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16235/6//COMMIT_MSG@9 PS6, Line 9: ds_kll_sketch() is an aggregate function that receives a float > nit: wrap at 72 chars Done http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/aggregate-functions-ir.cc@1618 PS6, Line 1618: rin > nit: could add "using std::string" + same for stringstream. This is already Done http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions-ir.cc File be/src/exprs/datasketches-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions-ir.cc@50 PS6, Line 50: LogSketchDeserializationError(ctx); > Do you know if the datasketches code uses exceptions? I am wondering if the Good point! In fact here we are safe as we can get invalid_argument exception if rank is not in [0,1] but I check it above. Some other exceptions are thrown if the internal state of the kll_sketch is off, that is not possible to happen but still it doesn't hurt to add a try-catch around this call. Additionally, DeserializeDsSketch() covers for invalid_arguments error, but I might add another catch block to be on the safe side. 
http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions.h File be/src/exprs/datasketches-functions.h: http://gerrit.cloudera.org:8080/#/c/16235/6/be/src/exprs/datasketches-functions.h@33 PS6, Line 33: distinc > typo: distinct Done -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 30 Jul 2020 10:31:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions
Hello Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16235 to look at the new patch set (#7). Change subject: IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions .. IMPALA-9959: Implement ds_kll_sketch() and ds_kll_quantile() functions ds_kll_sketch() is an aggregate function that receives a float parameter (e.g. a float column of a table) and returns a serialized Apache DataSketches KLL sketch of the input data set wrapped into STRING type. This sketch can be saved into a table or view and later used for quantile approximations. ds_kll_quantile() receives two parameters: a STRING parameter that contains a serialized KLL sketch and a DOUBLE that represents the rank of the quantile in the range of [0,1]. E.g. rank=0.1 means the approximate value in the sketch where 10% of the sketched items are less than or equal to this value. Testing: - Added automated tests on small data sets to check the basic functionality of sketching and getting a quantile approximation. - Tested on TPCH25_parquet.lineitem to check that sketching and approximating works on bigger scale as well where serialize/merge phases are also required. 
On this scale the error range of the quantile approximation is within 1-1.5% Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 --- M be/src/exprs/aggregate-functions-ir.cc M be/src/exprs/aggregate-functions.h M be/src/exprs/datasketches-common.cc M be/src/exprs/datasketches-common.h M be/src/exprs/datasketches-functions-ir.cc M be/src/exprs/datasketches-functions.h M common/function-registry/impala_functions.py M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java M testdata/data/README A testdata/data/kll_sketches_from_hive.parquet A testdata/workloads/functional-query/queries/QueryTest/datasketches-kll.test M tests/query_test/test_datasketches.py 12 files changed, 333 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/16235/7 -- To view, visit http://gerrit.cloudera.org:8080/16235 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I11de5fe10bb5d0dd42fb4ee45c4f21cb31963e52 Gerrit-Change-Number: 16235 Gerrit-PatchSet: 7 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins
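The rank semantics described in the commit message above can be illustrated with an exact computation on a small in-memory list. This is only a sketch of the meaning of "rank" under the stated definition; it is not the KLL algorithm, not the DataSketches API, and not the code in this change. The function name quantile_by_rank is hypothetical.

```python
import math


def quantile_by_rank(values, rank):
    """Exact stand-in for the rank semantics described for ds_kll_quantile().

    Hypothetical helper: returns the smallest item such that at least
    rank * n of the items are less than or equal to it. A real KLL sketch
    only approximates this value.
    """
    if not 0.0 <= rank <= 1.0:
        raise ValueError("rank must be in [0, 1]")
    if not values:
        return None
    ordered = sorted(values)
    # Convert the rank into a 0-based index, clamping rank=0 to the minimum.
    idx = max(math.ceil(rank * len(ordered)) - 1, 0)
    return ordered[idx]


if __name__ == "__main__":
    data = list(range(1, 11))
    print(quantile_by_rank(data, 0.5))
    print(quantile_by_rank(data, 1.0))
```

The range check mirrors the validation mentioned in the review thread, where a rank outside [0,1] is rejected before the sketch is queried.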
[Impala-ASF-CR] IMPALA-9984: Implement codegen for TupleIsNullPredicate
Daniel Becker has posted comments on this change. ( http://gerrit.cloudera.org:8080/16227 ) Change subject: IMPALA-9984: Implement codegen for TupleIsNullPredicate .. Patch Set 3: Tim, how can you see that from the crash dump? Probably it is somehow flaky because now it passed without modifications. -- To view, visit http://gerrit.cloudera.org:8080/16227 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I410aa7ec762ca16f455bd7da1dce763c1a7b156e Gerrit-Change-Number: 16227 Gerrit-PatchSet: 3 Gerrit-Owner: Daniel Becker Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 30 Jul 2020 09:15:14 + Gerrit-HasComments: No
[Impala-ASF-CR] WIP: IMPALA-9979: part 2: partitioned top-n
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16242 ) Change subject: WIP: IMPALA-9979: part 2: partitioned top-n .. Patch Set 11: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/6737/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16242 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 Gerrit-Change-Number: 16242 Gerrit-PatchSet: 11 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Shant Hovsepian Gerrit-Comment-Date: Thu, 30 Jul 2020 06:58:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10024: isBlackListedDb() should do a case-insensitive check
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16254 ) Change subject: IMPALA-10024: isBlackListedDb() should do a case-insensitive check .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/16254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41 Gerrit-Change-Number: 16254 Gerrit-PatchSet: 4 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar Gerrit-Comment-Date: Thu, 30 Jul 2020 06:53:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10024: isBlackListedDb() should do a case-insensitive check
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16254 ) Change subject: IMPALA-10024: isBlackListedDb() should do a case-insensitive check .. IMPALA-10024: isBlackListedDb() should do a case-insensitive check The util method CatalogServiceCatalog#isBlackListedDb() expects the input dbName to be in lower-case which could be error-prone. Specifically, this can cause issues when a Metastore event has a dbName in a different case than the one configured in --blacklisted_dbs. In such cases the EventsProcessor does not ignore the event and can go into error state. The fix modifies the isBlackListedDb method to do a case-insensitive comparison. The isBlacklistedTable is not affected by this issue since TableName has a built-in mechanism to ignore the case. Testing Done: 1. Modified the test_event_processing.py such that the event generated has a different case than what is configured in --blacklisted_dbs. The updated test works after the patch. 2. Ran existing tests for events processor. 
Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41 Reviewed-on: http://gerrit.cloudera.org:8080/16254 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java M tests/custom_cluster/test_event_processing.py 3 files changed, 28 insertions(+), 4 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/16254 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3898a46b4236413b2e328cecbb2f4364082a5e41 Gerrit-Change-Number: 16254 Gerrit-PatchSet: 5 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Vihang Karajgaonkar
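The idea behind the fix in the merged change above can be sketched as follows. This is a Python stand-in for illustration only; the actual change is in Java (CatalogServiceCatalog.java), and the function name, parameters, and example database names below are assumptions, not the real implementation.

```python
def is_blacklisted_db(db_name, blacklisted_dbs):
    """Hypothetical case-insensitive membership check.

    Both the configured names and the incoming name are lower-cased
    before comparison, so the caller no longer has to normalize the
    input itself.
    """
    normalized = {name.strip().lower() for name in blacklisted_dbs}
    return db_name.lower() in normalized


if __name__ == "__main__":
    configured = ["sys", "Information_Schema"]
    print(is_blacklisted_db("SYS", configured))
    print(is_blacklisted_db("tpch", configured))
```

Normalizing on both sides is a design choice made here so that neither the configuration nor the event payload needs to be in a particular case.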
[Impala-ASF-CR] WIP: IMPALA-9979: part 2: partitioned top-n
Hello Aman Sinha, Shant Hovsepian, David Rorke, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16242 to look at the new patch set (#11). Change subject: WIP: IMPALA-9979: part 2: partitioned top-n .. WIP: IMPALA-9979: part 2: partitioned top-n The planner now identifies predicates that can be converted into limits in a partitioned or unpartitioned top-n with the following method: * Push down predicates that reference the analytic tuple into the inline view. These will be evaluated after the analytic plan for the inline SelectStmt is generated. * Identify predicates that reference the analytic tuple and could be converted to limits. * If they can be applied to the last sort group of the analytic plan, and the windows are all compatible, then the lowest limit gets converted into a limit in the top N. * Otherwise generate a select node with the conjuncts. We add logic to merge SELECT nodes to avoid generating duplicates from inside and outside the inline view. The optimization can be disabled by setting ANALYTIC_RANK_PUSHDOWN_THRESHOLD=0. By default it is only enabled for limits of 1000 or less, because the in-memory Top-N may perform significantly worse than a full sort for large heaps. We could probably optimize this more with better tuning so that it can gracefully fall back to doing the full sort at runtime. rank() and row_number() are handled. rank() needs support in the TopN node to include ties for the last place, which is also added in this patch. If predicates are trivially false, we generate empty nodes. The logic to choose between TopNNode and SortNode based on TOPN_BYTES_LIMIT is moved from SingleNodePlanner to SortNode so it can be reused. The top-n node in the backend is augmented to handle both the partitioning (for which we use a std::map and a comparator based on the partition exprs) and the tie-handling semantics required by rank() predicates. 
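The tie-handling semantics mentioned above can be illustrated with a small in-memory sketch: per partition, keep every row whose rank() is at most the limit, so that rows tied with the last kept row are also returned. This is not the backend implementation (which the message says uses a std::map of heaps and can spill); the function name, the use of a dict plus a full sort per partition, and the ascending order are all simplifying assumptions.

```python
from collections import defaultdict


def partitioned_rank_topn(rows, partition_key, order_key, limit):
    """Hypothetical sketch of a partitioned top-n with rank() semantics.

    Rows are grouped by partition_key, ordered ascending by order_key,
    and kept while their rank() within the partition is <= limit. Tied
    rows share a rank, so more than `limit` rows can be returned.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    result = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=order_key)
        rank = 0
        prev = None
        for pos, row in enumerate(ordered, start=1):
            current = order_key(row)
            # rank() only advances when the ordering value changes,
            # and then jumps to the row's position in the partition.
            if pos == 1 or current != prev:
                rank = pos
                prev = current
            if rank > limit:
                break
            result.append(row)
    return result


if __name__ == "__main__":
    rows = [("a", 1), ("a", 1), ("a", 2), ("b", 5), ("b", 6)]
    for row in partitioned_rank_topn(rows, lambda r: r[0], lambda r: r[1], 1):
        print(row)
```

With a limit of 1, both tied rows in partition "a" are kept, which is the difference from row_number(), where exactly one row per partition would survive.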
The partitioned top-n node has a soft limit of 64MB on the size of the in-memory heaps and can spill with use of an embedded Sorter. We currently use the partitioned top-n node to implement rank() pushdown in all cases because of the tie-handling support. We also cannot use the merging exchange for rank() because the limit does not handle ties in the same way, so we need to generate an unordered exchange with a partitioned top-n node on top of the exchange. Limitations: There are several possible extensions to this that we did not do: * dense_rank() is not supported because it would require additional backend support - IMPALA-10014. * Only one predicate per analytic is pushed. * Redundant rank()/row_number() predicates are not merged, only the lowest is chosen. * Lower bounds are not converted into OFFSET. * The analytic operator cannot be eliminated even if the analytic expression was only used in the predicate. * This doesn't push predicates into UNION - IMPALA-10013 * Always false predicates don't result in empty plan - IMPALA-10015 * We evict all in memory partitions when under memory pressure - this could be improved - IMPALA-10023. * The top-n node rebuilds an in-memory heap per partition during the output phase. This required less code but adds some avoidable overhead - see IMPALA-10025. Tests: * Planner tests - added tests that exercise the interesting code paths added in planning. - Predicate ordering in SELECT nodes changed in a couple of cases because some predicates were pushed into the inline views. * Modified SORT targeted perf tests to avoid conversion to Top-N * Added targeted perf test for partitioned top-n. * End-to-end tests - Unpartitioned Top-N end-to-end tests - Basic partitioning and duplicate handling tests on functional - Similar basic tests on larger inputs from TPC-DS and with larger partition counts. 
TODO: - Spilling because of large partitions - In-memory heap evictions This results in heap evictions - select * from ( select d_date, i_item_id, ss_list_price, rank() over (partition by d_date, ss_store_sk order by ss_list_price desc) rnk from store_sales ss join item i on ss_item_sk = i_item_sk join date_dim d on ss_sold_date_sk = d_date_sk where ss_list_price is not null) v where rnk = 500 order by d_date limit 50; Change-Id: Ic638af9495981d889a4cb7455a71e8be0eb1a8e5 --- M be/src/codegen/gen_ir_descriptions.py M be/src/exec/exec-node.cc M be/src/exec/topn-node-ir.cc M be/src/exec/topn-node.cc M be/src/exec/topn-node.h M be/src/exprs/slot-ref.h M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/util/tuple-row-compare.h M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/AnalyticExpr.java M fe/src/main/java/org/apache/impala/analysis/AnalyticWindow.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M