[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7749/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Shant Hovsepian Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 30 Nov 2020 02:14:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7748/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 1 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 Nov 2020 02:02:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@210 PS1, Line 210: if (estimatedTotalRows > 0 && limitValue > 0 > line too long (91 > 90) Done http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@211 PS1, Line 211: && limitValue <= estimatedTotalRows) { > line too long (94 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 Nov 2020 01:51:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint
Hello Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/16792 to look at the new patch set (#2). Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint .. IMPALA-10360: Allow simple limit to be treated as sampling hint As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. Doing a sample instead of a pure limit serves dual purposes: (a) it still helps with reducing the planning time since the scan ranges need to be computed only for the sample files, (b) it allows a sufficient number of files/rows to be read from the table such that after applying filter conditions or joins with another table, the query may still produce the N rows needed for the limit. This functionality is especially useful if the query is against a view (note that the TABLESAMPLE clause cannot be applied to a view). In this patch, a new table-level hint, 'convert_limit_to_sample', is added. If this hint is attached to a table either in the main query block or within a view/subquery and the simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample. For example: set optimize_simple_limit = true; CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample] WHERE [always_true] ; SELECT * FROM v1 LIMIT 10; In this case, the limit 10 is converted to a sample of T and the sampling percent is the greater of 1% or the ratio (in percent) of the limit to the estimated row count of the table. Testing: - Added an alltypes_date_partition_2 table where the date and timestamp values match (this helps with setting the 'always_true' hint). - Added views with 'convert_limit_to_sample' and 'always_true' hints and added new tests against the views. Modified a few existing tests to reference the new table variant. - Added an end-to-end test. 
Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 9 files changed, 224 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/2 -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 2 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins
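The sampling-percent rule in the commit message above ("the greater of 1% or ratio (in percent) of limit to the estimated row count") can be illustrated with a small standalone sketch. The guard condition and the Math.max expression are the ones quoted from patch set 1 of HdfsPartitionPruner.java in the review comments in this thread; the class name, the method name, and the -1 return value for "no conversion" are hypothetical and are not taken from the patch, and later patch sets may differ.

```java
// Illustrative sketch only; not the Impala implementation.
final class LimitToSampleSketch {
  /**
   * Returns the sampling percent for a simple limit, or -1 (an assumption made
   * for this sketch) when the limit would not be converted to a sample.
   */
  static long samplingPercent(long limitValue, long estimatedTotalRows) {
    // Guard as quoted from patch set 1: both values positive, limit within the estimate.
    if (estimatedTotalRows > 0 && limitValue > 0 && limitValue <= estimatedTotalRows) {
      // Ratio of limit to estimated rows, in percent, truncated, with a floor of 1%.
      return Math.max(1, (long) ((double) limitValue / estimatedTotalRows * 100));
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(samplingPercent(10, 1_000_000)); // tiny ratio, floor applies
    System.out.println(samplingPercent(500, 1_000));    // half of the table
    System.out.println(samplingPercent(2_000, 1_000));  // limit exceeds the estimate
  }
}
```

With LIMIT 10 against a table estimated at one million rows, the ratio truncates to 0 and the 1% floor applies, which is why the sample can still yield enough rows after filters or joins.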
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7747/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:14 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@210 PS1, Line 210: if (estimatedTotalRows > 0 && limitValue > 0 && limitValue <= estimatedTotalRows) { line too long (91 > 90) http://gerrit.cloudera.org:8080/#/c/16792/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@211 PS1, Line 211: long percent = Math.max(1, (long) ((double) limitValue / estimatedTotalRows * 100)); line too long (94 > 90) -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 1 Gerrit-Owner: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Mon, 30 Nov 2020 01:43:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable
Aman Sinha has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16792 ) Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable .. IMPALA-10360: Allow simple limit to be treated as sampling hint where applicable As a follow-up to IMPALA-10314, it is sometimes useful to consider a simple limit as a way to sample from a table if a relevant hint has been provided. Doing a sample instead of a pure limit serves dual purposes: (a) it still helps with reducing the planning time since the scan ranges need to be computed only for the sample files, (b) it allows a sufficient number of files/rows to be read from the table such that after applying filter conditions or joins with another table, the query may still produce the N rows needed for the limit. This functionality is especially useful if the query is against a view (note that the TABLESAMPLE clause cannot be applied to a view). In this patch, a new table-level hint, 'convert_limit_to_sample', is added. If this hint is attached to a table either in the main query block or within a view/subquery and the simple limit optimization conditions are satisfied (according to IMPALA-10314), the limit is converted to a table sample. For example: set optimize_simple_limit = true; CREATE VIEW v1 as SELECT * FROM T [convert_limit_to_sample] WHERE [always_true] ; SELECT * FROM v1 LIMIT 10; In this case, the limit 10 is converted to a sample of T and the sampling percent is the greater of 1% or the ratio (in percent) of the limit to the estimated row count of the table. Testing: - Added an alltypes_date_partition_2 table where the date and timestamp values match (this helps with setting the 'always_true' hint). - Added views with 'convert_limit_to_sample' and 'always_true' hints. - Added new tests against the views. Modified a few existing tests to reference the new table variant. 
Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b --- M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java M fe/src/main/java/org/apache/impala/analysis/Expr.java M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java M fe/src/main/java/org/apache/impala/analysis/TableRef.java M fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M testdata/datasets/functional/functional_schema_template.sql M testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test M testdata/workloads/functional-query/queries/QueryTest/range-constant-propagation.test 9 files changed, 222 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/16792/1 -- To view, visit http://gerrit.cloudera.org:8080/16792 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b Gerrit-Change-Number: 16792 Gerrit-PatchSet: 1 Gerrit-Owner: Aman Sinha
[Impala-ASF-CR] IMPALA-10317: Add query option that limits huge joins at runtime
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16706 ) Change subject: IMPALA-10317: Add query option that limits huge joins at runtime .. IMPALA-10317: Add query option that limits huge joins at runtime This patch adds support for limiting the rows produced by a Join node. The limit is specified by a query option. Queries that exceed the limit are terminated. The check runs periodically, so the actual number of rows produced may go somewhat over the limit. NUM_JOIN_ROWS_PRODUCED_LIMIT is exposed as an advanced query option. The query profile is updated to include query-wide and per-backend metrics for RowsReturned. Example from "select count(*) from tpch_parquet.lineitem l1 cross join (select * from tpch_parquet.lineitem l2 limit 5) l3;": NESTED_LOOP_JOIN_NODE (id=2): - InactiveTotalTime: 107.534ms - PeakMemoryUsage: 16.00 KB (16384) - ProbeRows: 1.02K (1024) - ProbeTime: 0.000ns - RowsReturned: 10.00M (10002025) - RowsReturnedRate: 749.58 K/sec - TotalTime: 13s337ms Testing: Added tests for NUM_JOIN_ROWS_PRODUCED_LIMIT Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 --- M be/src/runtime/coordinator-backend-state.cc M be/src/runtime/coordinator.h M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/query-state.cc M be/src/service/impala-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M common/protobuf/control_service.proto M common/thrift/ImpalaInternalService.thrift M common/thrift/ImpalaService.thrift M common/thrift/generate_error_codes.py M testdata/workloads/functional-query/queries/QueryTest/query-resource-limits.test 13 files changed, 125 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/16706/3 -- To view, visit http://gerrit.cloudera.org:8080/16706 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset 
Gerrit-Change-Id: Idbca7e053b61b4e31b066edcfb3b0398fa859d02 Gerrit-Change-Number: 16706 Gerrit-PatchSet: 3 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins
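The commit message above notes that the limit is checked periodically, so the rows actually produced can overshoot it. The toy simulation below shows why that happens. It is a sketch under stated assumptions, not the mechanism in the patch (which lives in the C++ backend): the class, the method, the batch size, the check interval, and the convention that a limit of 0 disables the check are all hypothetical.

```java
// Toy simulation of a periodically checked row limit; not Impala code.
final class JoinRowsLimitSketch {
  /**
   * Produces rows in fixed-size batches and compares the running total with the
   * limit only once every batchesPerCheck batches. Returns the number of rows
   * produced when the simulated query stops. A limit of 0 is treated as
   * "no limit" (an assumption made for this sketch).
   */
  static long rowsProducedBeforeStop(
      long limit, long rowsPerBatch, int batchesPerCheck, long maxBatches) {
    long rowsProduced = 0;
    for (long batch = 1; batch <= maxBatches; batch++) {
      rowsProduced += rowsPerBatch;
      boolean checkNow = batch % batchesPerCheck == 0;
      if (checkNow && limit > 0 && rowsProduced > limit) {
        return rowsProduced; // the query would be terminated here
      }
    }
    return rowsProduced; // ran to completion without tripping the limit
  }

  public static void main(String[] args) {
    // Limit of 10,000 rows, 1,024 rows per batch, checked every 4 batches.
    System.out.println(rowsProducedBeforeStop(10_000, 1_024, 4, 100));
  }
}
```

In this run the total first exceeds 10,000 during batch 10, but the next check happens at batch 12, so the simulated query stops only after 12,288 rows. A shorter check interval reduces the overshoot at the cost of more frequent checks.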
[Impala-ASF-CR] IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16791 ) Change subject: IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit .. Patch Set 1: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/7746/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/16791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id945d7e37fac07beb7808e6ccf8530e667cbaad4 Gerrit-Change-Number: 16791 Gerrit-PatchSet: 1 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sun, 29 Nov 2020 21:19:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit
Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16791 ) Change subject: IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit .. IMPALA-9355: TestExchangeMemUsage.test_exchange_mem_usage_scaling doesn't hit the memory limit This patch reduces the memory limit for the following query in the test_exchange_mem_usage_scaling test from 170MB to 164MB to reduce the chance of not detecting a memory allocation failure. set mem_limit= set num_scanner_threads=1; select * from tpch_parquet.lineitem l1 join tpch_parquet.lineitem l2 on l1.l_orderkey = l2.l_orderkey and l1.l_partkey = l2.l_partkey and l1.l_suppkey = l2.l_suppkey and l1.l_linenumber = l2.l_linenumber order by l1.l_orderkey desc, l1.l_partkey, l1.l_suppkey, l1.l_linenumber limit 5; In a test with 500 executions of the above query with the memory limit set to 164MB, there were 500 memory allocation failures in total (one in each execution), 266 of which came from Exchange Node #4. Testing: Ran the query in question individually; Ran TestExchangeMemUsage.test_exchange_mem_usage_scaling test; Ran core tests. Change-Id: Id945d7e37fac07beb7808e6ccf8530e667cbaad4 --- M testdata/workloads/functional-query/queries/QueryTest/exchange-mem-scaling.test 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/16791/1 -- To view, visit http://gerrit.cloudera.org:8080/16791 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id945d7e37fac07beb7808e6ccf8530e667cbaad4 Gerrit-Change-Number: 16791 Gerrit-PatchSet: 1 Gerrit-Owner: Qifan Chen
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. Patch Set 19: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/7745/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Sun, 29 Nov 2020 16:38:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/16720 ) Change subject: IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate .. IMPALA-10325: Parquet scan should use min/max statistics to skip pages based on equi-join predicate This patch adds the logic to use min/max stats for Parquet row groups or pages to skip these entities when they cannot satisfy an equi-join predicate. A new class of predicates, called overlap predicates, is introduced to help determine whether a Parquet row group or page overlaps with a range computed from the hash join. If it does not, the entire row group or page is skipped. The new class of predicates coexists with the existing min/max conjuncts that are introduced based on the local scan predicates. Both classes of predicates can work individually or together. The overlap predicates are evaluated after the existing min/max conjuncts. TBD: 1. Unit/performance testing; 2. Core testing. 
Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 --- M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/parquet/hdfs-parquet-scanner.cc M be/src/exec/parquet/hdfs-parquet-scanner.h M be/src/exec/parquet/parquet-column-stats.cc M be/src/exec/parquet/parquet-column-stats.h M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/scan-node.cc M be/src/runtime/coordinator.cc M be/src/util/min-max-filter.cc M be/src/util/min-max-filter.h M common/thrift/PlanNodes.thrift M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java 16 files changed, 651 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/16720/19 -- To view, visit http://gerrit.cloudera.org:8080/16720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I379405ee75b14929df7d6b5d20dabc6f51375691 Gerrit-Change-Number: 16720 Gerrit-PatchSet: 19 Gerrit-Owner: Qifan Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
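The overlap test described in the commit message above can be sketched as a closed-interval intersection check. This is only an illustration of the idea: the class and method names are hypothetical, values are modeled as long for simplicity, and the actual patch implements the logic in the C++ Parquet scanner and min/max filter code rather than in Java.

```java
// Illustrative interval-overlap check; not the Impala implementation.
final class OverlapPredicateSketch {
  /**
   * Returns true when the closed range [statsMin, statsMax] from a row group's
   * or page's column statistics intersects the closed range
   * [filterMin, filterMax] computed from the join's build side.
   */
  static boolean overlaps(long statsMin, long statsMax, long filterMin, long filterMax) {
    return statsMin <= filterMax && filterMin <= statsMax;
  }

  /** A row group or page can be skipped only when the two ranges do not intersect. */
  static boolean canSkip(long statsMin, long statsMax, long filterMin, long filterMax) {
    return !overlaps(statsMin, statsMax, filterMin, filterMax);
  }

  public static void main(String[] args) {
    System.out.println(canSkip(0, 10, 20, 30));  // disjoint ranges
    System.out.println(canSkip(0, 10, 10, 30));  // ranges touch at 10
    System.out.println(canSkip(0, 100, 40, 50)); // filter range inside stats range
  }
}
```

Skipping is safe only in the disjoint case: if the ranges share even a single value, a row in the row group or page could still match the equi-join predicate, so it has to be read.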