[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18243 ) Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/18243 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389 Gerrit-Change-Number: 18243 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 28 Feb 2022 04:30:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10049: Include RPC call id in slow RPC logs
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18243 ) Change subject: IMPALA-10049: Include RPC call_id in slow RPC logs .. Patch Set 6: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7886/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/18243 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7fb5746fa0be575745a8e168405d43115c425389 Gerrit-Change-Number: 18243 Gerrit-PatchSet: 6 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 28 Feb 2022 04:30:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 ) Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables .. Patch Set 21: I accidentally uploaded some unrelated files in Patch Set 20, which caused some other failures. Patch Set 21 fixes the flaky test failure. Please review the difference between Patch Set 19 and 21. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 21 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 27 Feb 2022 22:47:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 ) Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables .. Patch Set 21: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 21 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 27 Feb 2022 18:11:04 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 ) Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables .. Patch Set 21: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7885/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 21 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 27 Feb 2022 13:29:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 ) Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables .. Patch Set 21: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/10234/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 21 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Qifan Chen Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sun, 27 Feb 2022 12:55:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10898: Add runtime IN-list filters for ORC tables
Hello Qifan Chen, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/18141 to look at the new patch set (#21). Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ..

IMPALA-10898: Add runtime IN-list filters for ORC tables

ORC files have optional bloom filter indexes for each column. Since ORC-1.7.0, the C++ reader supports pushing down predicates to skip unrelated RowGroups. The pushed-down predicates are evaluated on file indexes (i.e. statistics and bloom filter indexes). Note that only EQUALS and IN-list predicates can leverage bloom filter indexes.

Currently Impala has two kinds of runtime filters: bloom filters and min-max filters. Unfortunately, they can't be converted into EQUALS or IN-list predicates, so they can't leverage the file-level bloom filter indexes.

This patch adds runtime IN-list filters for this purpose. Currently they are generated for the build side of a broadcast join. They are only applied on ORC tables and are pushed down to the ORC reader (i.e. the ORC lib). To avoid exploding the IN-list, if the number of distinct values on the build side exceeds a threshold (1024 by default), we set the filter to ALWAYS_TRUE and clear its entries. The threshold can be configured by a new query option, RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT.

Evaluating runtime IN-list filters is much slower than evaluating runtime bloom filters due to the current simple implementation (i.e. std::unordered_set) and the lack of codegen. So we disable it at row level.

For visibility, this patch adds two counters in the HdfsScanNode:
 - NumPushedDownPredicates
 - NumPushedDownRuntimeFilters
They reflect the predicates and runtime filters that are pushed down to the ORC reader.

Currently, runtime IN-list filters are disabled by default. This patch extends the query option, ENABLED_RUNTIME_FILTER_TYPES, to support a comma-separated list of filter types. It defaults to "BLOOM,MIN_MAX".
Add "IN_LIST" to it to enable runtime IN-list filters.

Ran perf tests on a 3-instance cluster on my desktop using TPC-DS with scale factor 20. They show significant improvements in some queries:

+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| Workload  | Query       | File Format        | Avg(s) | Base Avg(s) | Delta(Avg) | StdDev(%)  | Base StdDev(%) | Iters | Median Diff(%) | MW Zval | Tval   |
+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+
| TPCDS(20) | TPCDS-Q67A  | orc / snap / block | 35.07  | 44.01       | I -20.32%  | 0.38%      | 1.38%          | 10    | I -25.69%      | -3.58   | -45.33 |
| TPCDS(20) | TPCDS-Q37   | orc / snap / block | 1.08   | 1.45        | I -25.23%  | 7.14%      | 3.09%          | 10    | I -34.09%      | -3.58   | -12.94 |
| TPCDS(20) | TPCDS-Q70A  | orc / snap / block | 6.30   | 8.60        | I -26.81%  | 5.24%      | 4.21%          | 10    | I -36.67%      | -3.58   | -14.88 |
| TPCDS(20) | TPCDS-Q16   | orc / snap / block | 1.33   | 1.85        | I -28.28%  | 4.98%      | 5.92%          | 10    | I -39.38%      | -3.58   | -12.93 |
| TPCDS(20) | TPCDS-Q18A  | orc / snap / block | 5.70   | 8.06        | I -29.25%  | 3.00%      | 4.12%          | 10    | I -40.30%      | -3.58   | -19.95 |
| TPCDS(20) | TPCDS-Q22A  | orc / snap / block | 2.01   | 2.97        | I -32.21%  | 6.12%      | 5.94%          | 10    | I -47.68%      | -3.58   | -14.05 |
| TPCDS(20) | TPCDS-Q77A  | orc / snap / block | 8.49   | 12.44       | I -31.75%  | 6.44%      | 3.96%          | 10    | I -49.71%      | -3.58   | -16.97 |
| TPCDS(20) | TPCDS-Q75   | orc / snap / block | 7.76   | 12.27       | I -36.76%  | 5.01%      | 3.87%          | 10    | I -59.56%      | -3.58   | -23.26 |
| TPCDS(20) | TPCDS-Q21   | orc / snap / block | 0.71   | 1.27        | I -44.26%  | 4.56%      | 4.24%          | 10    | I -77.31%      | -3.58   | -28.31 |
| TPCDS(20) | TPCDS-Q80A  | orc / snap / block | 9.24   | 20.42       | I -54.77%  | 4.03%      | 3.82%          | 10    | I -123.12%     | -3.58   | -40.90 |
| TPCDS(20) | TPCDS-Q39-1 | orc / snap / block | 1.07   | 2.26        | I -52.74%  | * 23.83% * | 2.60%          | 10    | I -149.68%     | -3.58   | -14.43 |
| TPCDS(20) | TPCDS-Q39-2 | orc / snap / block | 1.00   | 2.33        | I -56.95%  | * 19.53% * | 2.07%          | 10    | I -151.89%     | -3.58   | -20.81 |
+-----------+-------------+--------------------+--------+-------------+------------+------------+----------------+-------+----------------+---------+--------+

"Base Avg(s)" is the average of the original (baseline) time. "Avg(s)" is the average of the current time.
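
For readers unfamiliar with the mechanism, the entry-limit behaviour described in the commit message (collect distinct build-side values, then fall back to ALWAYS_TRUE and clear the entries once the limit is exceeded) can be sketched roughly as follows. This is a toy model in Python, not Impala's C++ implementation: the class `InListFilter`, its methods, and the helper `build_filter` are hypothetical names introduced only for illustration. Only the default limit of 1024 and the ALWAYS_TRUE fallback come from the description above.

```python
from dataclasses import dataclass, field
from typing import Hashable, Iterable, Set

# Default taken from the commit message (RUNTIME_IN_LIST_FILTER_ENTRY_LIMIT).
DEFAULT_ENTRY_LIMIT = 1024


@dataclass
class InListFilter:
    """Toy model of a runtime IN-list filter (hypothetical, not Impala's class)."""

    entry_limit: int = DEFAULT_ENTRY_LIMIT
    values: Set[Hashable] = field(default_factory=set)
    always_true: bool = False

    def insert(self, value: Hashable) -> None:
        """Record one build-side value, degrading to ALWAYS_TRUE past the limit."""
        if self.always_true:
            return  # The filter already accepts everything; nothing to track.
        self.values.add(value)
        if len(self.values) > self.entry_limit:
            # Too many distinct values: stop filtering and drop the entries.
            self.always_true = True
            self.values.clear()

    def may_contain(self, value: Hashable) -> bool:
        """Return True if a probe-side value could match the build side."""
        return self.always_true or value in self.values


def build_filter(build_side_keys: Iterable[Hashable],
                 entry_limit: int = DEFAULT_ENTRY_LIMIT) -> InListFilter:
    """Build a filter from the join keys seen on the build side."""
    in_list = InListFilter(entry_limit=entry_limit)
    for key in build_side_keys:
        in_list.insert(key)
    return in_list
```

A filter that stays under the limit can be turned into an IN-list predicate over `values`. One that has become ALWAYS_TRUE carries no information and would simply not be pushed down.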
However, we also see some regressions due to the suboptimal implementation. The follow-up JIRAs will focus