Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17860 )
Change subject: IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
......................................................................

Patch Set 19:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17860/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17860/12//COMMIT_MSG@24
PS12, Line 24: TPCH scale 42
> I think it would be good to execute the whole benchmark with bin/single_nod

Hi Zoltan,
Sorry for the delay with the benchmark. I ran the entire TPCH benchmark at scale 42. This is the summary of the report (Delta is the change relative to the baseline).

Report Generated on 2021-10-28
Run Description: "78ce235db6d5b720f3e3319ff571a2da054a2602 vs c46d765dccd5739c848d8c1c82043e72394b8397"

Cluster Name: UNKNOWN
Lab Run Info: UNKNOWN
Impala Version:          impalad version 4.1.0-SNAPSHOT RELEASE (2021-10-28)
Baseline Impala Version: impalad version 4.1.0-SNAPSHOT RELEASE (2021-10-27)

+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 12.83   | -1.54%     | 8.26       | -1.48%         |
+----------+-----------------------+---------+------------+------------+----------------+

There is a very slight improvement overall and major improvements in these 2 queries:

(I) Improvement: TPCH(42) TPCH-Q6 [parquet / none / none] (1.85s -> 1.72s [-7.30%])
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+
| Operator     | % of Query | Avg   | Base Avg | Delta(Avg) | StdDev(%) | Max   | Base Max | Delta(Max) | #Hosts | #Inst | #Rows | Est #Rows |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+
| 00:SCAN HDFS | 94.83%     | 1.50s | 1.62s    | -7.75%     | 2.07%     | 1.56s | 1.73s    | -9.58%     | 1      | 1     | 4.79M | 29.96M    |
+--------------+------------+-------+----------+------------+-----------+-------+----------+------------+--------+-------+-------+-----------+

(I) Improvement: TPCH(42) TPCH-Q19 [parquet / none / none] (4.73s -> 4.18s [-11.72%])
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| Operator     | % of Query | Avg      | Base Avg | Delta(Avg) | StdDev(%) | Max      | Base Max | Delta(Max) | #Hosts | #Inst | #Rows  | Est #Rows |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+
| 01:SCAN HDFS | 22.68%     | 729.91ms | 736.69ms | -0.92%     | 1.61%     | 751.55ms | 747.34ms | +0.56%     | 1      | 1     | 20.33K | 1.50M     |
| 00:SCAN HDFS | 74.84%     | 2.41s    | 2.97s    | -18.98%    | 0.67%     | 2.44s    | 3.00s    | -18.70%    | 1      | 1     | 13.07K | 29.96M    |
+--------------+------------+----------+----------+------------+-----------+----------+----------+------------+--------+-------+--------+-----------+

No regressions were reported, just these 2 improvements and a couple of queries with high variability in runtime (not related to our change).

--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 19
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 01 Nov 2021 17:51:22 +0000
Gerrit-HasComments: Yes
