Hello Thomas Tauber-Marshall, Tim Armstrong, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/9239 to look at the new patch set (#3). Change subject: IMPALA-6258: Uninitialized tuple pointers in row batch for empty rows ...................................................................... IMPALA-6258: Uninitialized tuple pointers in row batch for empty rows Tuple pointers in the generated row batches may not be initialized if a tuple has byte size 0. There are some codes which compare these uninitialized pointers against nullptr so having them uninitialized may return wrong (and non-deterministic) results, e.g.: impala::TupleIsNullPredicate::GetBooleanVal() The following query produces non-deterministic results currently: SELECT count(v.x) FROM functional.alltypestiny t3 LEFT OUTER JOIN ( SELECT true AS x FROM functional.alltypestiny t1 LEFT OUTER JOIN functional.alltypestiny t2 ON (true)) v ON (v.x = t3.bool_col) WHERE t3.bool_col = true; The alltypestiny table has 8 records, 4 records of them has the true boolean value for bool_col. Therefore, the above query is a fancy way of saying "8 * 8 * 4", i.e. it should return 256. The solution is that scanners initialize tuple pointers to a non-null value when there are no materialized slots. This non-null value is provided by the new static method called Tuple::Poison(). I extended QueryTest/scanners.test with the above query. This test executes the query against all table formats. This change has the biggest performance impact on count(*) queries on large kudu tables. For my quick benchmark I copied tpch_kudu.lineitem and doubled its data. The resulting table has 12,002,430 rows. Without this patch 'select count(*) from biglineitem' runs for ~0.12s. With the patch applied, the overhead is around a dozens of ms. I measured the query on my desktop PC using a relase build of Impala. On debug builds, the execution time of the patched version is around 160% of the original version. Without this patch: +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ | 03:AGGREGATE | 1 | 127.50us | 127.50us | 1 | 1 | 28.00 KB | 10.00 MB | FINALIZE | | 02:EXCHANGE | 1 | 22.32ms | 22.32ms | 3 | 1 | 0 B | 0 B | UNPARTITIONED | | 01:AGGREGATE | 3 | 1.78ms | 1.89ms | 3 | 1 | 16.00 KB | 10.00 MB | | | 00:SCAN KUDU | 3 | 8.00ms | 8.28ms | 12.00M | -1 | 512.00 KB | 0 B | default.biglineitem | +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ With this patch: +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ | Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail | +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ | 03:AGGREGATE | 1 | 129.01us | 129.01us | 1 | 1 | 28.00 KB | 10.00 MB | FINALIZE | | 02:EXCHANGE | 1 | 33.00ms | 33.00ms | 3 | 1 | 0 B | 0 B | UNPARTITIONED | | 01:AGGREGATE | 3 | 1.99ms | 2.13ms | 3 | 1 | 16.00 KB | 10.00 MB | | | 00:SCAN KUDU | 3 | 13.13ms | 13.97ms | 12.00M | -1 | 512.00 KB | 0 B | default.biglineitem | +--------------+--------+----------+----------+--------+------------+-----------+---------------+---------------------+ Change-Id: I298122aaaa7e62eb5971508e0698e189519755de --- M be/src/exec/hdfs-scanner.cc M be/src/exec/kudu-scanner.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M testdata/workloads/functional-query/queries/QueryTest/scanners.test 5 files changed, 30 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9239/3 -- To view, visit http://gerrit.cloudera.org:8080/9239 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I298122aaaa7e62eb5971508e0698e189519755de Gerrit-Change-Number: 9239 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>