Hello Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9239

to look at the new patch set (#2).

Change subject: IMPALA-6258: Uninitialized tuple pointers in row batch for 
empty rows
......................................................................

IMPALA-6258: Uninitialized tuple pointers in row batch for empty rows

Tuple pointers in the generated row batches may not be initialized
if a tuple has byte size 0. There are some codes which compare these
uninitialized pointers against nullptr so having them uninitialized
may return wrong (and non-deterministic) results, e.g.:
impala::TupleIsNullPredicate::GetBooleanVal()

The following query produces non-deterministic results currently:

SELECT count(v.x) FROM functional.alltypestiny t3 LEFT OUTER JOIN (
  SELECT true AS x FROM functional.alltypestiny t1 LEFT OUTER JOIN
  functional.alltypestiny t2 ON (true)) v ON (v.x = t3.bool_col)
WHERE t3.bool_col = true;

The alltypestiny table has 8 records, 4 records of them has the true
boolean value for bool_col. Therefore, the above query is a fancy
way of saying "8 * 8 * 4", i.e. it should return 256.

The solution is that scanners initialize tuple pointers to a non-null
value when there are no materialized slots. This non-null value is
provided by the new static method called Tuple::Poison().

I extended QueryTest/scanners.test with the above query. This test
executes the query against all table formats.

This change has the biggest performance impact on count(*) queries on
large kudu tables. For my quick benchmark I chose tpch_kudu.lineitem
which has 6001215 rows.

Without this patch 'select count(*) from tpch_kudu.lineitem' runs
around 0.15s. With the patch applied, it runs around 0.24s. I ran the
query on my desktop PC.

Without this patch:
ExecSummary:
Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. 
Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE        1  432.840us  432.840us      1           1   28.00 KB       
10.00 MB  FINALIZE
02:EXCHANGE         1   31.375ms   31.375ms      3           1          0       
       0  UNPARTITIONED
01:AGGREGATE        3    5.013ms    5.502ms      3           1   16.00 KB       
10.00 MB
00:SCAN KUDU        3   16.581ms   20.199ms  6.00M       6.00M  512.00 KB       
       0  tpch_kudu.lineitem

With this patch:
ExecSummary:
Operator       #Hosts   Avg Time   Max Time  #Rows  Est. #Rows   Peak Mem  Est. 
Peak Mem  Detail
-------------------------------------------------------------------------------------------------------------
03:AGGREGATE        1  472.898us  472.898us      1           1   28.00 KB       
10.00 MB  FINALIZE
02:EXCHANGE         1  122.700ms  122.700ms      3           1          0       
       0  UNPARTITIONED
01:AGGREGATE        3    6.104ms    6.151ms      3           1   16.00 KB       
10.00 MB
00:SCAN KUDU        3   90.856ms  107.409ms  6.00M       6.00M  512.00 KB       
       0  tpch_kudu.lineitem

Change-Id: I298122aaaa7e62eb5971508e0698e189519755de
---
M be/src/exec/hdfs-scanner.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M testdata/workloads/functional-query/queries/QueryTest/scanners.test
5 files changed, 30 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9239/2
--
To view, visit http://gerrit.cloudera.org:8080/9239
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I298122aaaa7e62eb5971508e0698e189519755de
Gerrit-Change-Number: 9239
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to