[Impala-ASF-CR] IMPALA-12373: Small String Optimization for StringValue

Zoltan Borok-Nagy (Code Review) Tue, 24 Oct 2023 04:42:32 -0700

Hello Daniel Becker, Csaba Ringhofer, Michael Smith, Impala Public Jenkins,


I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20496

to look at the new patch set (#8).

Change subject: IMPALA-12373: Small String Optimization for StringValue
......................................................................

IMPALA-12373: Small String Optimization for StringValue

This patch implements the Small String Optimization (SSO) for
StringValue objects. This is a well-known optimization in the C++
world that is used by the majority of various string implementations
(STL string, boost string, Folly string, etc.)

The old layout of the StringValue was:
  char* ptr;  // 8 bytes
  int len;    // 4 bytes

We also add the __packed__ attribute to the StringValue class which
means there is no padding between 'ptr' and 'len', neither after 'len'.
I.e. StringValue objects take 12 bytes. This means with SSO we can use
11 bytes to store small strings. In this case the last byte is used to
store the length.

Small string layout:
  char[11] buf;
  unsigned char len;

We also need an indicator bit (which tells whether the long
representation or the small representation is active) in the last byte
that is the same bit of LONG_STRING.len and SMALL_STRING.len. On
little-endian architectures this is the most significant bit (MSB) of
both LONG_STRING.len and SMALL_STRING.len. On big endian architectures
this would be the least significant bit (LSB) of both LONG_STRING.len
and SMALL_STRING.len.

This patch adds SmallableString which implements the above on an
on-demand basis. I.e. all string objects start with the long
representation, then the string object can be explicitly asked to try
smallify itself. This is because I didn't want to introduce too much
change in behavior. This way we can try smallify only at certain points
(e.g. DeepCopy()), and we can also smallify all strings in a tuple at
once. The latter means if we've done that for a tuple, subsequent
smallifications can return on the first small string that is encountered
(because we can assume that all other string slots are also smallified).

Benefits:
 * lower memory and CPU cache consumption
 * smaller serialization buffers to compress
 * less data to send over the network
 * less data to spill to disk

Measurements:
I used TPCH(30) with the following query:

  select * from lineitem a, lineitem b
  where a.l_orderkey = b.l_orderkey and
        a.l_orderkey * b.l_orderkey < 1000

The above query generates significant network traffic and does spilling.
The query selects all 16 columns out of which 6.5 columns contain small
strings. I.e. this kind of data is a good candidate for this
optimization but also not unrealistic.

This improves the following numbers:
Total query time:             5m16s    --> 3m40s
Total CPU time:               12m9s    --> 10m17s
Bytes sent over the network:  54.17 GB --> 41.76 GB
Data had to be spilled:       14.66 GB --> 9.42 GB

On the standard benchmarks I measured the followings:

TPC-H:
+----------+-----------------------+---------+------------+------------+----------------+
| Workload | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+----------+-----------------------+---------+------------+------------+----------------+
| TPCH(42) | parquet / none / none | 3.57    | -3.06%     | 2.40       | -1.34% 
        |
+----------+-----------------------+---------+------------+------------+----------------+

TPC-DS:
+-----------+-----------------------+---------+------------+------------+----------------+
| Workload  | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+-----------+-----------------------+---------+------------+------------+----------------+
| TPCDS(30) | parquet / none / none | 2.09    | -0.92%     | 0.76       | 
-1.17%         |
+-----------+-----------------------+---------+------------+------------+----------------+

TODO:
* add specific BE and E2E tests for small strings
** also for long strings
* add ifdefs for big-endian
* update estimations in a follow-up Jira

Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
---
M be/src/benchmarks/atod-benchmark.cc
M be/src/benchmarks/atof-benchmark.cc
M be/src/benchmarks/atoi-benchmark.cc
M be/src/benchmarks/hash-benchmark.cc
M be/src/benchmarks/parse-timestamp-benchmark.cc
M be/src/benchmarks/string-benchmark.cc
M be/src/benchmarks/string-search-benchmark.cc
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/exec/analytic-eval-node.cc
M be/src/exec/avro/hdfs-avro-scanner-ir.cc
M be/src/exec/avro/hdfs-avro-scanner-test.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/file-metadata-utils.cc
M be/src/exec/grouping-aggregator.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/kudu/kudu-scanner.cc
M be/src/exec/kudu/kudu-util-ir.cc
M be/src/exec/kudu/kudu-util.cc
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/orc/orc-column-readers.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/parquet-bloom-filter-util.cc
M be/src/exec/parquet/parquet-column-stats.cc
M be/src/exec/parquet/parquet-column-stats.inline.h
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-data-converter.h
M be/src/exec/parquet/parquet-plain-test.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.h
M be/src/exec/text-converter.inline.h
M be/src/experiments/data-provider.cc
M be/src/experiments/string-search-sse.h
M be/src/exprs/expr-value.h
M be/src/exprs/like-predicate.cc
M be/src/exprs/literal.cc
M be/src/exprs/operators-ir.cc
M be/src/exprs/scalar-expr-evaluator-ir.cc
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/slot-ref.cc
M be/src/exprs/string-functions-ir.cc
M be/src/exprs/timestamp-functions.cc
M be/src/exprs/timestamp-functions.h
M be/src/runtime/CMakeLists.txt
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/buffered-tuple-stream.cc
M be/src/runtime/collection-value.h
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/raw-value.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/row-batch-serialize-test.cc
A be/src/runtime/smallable-string.h
M be/src/runtime/sorter.cc
M be/src/runtime/string-buffer.h
M be/src/runtime/string-search.h
A be/src/runtime/string-value-ir.cc
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/string-value.inline.h
M be/src/runtime/tuple-ir.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/fe-support.cc
M be/src/service/hs2-util.cc
M be/src/util/dict-encoding.h
M be/src/util/in-list-filter.cc
M be/src/util/in-list-filter.h
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M be/src/util/static-asserts.cc
M be/src/util/tuple-row-compare.cc
M be/src/util/url-parser.cc
M 
testdata/workloads/functional-query/queries/QueryTest/analytic-fns-tpcds-partitioned-topn.test
81 files changed, 1,026 insertions(+), 466 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/96/20496/8
--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Smith <michael.sm...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>

[Impala-ASF-CR] IMPALA-12373: Small String Optimization for StringValue

Reply via email to