As Serge mentioned, we’ve implemented spill-to-disk for SetOps and Aggregates
at Salesforce. We were hitting OOMs often enough that this became a high
priority for us. However, our current spill implementation is based on dynahash
from 9.6, and we’re not happy with its performance (it was built primarily as
an OOM stop-gap, not for speed).

Because of this, we’ve spec’d out a new spill-to-disk design for hash-based 
aggregates (and later SetOps), and we plan to begin implementation very soon.  
Since this is an important fix for the community as well, we would be happy 
(and would prefer) to share our spill-to-disk implementation. 

Our new spill approach is very similar to Jeff’s and Heikki’s and is designed 
to use simplehash.h. It’s based on an algorithm called “HybridHash with 
Pre-Partitioning” found in [1]. It may later make sense to implement the 
“HashSort” Algorithm from [1] as well, which works better for highly skewed 
grouping keys. The optimizer could eventually choose between the two based on 
the stats available. We also like Heikki’s suggestions to use logtape.c to 
reduce disk usage and a trie-based approach to control the size of partitions 
dynamically.
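
To make the shape of that algorithm concrete, here’s a minimal standalone C
sketch of hybrid hashing with pre-partitioning. Everything in it is
illustrative (the toy hash, the fixed fan-out, COUNT(*)-style aggregation,
tmpfile() spill files, all names); the real implementation would sit on
simplehash.h, logtape.c, and the executor’s tuple machinery:

/*
 * hybrid_hash.c -- illustrative sketch only, not PostgreSQL internals.
 *
 * Pass 1 splits the input into NUM_PARTS partitions on the high hash
 * bits; partition 0 is aggregated in memory while the rest are written
 * to temp files.  Pass 2 then aggregates each spilled partition by
 * itself, so only one partition's table is ever resident.  A real
 * implementation would recurse if a partition still didn't fit.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PARTS   4            /* fan-out chosen up front */
#define TABLE_SLOTS 1024         /* stand-in for the in-memory table */

typedef struct { int64_t key; int64_t count; int used; } Slot;

static uint64_t
hash_key(int64_t key)
{
    uint64_t h = (uint64_t) key * 0x9E3779B97F4A7C15ULL;   /* toy mixer */

    return h ^ (h >> 32);
}

/* COUNT(*)-style aggregation into an open-addressing table. */
static void
table_add(Slot *table, int64_t key)
{
    uint64_t i = hash_key(key) % TABLE_SLOTS;    /* low bits pick a slot */

    while (table[i].used && table[i].key != key)
        i = (i + 1) % TABLE_SLOTS;               /* linear probing */
    table[i].key = key;
    table[i].used = 1;
    table[i].count++;
}

static void
table_emit(const Slot *table)
{
    for (int i = 0; i < TABLE_SLOTS; i++)
        if (table[i].used)
            printf("key=%lld count=%lld\n",
                   (long long) table[i].key, (long long) table[i].count);
}

int
main(void)
{
    int64_t input[] = {1, 2, 7, 1, 9, 2, 7, 7, 42, 1, 9, 5};
    int     ninput = sizeof(input) / sizeof(input[0]);
    FILE   *spill[NUM_PARTS] = {0};
    Slot   *table = calloc(TABLE_SLOTS, sizeof(Slot));

    /* Pass 1: partition on the *high* hash bits (the table uses the low
     * bits, so the two stay independent). */
    for (int i = 0; i < ninput; i++)
    {
        int part = (int) ((hash_key(input[i]) >> 32) % NUM_PARTS);

        if (part == 0)
            table_add(table, input[i]);
        else
        {
            if (!spill[part])
                spill[part] = tmpfile();
            fwrite(&input[i], sizeof(int64_t), 1, spill[part]);
        }
    }
    table_emit(table);

    /* Pass 2: process each spilled partition on its own. */
    for (int part = 1; part < NUM_PARTS; part++)
    {
        int64_t key;

        if (!spill[part])
            continue;
        memset(table, 0, TABLE_SLOTS * sizeof(Slot));
        rewind(spill[part]);
        while (fread(&key, sizeof(int64_t), 1, spill[part]) == 1)
            table_add(table, key);
        table_emit(table);
        fclose(spill[part]);
    }
    free(table);
    return 0;
}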

We’ve also been grappling with how to handle the implementation challenges 
pointed out in this thread. These include:
        1) tracking memory usage
        2) choosing a smart eviction policy (which is touched on in [2])
        3) serializing opaque user-defined transition values when eviction
           is required

For 1), we plan to use our WithStats memory context, which Serge mentioned.
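
Roughly, WithStats behaves like a context that keeps a running byte count the
aggregate node can check before inserting a new group. A stripped-down
illustration of that idea (not the actual WithStats code; all names here are
placeholders):

#include <stdio.h>
#include <stdlib.h>

typedef struct { size_t bytes_used; size_t bytes_limit; } TrackedContext;
typedef struct { size_t size; } ChunkHeader;

/* Every allocation carries a small header so frees are accounted too. */
static void *
tracked_alloc(TrackedContext *cxt, size_t size)
{
    ChunkHeader *hdr = malloc(sizeof(ChunkHeader) + size);

    if (!hdr)
        return NULL;
    hdr->size = size;
    cxt->bytes_used += size;
    return hdr + 1;
}

static void
tracked_free(TrackedContext *cxt, void *ptr)
{
    ChunkHeader *hdr = (ChunkHeader *) ptr - 1;

    cxt->bytes_used -= hdr->size;
    free(hdr);
}

/* The aggregate node would consult this before adding a group. */
static int
must_spill(const TrackedContext *cxt)
{
    return cxt->bytes_used >= cxt->bytes_limit;   /* e.g. work_mem */
}

int
main(void)
{
    TrackedContext cxt = { 0, 4096 };   /* pretend the limit is 4kB */
    void *p = tracked_alloc(&cxt, 1024);

    printf("used=%zu spill=%d\n", cxt.bytes_used, must_spill(&cxt));
    tracked_free(&cxt, p);
    printf("used=%zu\n", cxt.bytes_used);
    return 0;
}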
For 2), we plan to start with a simple non-eviction policy, since we don’t
have the stats to do anything smarter (i.e., insert until we fill the hash
table, then spill until we finish processing the batch, with evictions
happening only if a group’s transition value grows too large).
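
Spelled out, the per-tuple decision is just the following (the names and the
per-group size cap are placeholders; the real inputs would come from the hash
table and the memory accounting sketched above):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_TRANS_SIZE (1024 * 1024)   /* arbitrary per-group cap */

typedef enum
{
    ACT_ADVANCE,       /* group resident: advance its transition value  */
    ACT_INSERT,        /* new group, table has room: insert it          */
    ACT_SPILL_TUPLE,   /* new group, table full: spill the raw tuple    */
    ACT_EVICT_GROUP    /* transition value too large: serialize it out  */
} Action;

static Action
policy_decide(bool group_exists, bool table_full, size_t trans_size)
{
    if (!group_exists)
        return table_full ? ACT_SPILL_TUPLE : ACT_INSERT;
    if (trans_size > MAX_TRANS_SIZE)
        return ACT_EVICT_GROUP;
    return ACT_ADVANCE;
}

int
main(void)
{
    printf("%d\n", policy_decide(false, false, 0));       /* ACT_INSERT      */
    printf("%d\n", policy_decide(false, true, 0));        /* ACT_SPILL_TUPLE */
    printf("%d\n", policy_decide(true, true, 2 << 20));   /* ACT_EVICT_GROUP */
    printf("%d\n", policy_decide(true, true, 64));        /* ACT_ADVANCE     */
    return 0;
}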
For 3), we don’t have a good solution yet. We could handle
serialization/deserialization ourselves for built-in types, and going forward
rely on users to provide serialize/deserialize functions for user-defined
aggregates.
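
For flat, fixed-size transition states the easy case really is easy: a raw
byte copy round-trips fine, as in the toy example below (the AvgState shape is
made up for illustration). The hard case is internal-typed states that hold
pointers; for those, something along the lines of the SERIALFUNC/DESERIALFUNC
pair that CREATE AGGREGATE already accepts for parallel aggregation in 9.6
seems like the natural hook, though we haven’t worked through the details.

#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef struct { double sum; int64_t count; } AvgState;   /* avg()-style */

static size_t
avg_serialize(const AvgState *state, char *buf)
{
    memcpy(buf, state, sizeof(*state));   /* flat state: raw copy works */
    return sizeof(*state);
}

static void
avg_deserialize(AvgState *state, const char *buf)
{
    memcpy(state, buf, sizeof(*state));
}

int
main(void)
{
    AvgState in = { 12.5, 3 }, out;
    char     buf[sizeof(AvgState)];

    avg_serialize(&in, buf);              /* e.g. written to a spill file */
    avg_deserialize(&out, buf);
    printf("sum=%g count=%lld\n", out.sum, (long long) out.count);
    return 0;
}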

But we’re open to suggestions :)

Regards,
David
Salesforce

[1] Revisiting Aggregation for Data Intensive Applications: A Performance Study
https://arxiv.org/pdf/1311.0059.pdf

[2] DB2 with BLU acceleration: so much more than just a column store
http://delivery.acm.org/10.1145/2540000/2536233/p1080-raman.pdf
