[HACKERS] rethinking dense_alloc (HashJoin) as a memory context

Tomas Vondra Wed, 13 Jul 2016 07:54:41 -0700

Hi,

In the thread [1] dealing with the hashjoin bug introduced in 9.5, Tom voiced his dislike of dense_alloc. I kinda agree with him that introducing "local allocators" may not be the best idea, and as dense_alloc was introduced by me, I was playing with the idea of wrapping it into a regular memory context, perhaps with some restrictions (e.g. no pfree). But I'm having trouble with that approach ...

Let me quickly explain the idea behind dense_alloc. When building the tuple hash table in hash join, we simply allocate a large chunk of memory using palloc (~32kB), and then store the tuples into the chunk on our own, without calling palloc for each tuple. Each tuple already has its length in the header, so we don't need a chunk header. Also, we don't round up to 2^k chunk sizes and instead store the tuples densely.
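A minimal sketch of the idea, written as standalone C so it compiles outside the backend. malloc stands in for palloc, and the names (DenseArena, dense_alloc, dense_reset) and the 8-byte alignment are assumptions for illustration, not the actual nodeHash.c code. Each request is carved out of the current 32kB chunk with no per-tuple header and no rounding beyond alignment, and memory is only ever released all at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* 32kB chunks, as described above; malloc/free stand in for palloc/pfree. */
#define DENSE_CHUNK_SIZE (32 * 1024)
/* Hypothetical 8-byte alignment, similar in spirit to MAXALIGN on 64-bit. */
#define DENSE_ALIGN(len) (((len) + 7) & ~((size_t) 7))

typedef struct DenseChunk
{
    struct DenseChunk *next;   /* previously filled chunk */
    size_t      used;          /* bytes already handed out from data[] */
    char        data[DENSE_CHUNK_SIZE];
} DenseChunk;

typedef struct DenseArena
{
    DenseChunk *chunks;        /* current chunk first */
} DenseArena;

/* Carve len bytes out of the current chunk: no header, no 2^k rounding. */
static void *
dense_alloc(DenseArena *arena, size_t len)
{
    void       *ptr;

    len = DENSE_ALIGN(len);
    if (len > DENSE_CHUNK_SIZE)
        return NULL;           /* oversized requests are not handled here */

    /* Start a new chunk when the current one cannot fit the request. */
    if (arena->chunks == NULL || arena->chunks->used + len > DENSE_CHUNK_SIZE)
    {
        DenseChunk *chunk = malloc(sizeof(DenseChunk));

        if (chunk == NULL)
            return NULL;
        chunk->next = arena->chunks;
        chunk->used = 0;
        arena->chunks = chunk;
    }

    assert(arena->chunks->used + len <= DENSE_CHUNK_SIZE);
    ptr = arena->chunks->data + arena->chunks->used;
    arena->chunks->used += len;
    return ptr;
}

/* Individual tuples cannot be freed; everything goes away at once. */
static void
dense_reset(DenseArena *arena)
{
    while (arena->chunks != NULL)
    {
        DenseChunk *next = arena->chunks->next;

        free(arena->chunks);
        arena->chunks = next;
    }
}
```

Oversized requests simply fail here to keep the sketch short; a real allocator would need to give them a chunk of their own.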

This means we can't do repalloc or pfree on the tuples, but that's fine. We never did repalloc in hashjoin anyway, and pfree is only needed when increasing the number of batches. But with dense_alloc we can simply walk through the tuples as stored in the allocated chunks, which has the nice benefit of being sequential, making memory prefetching more efficient than with the old code (walking through buckets). Also, no freelists and such.
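The sequential walk can be sketched like this, again as standalone C with hypothetical names. The tuple layout (a length word and a hash value followed by the payload) only loosely mirrors a MinimalTuple carrying its own length, and the batch rule hashvalue & (nbatch - 1) is a simplification of how the batch number is derived from the hash value. The point is that the stored length is enough to find the next tuple, so rebatching becomes a front-to-back scan over each chunk instead of chasing bucket chains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_SIZE (32 * 1024)
#define ALIGN8(len) (((len) + 7) & ~((size_t) 7))

/* Hypothetical tuple layout: the tuple carries its own length, so the
 * allocator does not need a header to know where the next tuple starts. */
typedef struct Tuple
{
    uint32_t    len;         /* total size in bytes, this header included */
    uint32_t    hashvalue;   /* decides which batch the tuple belongs to */
    /* payload bytes follow */
} Tuple;

typedef struct Chunk
{
    struct Chunk *next;      /* next chunk in the list */
    size_t      used;        /* bytes occupied by tuples in data[] */
    char        data[CHUNK_SIZE];
} Chunk;

/* Append a tuple right after the previous one (densely, 8-byte aligned). */
static Tuple *
chunk_append(Chunk *chunk, uint32_t hashvalue, const void *payload,
             uint32_t payload_len)
{
    uint32_t    len = (uint32_t) sizeof(Tuple) + payload_len;
    Tuple      *tup;

    assert(chunk->used + ALIGN8(len) <= CHUNK_SIZE);
    tup = (Tuple *) (chunk->data + chunk->used);
    tup->len = len;
    tup->hashvalue = hashvalue;
    memcpy((char *) tup + sizeof(Tuple), payload, payload_len);
    chunk->used += ALIGN8(len);
    return tup;
}

/*
 * After increasing the number of batches, count the tuples that no longer
 * belong to the current batch.  The walk is strictly sequential: each
 * tuple's stored length gives the offset of the next one.  The batch rule
 * (hashvalue & (nbatch - 1)) is a simplification for illustration.
 */
static size_t
count_moved_tuples(const Chunk *chunks, uint32_t curbatch, uint32_t nbatch)
{
    size_t      moved = 0;

    for (const Chunk *c = chunks; c != NULL; c = c->next)
    {
        size_t      off = 0;

        while (off < c->used)
        {
            const Tuple *tup = (const Tuple *) (c->data + off);

            if ((tup->hashvalue & (nbatch - 1)) != curbatch)
                moved++;
            off += ALIGN8(tup->len);
        }
    }
    return moved;
}
```

No freelist or bucket array is consulted during the scan; the chunks themselves are the iteration order.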


So the dense_alloc has several benefits:

(a) memory reduction thanks to eliminating StandardChunkHeader (which is 16B, and quite noticeable for narrow tuples)

(b) memory reduction thanks to packing tuples densely (not leaving free space in each chunk)

(c) improved efficiency thanks to sequential memory access (compared to the random access caused by walking through buckets)
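To make (a) and (b) concrete, here is some back-of-the-envelope arithmetic as C. The assumptions are simplified and mine: a 16-byte header per chunk, small requests rounded up to the next power of two (minimum 8 bytes), and 8-byte alignment for the dense layout. It ignores the unused tail of each 32kB chunk and the hash table's own structures, so it is an illustration, not a reproduction of the measurements in [2].

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified costs for illustration only. */
#define CHUNK_HEADER_SIZE 16                          /* per-chunk header, as in (a) */
#define ALIGN8(len) (((len) + 7) & ~((size_t) 7))     /* dense layout alignment */

/* Round a small request up to a power of two, minimum 8 bytes. */
static size_t
round_pow2(size_t len)
{
    size_t      size = 8;

    while (size < len)
        size <<= 1;
    return size;
}

/* Bytes per tuple when every tuple is its own palloc-style chunk. */
static size_t
per_tuple_chunked(size_t tuple_len)
{
    assert(tuple_len > 0);
    return CHUNK_HEADER_SIZE + round_pow2(tuple_len);
}

/* Bytes per tuple when tuples are packed back to back in large chunks. */
static size_t
per_tuple_dense(size_t tuple_len)
{
    assert(tuple_len > 0);
    return ALIGN8(tuple_len);
}
```

Under these assumptions a 40-byte tuple costs 16 + 64 = 80 bytes as a separate chunk but 40 bytes packed densely, i.e. half. A 64-byte tuple costs 80 vs. 64, so the saving depends heavily on how close the tuple width is to a power of two.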

Per the measurements done in thread [2], (a) and (b) may reduce memory requirements by 50% in some cases. I also vaguely remember doing benchmarks for (c) and seeing measurable improvements, but I don't see the numbers in the thread, so either they were posted somewhere else or not at all :-/

Anyway, I'm explaining this because I think it's important that the reworked code achieves the same benefits. But when trying to implement it as a special memory context, I quickly ran into the requirement that each chunk has a chunk header [3], which would prevent (a).
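A simplified model of why the header is hard to avoid: a generic pfree() only receives the pointer, so the allocator has to be able to step back from it and find the owning context. The sketch below is hypothetical standalone C (Context, ChunkHeader, ctx_alloc, ctx_free and ctx_of are illustrative names), loosely modelled on the role StandardChunkHeader plays (a context pointer plus a size); it is not the memutils.h definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Context Context;

/* Header placed immediately before every chunk handed to the caller. */
typedef struct ChunkHeader
{
    Context    *context;     /* owning context, found by stepping back */
    size_t      size;        /* usable size of the chunk */
} ChunkHeader;

struct Context
{
    const char *name;
    size_t      nallocated;  /* live chunks, for illustration */
};

static void *
ctx_alloc(Context *ctx, size_t size)
{
    ChunkHeader *hdr = malloc(sizeof(ChunkHeader) + size);

    if (hdr == NULL)
        return NULL;
    hdr->context = ctx;
    hdr->size = size;
    ctx->nallocated++;
    return hdr + 1;          /* caller only sees the bytes after the header */
}

/* Find the owning context from nothing but the pointer. */
static Context *
ctx_of(const void *ptr)
{
    return ((const ChunkHeader *) ptr - 1)->context;
}

/* Generic free: only the pointer is passed in, so the header is mandatory. */
static void
ctx_free(void *ptr)
{
    ChunkHeader *hdr = (ChunkHeader *) ptr - 1;

    assert(hdr->context != NULL);
    hdr->context->nallocated--;
    free(hdr);
}
```

With densely packed tuples the bytes immediately before a tuple belong to the previous tuple, so there is nothing to step back to, which is exactly the conflict with (a).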

I know it was proposed to only include the chunk header when compiled with asserts, but I don't like the idea of having reusable code that depends on that (and fails with a segfault without it).

If I have to choose between a memory context that is essentially meant to be reused but is likely to fail unexpectedly in non-assert builds, and a special local allocator isolated to a single node, I choose the latter. Perhaps I'd see this differently if there were other places that could use the new memory context, but I can't think of any.


Opinions?

[1] https://www.postgresql.org/message-id/flat/7034.1454722453%40sss.pgh.pa.us

[2] https://www.postgresql.org/message-id/flat/53B4A03F.3070409%40fuzzy.cz

[3] https://github.com/postgres/postgres/blob/master/src/include/utils/memutils.h#L49


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list
