Shouldn't batch splitting be followed by a hash function change?

On Mon, Apr 22, 2019, 05:09 Tomas Vondra <tomas.von...@2ndquadrant.com>
wrote:

> On Sun, Apr 21, 2019 at 11:40:22AM -0500, Justin Pryzby wrote:
> >On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
> >> Jeff Janes <jeff.ja...@gmail.com> writes:
> >> > The growEnabled stuff only prevents infinite loops.  It doesn't
> >> > prevent extreme silliness.
> >>
> >> > If a single 32 bit hash value has enough tuples by itself to not
> >> > fit in work_mem, then it will keep splitting until that value is
> >> > in a batch by itself before shutting off
> >>
> >> I suspect, however, that we might be better off just taking the
> >> existence of the I/O buffers into account somehow while deciding
> >> whether it's worth growing further.  That is, I'm imagining adding a
> >> second independent reason for shutting off growEnabled, along the
> >> lines of "increasing nbatch any further will require an unreasonable
> >> amount of buffer memory".  The question then becomes how to define
> >> "unreasonable".
> >
> >On Sun, Apr 21, 2019 at 06:15:25PM +0200, Tomas Vondra wrote:
> >> I think the question the code needs to be asking is "If we double the
> >> number of batches, does the amount of memory we need drop?" And the
> >> memory needs to account both for the buffers and per-batch data.
> >>
> >> I don't think we can just stop increasing the number of batches when the
> >> memory for BufFile exceeds work_mem, because that entirely ignores the
> >> fact that by doing that we force the system to keep the per-batch stuff
> >> in memory (and that can be an almost arbitrary amount).
> >...
> >> Of course, this just stops enforcing work_mem at some point, but it at
> >> least attempts to minimize the amount of memory used.
> >
> >This patch defines reasonable as "additional BatchFiles will not
> >themselves exceed work_mem; OR, work_mem is already exceeded but
> >additional BatchFiles are going to save us RAM"...
> >
>
> OK.
>
> >I think the first condition is not very sensitive and not too important
> >to get right; it only allows work_mem to be exceeded by 2x, which maybe
> >already happens for multiple reasons, related to this thread and
> >otherwise.  It'd be fine to slap on a factor of /2 or /4 or /8 there too.
> >
>
> TBH I'm not quite sure I understand all the conditions in the patch - it
> seems unnecessarily complicated. And I don't think it actually minimizes
> the amount of memory used for hash table + buffers, because it keeps the
> same spaceAllowed (which pushes nbatches up). At some point it actually
> makes sense to bump spaceAllowed and make larger batches instead of adding
> more batches, and the patch does not seem to do that.
>
> Also, the patch does this:
>
>     if (hashtable->nbatch*sizeof(PGAlignedBlock) < hashtable->spaceAllowed)
>     {
>         ExecHashIncreaseNumBatches(hashtable);
>     }
>     else if (hashtable->spaceUsed/2 >= hashtable->spaceAllowed)
>     {
>         /* Exceeded spaceAllowed by 2x, so we'll save RAM by allowing
>          * nbatches to increase */
>         /* I think this branch would be hit almost the same as the branch below */
>         ExecHashIncreaseNumBatches(hashtable);
>     }
>     ...
>
> but the reasoning for the second branch seems wrong, because
>
>     (spaceUsed/2 >= spaceAllowed)
>
> is not enough to guarantee that we actually save memory by doubling the
> number of batches. To do that, we need to make sure that
>
>    (spaceUsed/2 >= hashtable->nbatch * sizeof(PGAlignedBlock))
>
> But that may not be true - it certainly is not guaranteed by not getting
> into the first branch.
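>
> To make that concrete, the check that would actually guarantee a saving
> is roughly this (just a sketch of the condition above, not code from
> either patch):
>
>     /*
>      * Doubling nbatch halves the tuples kept in memory (assuming a
>      * uniform distribution) but doubles the number of BufFile buffers
>      * (8kB each with the default BLCKSZ), so it only pays off while
>      * the halved hash table saves more than the extra buffers cost.
>      */
>     if (hashtable->spaceUsed / 2 >= hashtable->nbatch * sizeof(PGAlignedBlock))
>         ExecHashIncreaseNumBatches(hashtable);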
>
> Consider an ideal example with uniform distribution:
>
>     create table small (id bigint, val text);
>     create table large (id bigint, val text);
>
>     insert into large select 1000000000 * random(), md5(i::text)
>     from generate_series(1, 700000000) s(i);
>
>     insert into small select 1000000000 * random(), md5(i::text)
>     from generate_series(1, 10000) s(i);
>
>     vacuum analyze large;
>     vacuum analyze small;
>
>     update pg_class set (relpages, reltuples) = (1000000, 1)
>     where relname = 'large';
>
>     update pg_class set (relpages, reltuples) = (1, 1000000)
>     where relname = 'small';
>
>     set work_mem = '1MB';
>
>     explain analyze select * from small join large using (id);
>
> A log after each call to ExecHashIncreaseNumBatches says this:
>
>     nbatch=2 spaceUsed=463200 spaceAllowed=1048576 BufFile=16384
>     nbatch=4 spaceUsed=463120 spaceAllowed=1048576 BufFile=32768
>     nbatch=8 spaceUsed=457120 spaceAllowed=1048576 BufFile=65536
>     nbatch=16 spaceUsed=458320 spaceAllowed=1048576 BufFile=131072
>     nbatch=32 spaceUsed=457120 spaceAllowed=1048576 BufFile=262144
>     nbatch=64 spaceUsed=459200 spaceAllowed=1048576 BufFile=524288
>     nbatch=128 spaceUsed=455600 spaceAllowed=1048576 BufFile=1048576
>     nbatch=256 spaceUsed=525120 spaceAllowed=1048576 BufFile=2097152
>     nbatch=256 spaceUsed=2097200 spaceAllowed=1048576 BufFile=2097152
>     nbatch=512 spaceUsed=2097200 spaceAllowed=1048576 BufFile=4194304
>     nbatch=1024 spaceUsed=2097200 spaceAllowed=1048576 BufFile=8388608
>     nbatch=2048 spaceUsed=2097200 spaceAllowed=1048576 BufFile=16777216
>     nbatch=4096 spaceUsed=2097200 spaceAllowed=1048576 BufFile=33554432
>     nbatch=8192 spaceUsed=2097200 spaceAllowed=1048576 BufFile=67108864
>     nbatch=16384 spaceUsed=2097200 spaceAllowed=1048576 BufFile=134217728
>
> So we've succeeded in keeping spaceUsed below 2*spaceAllowed (which
> seems rather confusing, BTW), but we've allocated 128MB for BufFile. So
> about 130MB in total. With 16k batches.
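>
> (For reference: with the default 8kB BLCKSZ, sizeof(PGAlignedBlock) is
> 8192 bytes, so 16384 batches need 16384 * 8192 = 134217728 bytes of
> BufFile buffers - exactly the number in the last line of the log.)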
>
> What I think might work better is the attached v2 of the patch, with a
> single top-level condition, comparing the combined memory usage
> (spaceUsed + BufFile) against spaceAllowed. But it also tweaks
> spaceAllowed once the size needed for BufFile gets over work_mem/3.
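>
> In rough pseudo-code, the idea is something like this (a simplified
> sketch reconstructed from the description and the log below, not the
> actual v2 code):
>
>     Size    buffile_size = hashtable->nbatch * sizeof(PGAlignedBlock);
>
>     if (hashtable->spaceUsed + buffile_size > hashtable->spaceAllowed)
>     {
>         ExecHashIncreaseNumBatches(hashtable);
>
>         /* Once the buffers alone need more than work_mem/3, bump
>          * spaceAllowed instead of relying on ever more batches. */
>         buffile_size = hashtable->nbatch * sizeof(PGAlignedBlock);
>         if (buffile_size > work_mem * 1024L / 3)
>             hashtable->spaceAllowed = 3 * buffile_size;
>     }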
>
> And it behaves like this:
>
>     nbatch=2 spaceUsed=458640 spaceAllowed=1048576 BufFile=16384
>     nbatch=4 spaceUsed=455040 spaceAllowed=1048576 BufFile=32768
>     nbatch=8 spaceUsed=440160 spaceAllowed=1048576 BufFile=65536
>     nbatch=16 spaceUsed=426560 spaceAllowed=1048576 BufFile=131072
>     nbatch=32 spaceUsed=393200 spaceAllowed=1048576 BufFile=262144
>     nbatch=64 spaceUsed=329120 spaceAllowed=1572864 BufFile=524288
>     nbatch=128 spaceUsed=455600 spaceAllowed=3145728 BufFile=1048576
>     nbatch=256 spaceUsed=987440 spaceAllowed=6291456 BufFile=2097152
>     nbatch=512 spaceUsed=2040560 spaceAllowed=12582912 BufFile=4194304
>     nbatch=1024 spaceUsed=4114640 spaceAllowed=25165824 BufFile=8388608
>     nbatch=2048 spaceUsed=8302880 spaceAllowed=50331648 BufFile=16777216
>
> So we end up with just 2k batches, using ~24MB of memory in total.
> That's because the spaceAllowed limit was bumped up instead of adding
> more and more batches.
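>
> (The spaceAllowed bump is visible in the log: once the BufFile size
> passes work_mem/3, spaceAllowed tracks three times the BufFile size,
> e.g. 1572864 = 3 * 524288 at nbatch=64.)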
>
> >The current patch doesn't unset growEnabled, since there's no point at
> >which the hashtable should grow without bound: if hash tables are
> >*already* exceeding work_mem by 2x, nbatches should be doubled.
> >
>
> Not sure. I guess it might be useful to re-evaluate the flag after a
> while - not necessarily by actually enabling it right away, but just
> checking if it'd move any tuples. Just disabling it once may be an issue
> when the input data is somehow correlated, which seems to be one of the
> issues with the data set discussed in this thread.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
