Hi,

I spent some time performance testing sorting, and spotted a funny phenomenon with very small work_mem settings. This example demonstrates it well:

I sorted about 1 GB worth of pre-sorted integers, with different settings of work_mem between 64 kB (the minimum) and 100 kB. I also added some logging to print out the number of "runs" performed:


work_mem   elapsed time   runs
--------   ------------   ----
   64 kB       20167 ms   35301
   65 kB       19769 ms   34454
   66 kB       19844 ms   33646
   67 kB       20318 ms   32840
   68 kB       20022 ms   32105
   69 kB       19455 ms   31403
   70 kB       19509 ms   30700
   71 kB       19200 ms   30057
   72 kB       19248 ms   29441
   73 kB       19809 ms   29071
   74 kB       19840 ms   28657
   75 kB       19864 ms   28281
   76 kB       19634 ms   27914
   77 kB       19599 ms   27557
   78 kB       19429 ms   27184
   79 kB       19425 ms   26845
   80 kB       20196 ms   26515
   81 kB       21781 ms   26192
   82 kB       18883 ms   25877
   83 kB       20460 ms   25548
   84 kB       18910 ms   25249
   85 kB       29245 ms   2009692
   86 kB       26532 ms   1039496
   87 kB       25126 ms   701056
   88 kB       25241 ms   528867
   89 kB       24235 ms   418686
   90 kB       24038 ms   350528
   91 kB       24803 ms   301454
   92 kB       23192 ms   264434
   93 kB       23111 ms   233686
   94 kB       23295 ms   210807
   95 kB       26602 ms   192009
   96 kB       22990 ms   176289
   97 kB       22700 ms   162948
   98 kB       22370 ms   150727
   99 kB       22686 ms   140867
  100 kB       22413 ms   132217

Huh, something funny happens going from 84 kB to 85 kB: the number of runs explodes, and elapsed time jumps by several seconds. I traced it to this piece of code in inittapes():

        /*
         * Decrease availMem to reflect the space needed for tape buffers; but
         * don't decrease it to the point that we have no room for tuples. (That
         * case is only likely to occur if sorting pass-by-value Datums; in all
         * other scenarios the memtuples[] array is unlikely to occupy more than
         * half of allowedMem.  In the pass-by-value case it's not important to
         * account for tuple space, so we don't care if LACKMEM becomes
         * inaccurate.)
         */
        tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;

        if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
                USEMEM(state, tapeSpace);

With small work_mem values, maxTapes is always 6, so tapeSpace is 48 kB. With a small enough work_mem (84 kB or below in this test case), there is not enough memory left at this point, so we don't subtract tapeSpace. However, with a suitably evil test case, you can get arbitrarily close to the edge, so that the subtraction takes away almost all the remaining memory, leaving only a few bytes for the tuples. In this example, with work_mem='85 kB', each quicksorted run consists of only 15 tuples on average.

To fix this, I propose that we always subtract tapeSpace, but if less than e.g. 32 kB of memory remains after that (or if availMem went negative), we bump availMem back up to 32 kB. That way we would always reserve at least 32 kB to hold the tuples, even if that means exceeding 'work_mem' slightly.

Memory accounting isn't very accurate in general. Even as the code stands, we will exceed the requested work_mem if it's small enough, and we don't account for every small allocation and overhead anyway. So I think that would be acceptable, even though it's kind of cheating.

Of course, sorting large sets with tiny work_mem settings isn't a very good idea to begin with. But I'd nevertheless like to avoid non-intuitive performance cliffs like this. It threw me off while performance testing another patch.

- Heikki
