On 9/4/24 16:25, Matthias van de Meent wrote:
> On Tue, 3 Sept 2024 at 18:20, Tomas Vondra <to...@vondra.me> wrote:
>> FWIW the actual cost is somewhat higher, because we seem to need ~400B
>> for every lock (not just the 150B for the LOCK struct).
>
> We do indeed allocate two PROCLOCKs for every LOCK, and allocate those
> inside dynahash tables. That amounts to (152+2*64+3*16=) 328 bytes in
> dynahash elements, and (3 * 8-16) = 24-48 bytes for the dynahash
> buckets/segments, resulting in 352-376 bytes * NLOCKENTS() being
> used[^1]. Does that align with your usage numbers, or are they
> significantly larger?

I see more like ~470B per lock. If I patch CalculateShmemSize to log the
shmem allocated, I get this:

  max_connections=100  max_locks_per_transaction=1000  =>  194264001
  max_connections=100  max_locks_per_transaction=2000  =>  241756967

and (241756967 - 194264001) / 100 / 1000 = 474. Could be alignment of
structs or something, not sure.
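(The logging hack itself is nothing interesting -- roughly the following,
though the exact placement and format string of what I actually ran may
have differed:)

    /* at the end of CalculateShmemSize(), just before "return size;" */
    elog(LOG, "CalculateShmemSize: total shmem requested: %zu bytes", size);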
>> At least based on a quick experiment. (Seems a bit high, right?)
>
> Yeah, that does seem high, thanks for nerd-sniping me.
>
> The 152 bytes of LOCK are mostly due to a combination of two
> MAX_LOCKMODES-sized int[]s that are used to keep track of the number
> of requested/granted locks of each level. As MAX_LOCKMODES = 10, these
> arrays use a total of 2*4*10 = 80 bytes, with the remaining 72 spent on
> tracking. MAX_BACKENDS sadly doesn't fit in int16, so we'll have to
> keep using int[]s, but that doesn't mean we can't improve this size:
>
> ISTM that MAX_LOCKMODES is 2 larger than it has to be: LOCKMODE=0 is
> NoLock, which is never used or counted in these shared structures, and
> the max lock mode supported by any of the supported lock methods is
> AccessExclusiveLock (8). We can thus reduce MAX_LOCKMODES to 8,
> reducing the size of the LOCK struct by 16 bytes.
>
> If some struct- and field-packing is OK, then we could further reduce
> the size of LOCK by an additional 8 bytes by resizing the LOCKMASK
> type from int to int16 (we only use the first MaxLockMode (8) + 1
> bits), and then storing the grant/waitMask fields (now 4 bytes total)
> in the padding that's present at the end of the waitProcs struct. This
> would depend on dclist not writing in its padding space, but I
> couldn't find any user that did so, and most critically dclist_init
> doesn't scribble in the padding with memset.
>
> If these are both implemented, it would save 24 bytes, reducing the
> struct to 128 bytes. :) [^2]
>
> I also checked PROCLOCK: if it is worth further packing the struct, we
> should probably look at whether it's worth replacing the PGPROC*-typed
> fields with ProcNumber-based ones, potentially in both PROCLOCK and
> PROCLOCKTAG. When combined with int16-typed LOCKMASKs, either one of
> these fields being replaced with ProcNumber would allow a reduction in
> size by one MAXALIGN quantum, reducing the struct to 56 bytes, the
> smallest I could get it to without ignoring type alignments.
>
> Further shmem savings can be achieved by reducing the "10% safety
> margin" added at the end of LockShmemSize, as I'm fairly sure the
> memory used in shared hashmaps doesn't exceed the estimated amount,
> and if it did then we should probably fix that part, rather than
> requesting that (up to) 10% overhead here.
>
> Altogether that'd save 40 bytes/lock entry on size, and ~35 bytes/lock
> on the "safety margin", for a saving of (up to) 19% of our current
> allocation. I'm not sure whether these tricks would help performance
> or even hurt it, apart from smaller structs usually fitting better in
> CPU caches.

Not sure either, but it seems worth exploring. If you do an experimental
patch for the LOCK size reduction, I can get some numbers.

I'm not sure about the safety margin. 10% certainly seems like quite a
bit of memory (it might not have been in the past, but as instances keep
growing, that has probably changed).

regards

--
Tomas Vondra
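PS: For anyone reading along without the source handy, the structs in
question look roughly like this (paraphrased from src/include/storage/lock.h,
so comments and exact layout may not match the tree exactly), with notes on
where the proposed savings would come from:

typedef struct LOCK
{
    /* hash key */
    LOCKTAG     tag;            /* unique identifier of lockable object */

    /* data */
    LOCKMASK    grantMask;      /* bitmask of granted lock modes */
    LOCKMASK    waitMask;       /* bitmask of lock modes being waited for */
    dlist_head  procLocks;      /* list of PROCLOCKs for this lock */
    dclist_head waitProcs;      /* waiting PGPROCs; the trailing padding
                                 * here is what could absorb int16-sized
                                 * grant/waitMask fields */
    int         requested[MAX_LOCKMODES];   /* 40 bytes (MAX_LOCKMODES = 10) */
    int         nRequested;
    int         granted[MAX_LOCKMODES];     /* another 40 bytes */
    int         nGranted;
} LOCK;

typedef struct PROCLOCKTAG
{
    LOCK       *myLock;         /* link to the LOCK object */
    PGPROC     *myProc;         /* candidate for ProcNumber, per the above */
} PROCLOCKTAG;

typedef struct PROCLOCK
{
    PROCLOCKTAG tag;            /* hash key */

    PGPROC     *groupLeader;    /* lock group leader, or the proc itself */
    LOCKMASK    holdMask;       /* lock modes currently held */
    LOCKMASK    releaseMask;    /* lock modes to be released */
    dlist_node  lockLink;       /* link in LOCK's list of PROCLOCKs */
    dlist_node  procLink;       /* link in PGPROC's list of PROCLOCKs */
} PROCLOCK;

/*
 * And the "10% safety margin" being discussed is (roughly) the final step
 * of LockShmemSize():  size = add_size(size, size / 10);
 */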