On 2014-11-17 21:03:07 +0100, Tomas Vondra wrote:
> On 17.11.2014 19:46, Andres Freund wrote:
> > On 2014-11-17 19:42:25 +0100, Tomas Vondra wrote:
> >> On 17.11.2014 18:04, Andres Freund wrote:
> >>> Hi,
> >>>
> >>> On 2014-11-16 23:31:51 -0800, Jeff Davis wrote:
> >>>> *** a/src/include/nodes/memnodes.h
> >>>> --- b/src/include/nodes/memnodes.h
> >>>> ***************
> >>>> *** 60,65 **** typedef struct MemoryContextData
> >>>> --- 60,66 ----
> >>>>          MemoryContext nextchild;        /* next child of same parent */
> >>>>          char       *name;                       /* context name (just 
> >>>> for debugging) */
> >>>>          bool            isReset;                /* T = no space alloced 
> >>>> since last reset */
> >>>> +        uint64          mem_allocated;  /* track memory allocated for 
> >>>> this context */
> >>>>   #ifdef USE_ASSERT_CHECKING
> >>>>          bool            allowInCritSection;     /* allow palloc in 
> >>>> critical section */
> >>>>   #endif
> >>>
> >>> That's quite possibly one culprit for the slowdown. Right now one 
> >>> AllocSetContext struct fits precisely into three cachelines. After
> >>> your change it doesn't anymore.
> >>
> >> I'm no PPC64 expert, but I thought the cache lines are 128 B on that
> >> platform, since at least Power6?
> > 
> > Hm, might be true.
> > 
> >> Also, if I'm counting right, the MemoryContextData structure is 56B
> >> without the 'mem_allocated' field (and without the allowInCritSection),
> >> and 64B with it (at that particular place). sizeof() seems to confirm
> >> that. (But I'm on x86, so maybe the alignment on PPC64 is different?).
> > 
> > The MemoryContextData struct is embedded into AllocSetContext.
> 
> Oh, right. That makes is slightly more complicated, though, because
> AllocSetContext adds 6 x 8B fields plus an in-line array of freelist
> pointers. Which is 11x8 bytes. So in total 56+56+88=200B, without the
> additional field. There might be some difference because of alignment,
> but I still don't see how that one additional field might impact cachelines?

It's actually 196 bytes:

struct AllocSetContext {
        MemoryContextData          header;               /*     0    56 */
        AllocBlock                 blocks;               /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        AllocChunk                 freelist[11];         /*    64    88 */
        /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
        Size                       initBlockSize;        /*   152     8 */
        Size                       maxBlockSize;         /*   160     8 */
        Size                       nextBlockSize;        /*   168     8 */
        Size                       allocChunkLimit;      /*   176     8 */
        AllocBlock                 keeper;               /*   184     8 */
        /* --- cacheline 3 boundary (192 bytes) --- */

        /* size: 192, cachelines: 3, members: 8 */
};

And thus one additional field tipps it over the edge.

"pahole" is a very useful utility.

> But if we separated the freelist, that might actually make it faster, at
> least for calls that don't touch the freelist at all, no? Because most
> of the palloc() calls will be handled from the current block.

I seriously doubt it. The additional indirection + additional branches
are likely to make it worse.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to