Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

Andres Freund Thu, 20 Dec 2012 03:33:34 -0800

On 2012-12-20 02:54:48 +0100, Tomas Vondra wrote:
> On 19.12.2012 02:18, Andres Freund wrote:
> > On 2012-12-17 00:31:00 +0100, Tomas Vondra wrote:
> >
> > I think except of the temp buffer issue mentioned below its ready.
> >
> >> -DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
> >> +DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes)
> >>  {
> >> -  int                     i;
> >> +  int         i, j;
> >> +
> >> +  /* sort the list of rnodes */
> >> +  pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
> >>
> >>    /* If it's a local relation, it's localbuf.c's problem. */
> >> -  if (RelFileNodeBackendIsTemp(rnode))
> >> +  for (i = 0; i < nnodes; i++)
> >>    {
> >> -          if (rnode.backend == MyBackendId)
> >> -                  DropRelFileNodeAllLocalBuffers(rnode.node);
> >> -          return;
> >> +          if (RelFileNodeBackendIsTemp(rnodes[i]))
> >> +          {
> >> +                  if (rnodes[i].backend == MyBackendId)
> >> +                          DropRelFileNodeAllLocalBuffers(rnodes[i].node);
> >> +          }
> >>    }
> >
> > While you deal with local buffers here you don't anymore in the big loop
> > over shared buffers. That wasn't needed earlier since we just returned
> > after noticing we have local relation, but thats not the case anymore.
>
> Hmm, but that would require us to handle the temp relations explicitly
> wherever we call DropRelFileNodeAllBuffers. Currently there are two such
> places - smgrdounlink() and smgrdounlinkall().
>
> By placing it into DropRelFileNodeAllBuffers() this code is shared and I
> think it's a good thing.
>
> But that does not mean the code is perfect - it was based on the
> assumption that if there's a mix of temp and regular relations, the temp
> relations will be handled in the first part and the rest in the second one.
>
> Maybe it'd be better to improve it so that the temp relations are
> removed from the array after the first part (and skip the second one if
> there are no remaining relations).


The problem is that afaik without the backend-local part a temporary
relation could match the same relfilenode as a "full" relation which
would have pretty bad consequences.

> > Still surprised this is supposed to be faster than a memcmp, but as you
> > seem to have measured it earlier..
>
> It surprised me too. These are the numbers with the current patch:
>
> 1) one by one
> =============
>           0    2    4    6    8    10   12   14   16   18   20
> --------------------------------------------------------------
> current  15   22   28   34   41    75   77   82   92   99  106
> memcmp   16   23   29   36   44   122  125  128  153  154  158
>
> Until the number of indexes reaches ~10, the numbers are almost exactly
> the same. Then the bsearch branch kicks in and it's clear how much
> slower the memcmp comparator is.
>
> 2) batches of 100
> =================
>           0    2    4    6    8    10   12   14   16   18   20
> --------------------------------------------------------------
> current   3    5    8   10   12    15   17   21   23   27   31
> memcmp    4    7   10   13   16    19   22   28   30   32   36
>
> Here the difference is much smaller, but even here the memcmp is
> consistently a bit slower.
>
>
> My theory is that while the current comparator starts with the most
> variable part (relation OID), and thus usually starts after the
> comparing the first 4B, the memcmp starts from the other end (tablespace
> and database) and therefore needs to compare more data.

Thats a good guess. I think its ok this way, but if you feel like
playing you could just change the order in the struct...

> >> +void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
> >> +{
> >> +  int i = 0;
> >> +  RelFileNodeBackend *rnodes;
> >> +  ForkNumber  forknum;
> >> +
> >> +  /* initialize an array which contains all relations to be dropped */
> >> +  rnodes = palloc(sizeof(RelFileNodeBackend) * nrels);
> >> +  for (i = 0; i < nrels; i++)
> >> +  {
> >> +          RelFileNodeBackend rnode = rels[i]->smgr_rnode;
> >> +          int                     which = rels[i]->smgr_which;
> >> +
> >> +          rnodes[i] = rnode;
> >> +
> >> +          /* Close the forks at smgr level */
> >> +          for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
> >> +                  (*(smgrsw[which].smgr_close)) (rels[i], forknum);
> >> +  }
> >> +
> >> +  /*
> >> +   * Get rid of any remaining buffers for the relation.  bufmgr will just
> >> +   * drop them without bothering to write the contents.
> >> +   */
> >> +  DropRelFileNodeAllBuffers(rnodes, nrels);
> >
> > I think it might be a good idea to handle temp relations on/buffers on
> > this level instead of trying to put it into
> > DropRelFileNodeAllBuffers. Would also allow you to only build an array
> > of RelFileNode's instead of RelFileNodeBackend's which might make it
> > slightl faster.
>
> Hmmm, sadly that'd require duplication of code because it needs to be
> done in smgrdounlink too.

It should be fairly small though.

int nrels_nonlocal = 0;

for (i = 0; i < nrels; i++)
{
        RelFileNodeBackend rnode = rels[i]->smgr_rnode;
        int                     which = rels[i]->smgr_which;

        if (RelFileNodeBackendIsTemp(rnode))
           DropRelFileNodeAllLocalBuffers
        else
           rnodes[nrels_nonlocal++];

        for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
            (*(smgrsw[which].smgr_close)) (rels[i], forknum);
}

DropRelFileNodeAllLocalBuffers(rnode, nrels_nonlocal);

In smgrdounlink it should only be a

if (RelFileNodeBackendIsTemp(rnode))
   DropRelFileNodeAllLocalBuffers(rnode)
else
   DropRelFileNodeAllBuffers(rnode);

ISTM that would be cleaner then memmove'ing the rnode array to remove
the temp rels away.

> >
> >> +  /*
> >> +   * It'd be nice to tell the stats collector to forget it immediately, 
> >> too.
> >> +   * But we can't because we don't know the OID (and in cases involving
> >> +   * relfilenode swaps, it's not always clear which table OID to forget,
> >> +   * anyway).
> >> +   */
> >
> > This and at least one other comment seems to be in too many locations
> > now.
>
> I see it in three places - smgrdounlink, smgrdounlinkall and
> smgrdounlinkfork. Which ones you consider superfluous? I think it's
> appropriate to keep them in all three places.

I only read the patch, not the result, so maybe youre right. I'll look
at it sometime.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

Reply via email to