On 6.12.2012 05:47, Shigeru Hanada wrote:
> On Mon, Nov 12, 2012 at 4:36 AM, Tomas Vondra <t...@fuzzy.cz> wrote:
>> Hi,
>>
>> I've prepared a slightly updated patch, based on the previous review.
>> See it attached.
> 
> All changes in the v3 patch seem good, however I found some places which
> require cosmetic changes.  Please see the attached v3.1 patch for those changes.

OK, thanks!

>>> Oops, you're right. I omitted 1-3 and 1-4 in the actual measurement, but
>>> IMO loading the data is necessary to fill up the shared buffers, because
>>> the number of buffers deleted during COMMIT is the major factor for this
>>> patch. And, yes, I verified that all shared buffers are used after
>>> all loading has finished.
>>
>> I don't think it's an important factor, as the original code does this:
>>
>>   for (i = 0; i < NBuffers; i++)
>>   {
>>     volatile BufferDesc *bufHdr = &BufferDescriptors[i];
>>     ...
>>   }
>>
>> in the DropRelFileNodeAllBuffers. That loops through all shared buffers
>> no matter what, so IMHO the performance in this case depends on the
>> total size of the shared buffers. But maybe I'm missing something important.
> 
> I worry about the effect of the "continue" in the loop on performance.  If
> most of the shared buffers are unused at the moment of the DROP, the cost of
> the DROP doesn't include the overhead of LockBufHdr and the subsequent few lines.
> Do you expect such a situation in your use cases?

I still fail to see the issue here - can you give an example of when
this would be a problem?

The code was like this before the patch; I only replaced the simple
comparison with a binary search ... Or do you think that this check means
"if the buffer is empty"?

    /* buffer does not belong to any of the relations */
    if (rnode == NULL)
        continue;

Because it does not - notice that the 'rnode' is a result of the binary
search, not a direct check of the buffer.

So the following few lines (lock and recheck) will be skipped both for
unused buffers and for buffers belonging to relations that are not being
dropped.

Maybe we could do a simple 'is the buffer empty' check before the
bsearch call, but I don't expect empty buffers to be very common in the
real world (the cache tends to be fully used).
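To make the lookup concrete: the array of dropped rnodes is sorted once
with a comparator, and each buffer tag is then probed with bsearch, which
returns NULL for buffers that belong to none of the dropped relations. A
minimal standalone sketch of that pattern (the struct and names below are
simplified stand-ins, not the actual PostgreSQL definitions):

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for RelFileNode; field names mirror the patch's
 * comparator but this is not the real PostgreSQL struct. */
typedef struct
{
	unsigned	spcNode;
	unsigned	dbNode;
	unsigned	relNode;
} DemoRelFileNode;

/* Compare relNode first (most selective), then dbNode, then spcNode,
 * matching the sort order the patch's rnode_comparator uses. */
static int
demo_rnode_cmp(const void *p1, const void *p2)
{
	const DemoRelFileNode *n1 = p1;
	const DemoRelFileNode *n2 = p2;

	if (n1->relNode != n2->relNode)
		return (n1->relNode < n2->relNode) ? -1 : 1;
	if (n1->dbNode != n2->dbNode)
		return (n1->dbNode < n2->dbNode) ? -1 : 1;
	if (n1->spcNode != n2->spcNode)
		return (n1->spcNode < n2->spcNode) ? -1 : 1;
	return 0;
}

/* Returns 1 if tag matches one of the nnodes sorted entries, 0 otherwise;
 * rnodes must already be sorted with demo_rnode_cmp. */
static int
demo_buffer_is_dropped(const DemoRelFileNode *tag,
					   const DemoRelFileNode *rnodes, int nnodes)
{
	return bsearch(tag, rnodes, nnodes,
				   sizeof(DemoRelFileNode), demo_rnode_cmp) != NULL;
}
```

Sorting costs O(n log n) once per batch; each of the NBuffers probes then
costs O(log n) instead of O(n) for a linear scan over the dropped relations.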

>> I've done a simple benchmark on my laptop with 2GB shared buffers, it's
>> attached in the drop-test.py (it's a bit messy, but it works).
> [snip]
>> With those parameters, I got these numbers on the laptop:
>>
>>   creating 10000 tables
>>     all tables created in 3.298694 seconds
>>   dropping 10000 tables one by one ...
>>     all tables dropped in 32.692478 seconds
>>   creating 10000 tables
>>     all tables created in 3.458178 seconds
>>   dropping 10000 tables in batches by 100...
>>     all tables dropped in 3.28268 seconds
>>
>> So it's 33 seconds vs. 3.3 seconds, i.e. 10x speedup. On AWS we usually
>> get larger differences, as we use larger shared buffers and the memory
>> is significantly slower there.
> 
> Do you have performance numbers for the same test on unpatched PG?
> This patch aims to improve the performance of bulk DROP, so the 4th number in
> the result above should be compared to the 4th number on unpatched PG?

I've re-run the tests with the current patch on my home workstation, and
the results are these (again 10k tables, dropped either one-by-one or in
batches of 100).

1) unpatched

dropping one-by-one:        15.5 seconds
dropping in batches of 100: 12.3 seconds

2) patched (v3.1)

dropping one-by-one:        32.8 seconds
dropping in batches of 100:  3.0 seconds

The problem here is that when dropping the tables one-by-one, the
bsearch overhead is tremendous and significantly increases the runtime.
I've done a simple check (if dropping a single table, use the original
simple comparison) and I got this:

3) patched (v3.2)

dropping one-by-one:        16.0 seconds
dropping in batches of 100:  3.3 seconds

i.e. the best of both worlds. But it seems like unnecessary complexity to
me - if you need to drop a lot of tables you'll probably do that in a
single transaction anyway.
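For reference, the batching in the patch collects the pending SMgrRelation
pointers into an array that doubles its capacity on demand. The idiom in
isolation might look like this (a minimal sketch using generic pointers and
plain malloc/realloc in place of palloc/repalloc; error handling omitted
for brevity):

```c
#include <assert.h>
#include <stdlib.h>

/* Append one pointer to a growable array, doubling the capacity when it
 * is full.  Returns the (possibly moved) array; *nitems and *maxitems
 * are updated in place. */
static void **
demo_append(void **items, int *nitems, int *maxitems, void *item)
{
	if (*nitems >= *maxitems)
	{
		*maxitems *= 2;
		items = realloc(items, sizeof(void *) * (*maxitems));
	}
	items[(*nitems)++] = item;
	return items;
}
```

Doubling keeps the number of reallocations logarithmic in the number of
dropped relations, so each append is amortized O(1).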

Tomas
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 2446282..9e92230 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -312,6 +312,10 @@ smgrDoPendingDeletes(bool isCommit)
        PendingRelDelete *pending;
        PendingRelDelete *prev;
        PendingRelDelete *next;
+       SMgrRelation *srels = (SMgrRelation *) palloc(sizeof(SMgrRelation));
+       int                     nrels = 0,
+                               i = 0,
+                               maxrels = 1;
 
        prev = NULL;
        for (pending = pendingDeletes; pending != NULL; pending = next)
@@ -335,14 +339,33 @@ smgrDoPendingDeletes(bool isCommit)
                                SMgrRelation srel;
 
                                srel = smgropen(pending->relnode, pending->backend);
-                               smgrdounlink(srel, false);
-                               smgrclose(srel);
+
+                               /* extend the array if needed (double the size) */
+                               if (maxrels <= nrels)
+                               {
+                                       maxrels *= 2;
+                                       srels = (SMgrRelation *)
+                                               repalloc(srels, sizeof(SMgrRelation) * maxrels);
+                               }
+                               
+                               srels[nrels++] = srel;
                        }
                        /* must explicitly free the list entry */
                        pfree(pending);
                        /* prev does not change */
                }
        }
+
+       if (nrels > 0)
+       {
+               smgrdounlinkall(srels, nrels, false);
+               
+               for (i = 0; i < nrels; i++)
+                       smgrclose(srels[i]);
+       }
+       
+       pfree(srels);
+
 }
 
 /*
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index dddb6c0..716d795 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -107,7 +107,7 @@ static volatile BufferDesc *BufferAlloc(SMgrRelation smgr,
                        bool *foundPtr);
 static void FlushBuffer(volatile BufferDesc *buf, SMgrRelation reln);
 static void AtProcExit_Buffers(int code, Datum arg);
-
+static int rnode_comparator(const void * p1, const void * p2);
 
 /*
  * PrefetchBuffer -- initiate asynchronous read of a block of a relation
@@ -2094,31 +2094,52 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
  * --------------------------------------------------------------------
  */
 void
-DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
+DropRelFileNodeAllBuffers(RelFileNodeBackend *rnodes, int nnodes)
 {
-       int                     i;
+       int         i;
+
+       /* sort the list of rnodes */
+       pg_qsort(rnodes, nnodes, sizeof(RelFileNodeBackend), rnode_comparator);
 
        /* If it's a local relation, it's localbuf.c's problem. */
-       if (RelFileNodeBackendIsTemp(rnode))
+       for (i = 0; i < nnodes; i++)
        {
-               if (rnode.backend == MyBackendId)
-                       DropRelFileNodeAllLocalBuffers(rnode.node);
-               return;
+               if (RelFileNodeBackendIsTemp(rnodes[i]))
+               {
+                       if (rnodes[i].backend == MyBackendId)
+                               DropRelFileNodeAllLocalBuffers(rnodes[i].node);
+               }
        }
 
        for (i = 0; i < NBuffers; i++)
        {
+               
+               RelFileNodeBackend *rnode = NULL;
                volatile BufferDesc *bufHdr = &BufferDescriptors[i];
-
-               /*
-                * As in DropRelFileNodeBuffers, an unlocked precheck should be safe
-                * and saves some cycles.
-                */
-               if (!RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
-                       continue;
+               
+               /* don't use the bsearch for a single-table drop */
+               if (nnodes == 1)
+               {
+                       if (!RelFileNodeEquals(bufHdr->tag.rnode, rnodes->node))
+                               continue;
+                       
+                       rnode = rnodes;
+                       
+               } else {
+                       
+                       rnode = bsearch((const void *) &(bufHdr->tag.rnode),
+                                       rnodes,
+                                       nnodes, sizeof(RelFileNodeBackend),
+                                       rnode_comparator);
+                       
+                       /* buffer does not belong to any of the relations */
+                       if (rnode == NULL)
+                               continue;
+                       
+               }
 
                LockBufHdr(bufHdr);
-               if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node))
+               if (RelFileNodeEquals(bufHdr->tag.rnode, rnode->node))
                        InvalidateBuffer(bufHdr);       /* releases spinlock */
                else
                        UnlockBufHdr(bufHdr);
@@ -2953,3 +2974,31 @@ local_buffer_write_error_callback(void *arg)
                pfree(path);
        }
 }
+
+/*
+ * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
+ * that it's suitable for bsearch.
+ */
+static int
+rnode_comparator(const void * p1, const void * p2)
+{
+       RelFileNodeBackend n1 = * (RelFileNodeBackend *) p1;
+       RelFileNodeBackend n2 = * (RelFileNodeBackend *) p2;
+
+       if (n1.node.relNode < n2.node.relNode)
+               return -1;
+       else if (n1.node.relNode > n2.node.relNode)
+               return 1;
+
+       if (n1.node.dbNode < n2.node.dbNode)
+               return -1;
+       else if (n1.node.dbNode > n2.node.dbNode)
+               return 1;
+
+       if (n1.node.spcNode < n2.node.spcNode)
+               return -1;
+       else if (n1.node.spcNode > n2.node.spcNode)
+               return 1;
+       else
+               return 0;
+}
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 5dff8b3..40a9d0b 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -390,7 +390,7 @@ smgrdounlink(SMgrRelation reln, bool isRedo)
         * Get rid of any remaining buffers for the relation.  bufmgr will just
         * drop them without bothering to write the contents.
         */
-       DropRelFileNodeAllBuffers(rnode);
+       DropRelFileNodeAllBuffers(&rnode, 1);
 
        /*
         * It'd be nice to tell the stats collector to forget it immediately, too.
@@ -419,6 +419,64 @@ smgrdounlink(SMgrRelation reln, bool isRedo)
        (*(smgrsw[which].smgr_unlink)) (rnode, InvalidForkNumber, isRedo);
 }
 
+void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo)
+{
+       int i = 0;
+       RelFileNodeBackend rnodes[nrels];
+       ForkNumber  forknum;
+
+       for (i = 0; i < nrels; i++)
+       {
+               RelFileNodeBackend rnode = rels[i]->smgr_rnode;
+               int         which = rels[i]->smgr_which;
+
+               rnodes[i] = rnode;
+
+               /* Close the forks at smgr level */
+               for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+                       (*(smgrsw[which].smgr_close)) (rels[i], forknum);
+       }
+
+       /*
+        * Get rid of any remaining buffers for the relation.  bufmgr will just
+        * drop them without bothering to write the contents.
+        */
+       DropRelFileNodeAllBuffers(rnodes, nrels);
+
+       /*
+        * It'd be nice to tell the stats collector to forget it immediately, too.
+        * But we can't because we don't know the OID (and in cases involving
+        * relfilenode swaps, it's not always clear which table OID to forget,
+        * anyway).
+        */
+
+       /*
+        * Send a shared-inval message to force other backends to close any
+        * dangling smgr references they may have for this rel.  We should do this
+        * before starting the actual unlinking, in case we fail partway through
+        * that step.  Note that the sinval message will eventually come back to
+        * this backend, too, and thereby provide a backstop that we closed our
+        * own smgr rel.
+        */
+       for (i = 0; i < nrels; i++) {
+               CacheInvalidateSmgr(rnodes[i]);
+       }
+
+       /*
+        * Delete the physical file(s).
+        *
+        * Note: smgr_unlink must treat deletion failure as a WARNING, not an
+        * ERROR, because we've already decided to commit or abort the current
+        * xact.
+        */
+
+       for (i = 0; i < nrels; i++) {
+               int         which = rels[i]->smgr_which;
+               for (forknum = 0; forknum <= MAX_FORKNUM; forknum++)
+                       (*(smgrsw[which].smgr_unlink)) (rnodes[i], forknum, isRedo);
+       }
+}
+
 /*
  *     smgrdounlinkfork() -- Immediately unlink one fork of a relation.
  *
diff --git a/src/include/storage/bufmgr.h b/src/include/storage/bufmgr.h
index 51eb77b..1deee42 100644
--- a/src/include/storage/bufmgr.h
+++ b/src/include/storage/bufmgr.h
@@ -188,7 +188,7 @@ extern void FlushRelationBuffers(Relation rel);
 extern void FlushDatabaseBuffers(Oid dbid);
 extern void DropRelFileNodeBuffers(RelFileNodeBackend rnode,
                                           ForkNumber forkNum, BlockNumber firstDelBlock);
-extern void DropRelFileNodeAllBuffers(RelFileNodeBackend rnode);
+extern void DropRelFileNodeAllBuffers(RelFileNodeBackend * rnodes, int nnodes);
 extern void DropDatabaseBuffers(Oid dbid);
 
 #define RelationGetNumberOfBlocks(reln) \
diff --git a/src/include/storage/smgr.h b/src/include/storage/smgr.h
index 9eccf76..272476b 100644
--- a/src/include/storage/smgr.h
+++ b/src/include/storage/smgr.h
@@ -85,6 +85,7 @@ extern void smgrcloseall(void);
 extern void smgrclosenode(RelFileNodeBackend rnode);
 extern void smgrcreate(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrdounlink(SMgrRelation reln, bool isRedo);
+extern void smgrdounlinkall(SMgrRelation * rels, int nrels, bool isRedo);
 extern void smgrdounlinkfork(SMgrRelation reln, ForkNumber forknum, bool isRedo);
 extern void smgrextend(SMgrRelation reln, ForkNumber forknum,
                   BlockNumber blocknum, char *buffer, bool skipFsync);
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)