Hello,

I recently ran into a problem while developing an extension that uses
the shared buffer: releasing each buffer page turned out to be expensive.

This extension transfers the contents of shared buffers to a GPU device
using DMA, then launches a device kernel.
A single 8KB page (= BLCKSZ) is usually too small a unit of work, so the
extension pins many pages before the DMA transfer and releases them only
after the device kernel has finished.
For performance reasons, 16MB-64MB is the preferable data size per device
kernel execution, so a DMA transfer of 16MB-64MB requires 2048-8192 pages
to be pinned simultaneously.

Once a backend or extension calls ReadBuffer(), resowner.c tracks which
buffers are referenced by the current resource owner, to ensure these
buffers are released at the end of the transaction.
However, it seems to me that the implementation of resowner.c does not
assume a particular resource owner references many buffers at the same time.
It keeps the buffer indexes in an expandable array, and looks up the target
buffer by a sequential scan starting from the tail, because the most
recently pinned buffer tends to be released first.
This caused trouble in my case. My extension pins several thousand
buffers, so owner->buffers[] grows large and becomes expensive to scan.
In my measurement, ResourceOwnerForgetBuffer() took 36 seconds in total
while hash-joining 2M rows, even though the hash join itself took less
than 16 seconds.

What is the best way to solve the problem?

Idea-1) Make ResourceOwnerForgetBuffer() O(1) instead of O(N) per call
(O(N^2) in total for N pins)
The source of the problem is the data structure in ResourceOwnerData,
so a straightforward fix is to replace the linear search with O(1)
hash-based lookup.
The open question is whether this is beneficial, or at least harmless,
to the core code and not only to my extension. It is probably
"potentially" beneficial to the core backend as well; however, the effect
is hard to observe right now because usual workloads pin only a small
number of buffers at the same time.

The attached patch applies hash-based O(1) logic to
ResourceOwnerForgetBuffer(). It cuts the time consumed from 36sec to
0.09sec during a hash join of 20M rows.

Idea-2) Track the shared buffers referenced by the extension itself
Another, less preferable, option is to call ResourceOwnerForgetBuffer()
on the extension side just after ReadBuffer().
Once the resource owner forgets a buffer, the extension becomes responsible
for releasing it at the end of the transaction, even if the transaction aborts.
It also prevents us from using ReleaseBuffer(), so the extension would need
its own copy of ReleaseBuffer() without the ResourceOwnerForgetBuffer() call.
This idea has few advantages over idea-1; its only advantage is that it
avoids changes to core PostgreSQL.

Any comments?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

Attachment: pgsql-v9.5-resowner-forget-buffer-o1.v1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)