Re: [HACKERS] Buffering GiST leaf pages too

2011-08-25 Thread Jim Nasby
On Aug 23, 2011, at 2:03 AM, Heikki Linnakangas wrote:
 While looking at Alexander's GiST fastbuild patch, which adds buffers to 
 internal nodes to avoid random I/O during index build, it occurred to me that 
 inserting the tuples to the leaf pages one at a time is quite inefficient 
 too, even if the leaf pages are in cache. There's still the overhead of 
 locking and WAL-logging each insertion separately. I think we could get a 
 nice further speedup if we attach a small buffer (one block or so) to every 
 leaf page we're currently writing tuples to, and update the leaf page in 
 bulk. Conveniently, the code to insert multiple tuples to a page already 
 exists in GiST code (because inserting a tuple sometimes splits the page into 
 more than two parts, so you need to insert multiple downlinks to the parent), 
 so this requires no changes to the low-level routines and WAL-logging.
 
 Let's finish off the main fastbuild patch first, but I wanted to get the idea 
 out there.

I've often wondered about the per-tuple overhead of all kinds of operations, 
not just GiST index builds. For example, if you're doing a seqscan, ISTM it 
would be a lot more efficient to memcpy an entire page into backend-local 
memory and operate off of that lock-free. Similarly for an index scan, you'd 
want to copy a full leaf page if you think you'll be hitting it more than once 
or twice.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Buffering GiST leaf pages too

2011-08-25 Thread Heikki Linnakangas

On 26.08.2011 00:45, Jim Nasby wrote:

I've often wondered about the per-tuple overhead of all kinds of operations, 
not just GiST index builds. For example, if you're doing a seqscan, ISTM it 
would be a lot more efficient to memcpy an entire page into backend-local 
memory and operate off of that lock-free.


What we currently do is even better than that. We take the lock once, 
and hold it while we do all the visibility checks. Then the lock is 
released, but the page is kept pinned so that it doesn't get evicted 
from the buffer cache. No memcpy() required.



Similarly for an index scan, you'd want to copy a full leaf page if you think 
you'll be hitting it more than once or twice.


We more or less do that too already. When an index scan steps on a leaf 
page, it scans the page for all matches, and copies them to 
backend-local memory. The page lock is then released.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Buffering GiST leaf pages too

2011-08-23 Thread Heikki Linnakangas
While looking at Alexander's GiST fastbuild patch, which adds buffers to 
internal nodes to avoid random I/O during index build, it occurred to me 
that inserting the tuples to the leaf pages one at a time is quite 
inefficient too, even if the leaf pages are in cache. There's still the 
overhead of locking and WAL-logging each insertion separately. I think 
we could get a nice further speedup if we attach a small buffer (one 
block or so) to every leaf page we're currently writing tuples to, and 
update the leaf page in bulk. Conveniently, the code to insert multiple 
tuples to a page already exists in GiST code (because inserting a tuple 
sometimes splits the page into more than two parts, so you need to 
insert multiple downlinks to the parent), so this requires no changes to 
the low-level routines and WAL-logging.


Let's finish off the main fastbuild patch first, but I wanted to get the 
idea out there.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers