On 14.07.2011 11:33, Alexander Korotkov wrote:
On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:

One thing that caught my eye is that when you empty a buffer, you load the
entire subtree below that buffer, down to the next buffered or leaf level,
into memory. Every page in that subtree is kept pinned. That is a problem;
in the general case, the buffer manager can only hold a modest number of
pages pinned at a time. Consider that the minimum value for shared_buffers
is just 16. That's unrealistically low for any real system, but the default
is only 32MB, which equals just 4096 buffers. A subtree could easily be
larger than that.

With level step = 1 we need only 2 levels in the subtree. With the minimum
index tuple size (12 bytes), each page can hold at most 675 tuples. Thus I
think the default shared_buffers is enough for level step = 1.

Hundreds of buffer pins is still a lot. And with level step = 2, the number of pins required explodes to 675^2 = 455625.
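
To make the arithmetic above concrete, here is a rough sketch in Python. It assumes 8 kB blocks (implied by 32MB = 4096 buffers) and a uniform fanout of 675 child pages per page, which is the worst case quoted above and not a measured value. The function names are made up for illustration and do not come from the patch.

```python
# Back-of-the-envelope pin counts for emptying one buffer.
# Assumptions: 8 kB blocks and a uniform fanout of 675 per page
# (the maximum quoted in this thread). Names are illustrative only.
BLOCK_SIZE = 8192
MAX_FANOUT = 675


def buffers_for(shared_buffers_bytes, block_size=BLOCK_SIZE):
    """Number of shared buffers that fit in the given amount of memory."""
    return shared_buffers_bytes // block_size


def lowest_level_pages(fanout, level_step):
    """Pages on the lowest level of a subtree spanning level_step levels."""
    return fanout ** level_step


def subtree_pages(fanout, level_step):
    """All pages of the subtree, including its root page."""
    return sum(fanout ** depth for depth in range(level_step + 1))


if __name__ == "__main__":
    available = buffers_for(32 * 1024 * 1024)
    for step in (1, 2):
        needed = subtree_pages(MAX_FANOUT, step)
        print(f"level step {step}: {needed} pages to pin, "
              f"{available} buffers in a 32MB shared_buffers")
```

Under these assumptions, level step = 1 stays well below the default buffer count, while level step = 2 exceeds it by two orders of magnitude.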

Pinning a buffer that's already in the shared buffer cache is cheap, so I doubt you're gaining much by keeping the private hash table in front of the buffer cache. Also, it's possible that not all of the subtree is actually required during the emptying, so in the worst case pre-loading it is counter-productive.

I believe it's enough
to add a check that we have sufficient shared_buffers, isn't it?

Well, what do you do if you deem that shared_buffers is too small? Fall back to the old method? Also, shared_buffers is shared by all backends, so you can't assume that you get to use all of it for the index build. You'd need a wide safety margin.
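
One way such a check could look, as a hypothetical sketch in Python. The 25% usable fraction is an arbitrary placeholder for the "wide safety margin"; nothing in this thread fixes a number, and the function name and return values are invented for illustration.

```python
# Hypothetical sketch of a "do we have enough shared_buffers?" check.
# usable_fraction is an assumed safety margin, not a value from the patch:
# shared_buffers is shared by all backends, so only part of it is budgeted.
def choose_build_method(shared_buffers, pins_needed, usable_fraction=0.25):
    """Return 'buffered' if the pins fit in the budget, else 'fallback'."""
    budget = int(shared_buffers * usable_fraction)
    return "buffered" if pins_needed <= budget else "fallback"


if __name__ == "__main__":
    # 4096 buffers is the 32MB default; 16 is the minimum setting.
    for buffers, pins in ((4096, 676), (4096, 455625), (16, 676)):
        print(buffers, pins, choose_build_method(buffers, pins))
```

The open question remains what "fallback" should mean in practice, and how wide the margin has to be when other backends are busy.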

I don't think you're benefiting at all from the buffering that BufFile does
for you, since you're reading/writing a full block at a time anyway. You
might as well use the file API in fd.c directly, i.e.
OpenTemporaryFile/FileRead/FileWrite.

BufFile distributes temporary data across several files. AFAICS Postgres
avoids working with files larger than 1GB. The size of the tree buffers can
easily be greater. Without BufFile I would need to maintain a set of files
manually.

Ah, I see. Makes sense.
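
For illustration, the bookkeeping that would have to be maintained by hand without BufFile looks roughly like the Python sketch below. It assumes 8 kB blocks and a 1GB per-file limit as mentioned above. It is not BufFile's actual code, and locate_block is a made-up name.

```python
# Illustrative only: map a logical block number of the temporary data to
# (segment file index, byte offset within that file), assuming 8 kB blocks
# and a 1GB limit per file. This is not how BufFile itself is written.
BLOCK_SIZE = 8192
SEGMENT_SIZE = 1024 ** 3
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE


def locate_block(block_number):
    """Return (segment index, byte offset) for a logical block number."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    return segment, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    for block in (0, BLOCKS_PER_SEGMENT - 1, BLOCKS_PER_SEGMENT, 200000):
        print(block, locate_block(block))
```

Every read or write of a block would have to go through a mapping like this, plus opening and tracking each segment file.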

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers