Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-30 Thread Heikki Linnakangas

Tom Lane wrote:

Updated version of Heikki's buffer ring patch, as per my comments here:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00449.php

The COPY IN part of the patch is not there, pending resolution of
whether we think it adds enough value to be worth uglifying
heap_insert's API for.


I ran a series of tests, and it looks like it's not worth it.

The test case I used was DBT-2, with a big COPY running in the 
background. That's the same method I used for SELECTs, just replaced the 
SELECT COUNT(*) with a COPY FROM. The table I copied to was truncated 
between COPYs, and had no indexes.


The results are inconclusive, because they are quite inconsistent. With 
100 warehouses and no patch, I'm getting average new-order response times 
between 1 and 3 seconds over 5 test runs. The results with the patch are 
in the same range. Runs with 90 and 120 warehouses also varied greatly.


With the SELECTs, the patch made the big selects finish quicker, in 
addition to slightly reducing the impact on other queries. For COPY, 
that benefit was not there either, and again there was a lot more 
variance in how long the COPYs took.


If there's a benefit for COPY from this patch, it's not clear enough to 
spend effort on. The main problem with COPY seems to be that it causes a 
very unpredictable impact on other queries. I can post the results if 
someone wants to look at them, but I couldn't see any clear pattern in them.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-30 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 The COPY IN part of the patch is not there, pending resolution of
 whether we think it adds enough value to be worth uglifying
 heap_insert's API for.

 I ran a series of tests, and it looks like it's not worth it.

Great, I'll pull out the last vestiges of that and apply.

If we did want to pursue this, I was thinking of inventing a
BulkInsertTarget object type that could be passed to heap_insert,
in the same spirit as BufferAccessStrategy in my WIP patch.
This would carry a BufferAccessStrategy and also serve to track a
currently-pinned target page as per your thought about avoiding
pin/unpin cycles across multiple inserts.  I think we could fold
use_fsm into it as well (maybe use_wal too), and thereby avoid growing
heap_insert's parameter list still more.
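
For illustration, such an object might look roughly like this (a
hypothetical sketch only -- the BulkInsertTarget type and its fields are
not part of the posted patch, and the names are made up):

/*
 * Hypothetical sketch of a BulkInsertTarget; none of these names exist
 * in the posted patch.  It bundles the bulk-insert state so that
 * heap_insert needs only one extra pointer argument.
 */
typedef struct BulkInsertTargetData
{
    BufferAccessStrategy strategy;  /* ring strategy for the bulk load */
    Buffer      cur_buf;            /* currently-pinned target page, or InvalidBuffer */
    bool        use_fsm;            /* consult the free space map? */
    bool        use_wal;            /* WAL-log the inserts? */
} BulkInsertTargetData;

typedef BulkInsertTargetData *BulkInsertTarget;

/* heap_insert would then take the object in place of extra bool arguments */
extern Oid heap_insert(Relation relation, HeapTuple tup, CommandId cid,
                       BulkInsertTarget bistate);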

Not something I want to pursue now, but just getting these thoughts
into the archives in case someone picks it up again.

regards, tom lane



[PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Updated version of Heikki's buffer ring patch, as per my comments here:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00449.php

The COPY IN part of the patch is not there, pending resolution of
whether we think it adds enough value to be worth uglifying
heap_insert's API for.  Also, I tentatively reduced the threshold
at which heapscans switch to ring mode to NBuffers/16; that probably
needs more thought.  Lastly, I haven't done anything about making
non-btree indexes honor the access strategy during VACUUM scans.
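
(For context, the threshold test amounts to something along these lines --
a simplified sketch, not necessarily the exact code in the attached patch:)

/*
 * Sketch of the heuristic: use a small ring strategy only for scans of
 * tables bigger than some fraction of shared_buffers.  The divisor is
 * the part that needs more thought.
 */
if (scan->rs_nblocks > NBuffers / 16)
    scan->rs_strategy = GetAccessStrategy(BAS_BULKREAD);
else
    scan->rs_strategy = NULL;       /* normal buffer replacement */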

regards, tom lane



Attachment: buffer-ring-2.patch.gz



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Heikki Linnakangas

Tom Lane wrote:

 Also, I tentatively reduced the threshold
at which heapscans switch to ring mode to NBuffers/16; that probably
needs more thought.  


Yeah. One scenario where threshold < shared_buffers will hurt is if your 
shared_buffers = RAM size / 2. In that scenario, a scan on a table that 
would barely fit in shared_buffers will use the ring instead, and not 
fit in the OS cache either. Which means that repeatedly scanning that 
table will do physical I/O with the patch, but not without it.


Swappiness, to use Linux terms, also makes a difference. When I started 
testing the patch, I saw unexpectedly high gains with the following 
configuration:

- RAM size 4 GB
- shared_buffers 1 GB
- table size 3GB

Without the patch, the table wouldn't fit in shared_buffers, and also 
wouldn't fit in the OS cache, so repeatedly scanning the table always 
read it physically from disk, and that took ~20 seconds. With the 
patch, however, the ring only actively used a few pages of 
shared_buffers, and the kernel swapped out the rest. Thanks to that, 
there was more than 3GB of RAM available for OS caching, the table fit 
completely in the OS cache, and the query took < 2 seconds. It took me 
quite a while to figure out what was going on.



Lastly, I haven't done anything about making
non-btree indexes honor the access strategy during VACUUM scans.


Also, there's no attempt to avoid inflating usage_count, which means that 
synchronized scans will spoil the buffer cache as if we didn't have the 
buffer ring patch. If there's no easy solution, I think we could live 
with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
instead of UnpinBuffer sounds like a nice solution to me.
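
(For illustration, the move would look roughly like this in bufmgr.c -- a
simplified sketch that omits the PrivateRefCount bookkeeping, assuming
PinBuffer grows a strategy argument; the actual patch may differ:)

/*
 * Simplified sketch: bump usage_count when the buffer is pinned, and
 * only for the default strategy, instead of doing it in UnpinBuffer.
 */
static bool
PinBuffer(volatile BufferDesc *buf, BufferAccessStrategy strategy)
{
    bool        result;

    LockBufHdr(buf);
    buf->refcount++;
    if (strategy == NULL)
    {
        /* default access: count this touch for the clock sweep */
        if (buf->usage_count < BM_MAX_USAGE_COUNT)
            buf->usage_count++;
    }
    result = (buf->flags & BM_VALID) != 0;
    UnlockBufHdr(buf);
    /* ... per-backend pin bookkeeping as before ... */
    return result;
}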


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Also there's no attempt to not inflate usage_count, which means that 
 synchronized scans will spoil the buffer cache as if we didn't have the 
 buffer ring patch.

As I said, these patches are hardly independent.

 If there's no easy solution, I think we could live 
 with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
 instead of UnpinBuffer sounds like a nice solution to me.

After thinking about it more, I'm a bit hesitant to do that because it
will change the interaction with the clock sweep for buffers that stay
pinned for awhile.  I had suggested making the clock sweep not decrement
usage_count of a pinned buffer, but I think that would change the
fairness of the algorithm.  OTOH it may not matter that much if we just
move the usage_count increment and leave the clock sweep alone.  Do we
have any decent way of measuring the effectiveness of the clock-sweep
allocation algorithm?

I also thought about having ReadBuffer decrement the usage count when it
has a nondefault strategy and finds the buffer already in cache; this
would then cancel out the later unconditional increment in UnpinBuffer.
But that means twice as many cycles spent holding the buffer spinlock.

Either one of these methods would require PinBuffer to be aware of the
strategy argument, which it is not at present.  OTOH with the first way
we could get rid of the normalAccess argument to UnpinBuffer, so
there's some net conservation of cruft I guess.  I think I had
originally given this task to UnpinBuffer on the theory that we'd have
better information at unpin time than pin time about what the buffer
state had been and thus be able to make smarter decisions about whether
to bump the access count or not.  But at the moment it doesn't seem that
we really need any such info; AFAICS all the callers of PinBuffer know
what they want to happen.

regards, tom lane



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
I wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 If there's no easy solution, I think we could live 
 with that, but Greg's suggestion of bumping the usage_count in PinBuffer 
 instead of UnpinBuffer sounds like a nice solution to me.

 After thinking about it more, I'm a bit hesitant to do that because it
 will change the interaction with the clock sweep for buffers that stay
 pinned for awhile.  I had suggested making the clock sweep not decrement
 usage_count of a pinned buffer, but I think that would change the
 fairness of the algorithm.  OTOH it may not matter that much if we just
 move the usage_count increment and leave the clock sweep alone.  Do we
 have any decent way of measuring the effectiveness of the clock-sweep
 allocation algorithm?

Despite above misgivings, here's a version of the patch that moves
usage_count incrementing to PinBuffer instead of UnpinBuffer.  It does
seem a good bit cleaner.

regards, tom lane



Attachment: buffer-ring-3.patch.gz



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Greg Smith

On Tue, 29 May 2007, Tom Lane wrote:

Do we have any decent way of measuring the effectiveness of the 
clock-sweep allocation algorithm?


I put a view on top of the current pg_buffercache (now that it includes 
usage_count) that shows what the high usage_count buffers consist of. 
Since they were basically what I hoped for (like plenty of index blocks on 
popular tables), that seemed a reasonable enough measure of effectiveness 
for my purposes.  I briefly looked into adding some internal measurements 
in this area, like how many buffers are scanned on average to satisfy an 
allocation request; that would actually be easy to add to the buffer 
allocation stats part of the auto bgwriter_max_pages patch I submitted 
recently.
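
(A crude version of that measurement could be as simple as a counter in
the clock-sweep loop -- a hypothetical sketch, not code from any submitted
patch, with made-up helper and field names:)

/*
 * Hypothetical instrumentation: count how many buffer headers the clock
 * sweep examines per allocation, and accumulate it next to the existing
 * buffer-allocation statistics.
 */
int     buffers_examined = 0;

for (;;)
{
    buf = ClockSweepNextBuffer();       /* made-up helper: advance the clock hand */
    buffers_examined++;

    if (BufferIsVictim(buf))            /* made-up helper: unpinned, usage_count == 0 */
        break;
}

StrategyControl->numBufferAllocs++;                        /* hypothetical counter */
StrategyControl->numBuffersExamined += buffers_examined;   /* hypothetical counter */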


Based on my observations of buffer cache statistics, the number of pinned 
buffers at any time is small enough that in a reasonably sized buffer 
cache, I wouldn't expect a change in the pinned usage_count behavior to 
have any serious impact.  With what you're adjusting, the only time I can 
think of where there would be a noticeable shift in fairness would be if 
one's buffer cache was very small relative to the number of clients, which 
is kind of an unreasonable situation to go out of your way to accommodate.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PATCHES] WIP: 2nd-generation buffer ring patch

2007-05-29 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 Based on my observations of buffer cache statistics, the number of pinned 
 buffers at any time is small enough that in a reasonably sized buffer 
 cache, I wouldn't expect a change in the pinned usage_count behavior to 
 have any serious impact.

Fair enough.  The patch I put up earlier tonight bumps usage_count at
PinBuffer instead of UnpinBuffer time, and leaves the clock sweep
behavior unchanged, which means that a buffer that had stayed pinned for
more than a clock-sweep cycle time could get recycled almost instantly
after being unpinned.  That seems intuitively bad.  If we make the clock
sweep code not decrement usage_count of a pinned buffer then the problem
goes away.  I had expressed some discomfort with that idea, but I've got
to admit that it's only a vague worry not anything concrete.  Barring
objections I'll adjust the patch to include the clock-sweep change.
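
(Concretely, the inner loop of StrategyGetBuffer would change along these
lines -- a sketch of the proposed behavior, not the final patch:)

/*
 * Sketch: only touch usage_count when the buffer is not pinned, so a
 * buffer that stays pinned across a full sweep keeps its count and is
 * not recycled the moment it's unpinned.
 */
LockBufHdr(buf);
if (buf->refcount == 0)
{
    if (buf->usage_count > 0)
    {
        buf->usage_count--;             /* unpinned but recently used: skip it */
        trycounter = NBuffers;
    }
    else
    {
        /* unpinned and usage_count == 0: found our victim */
        return buf;                     /* header spinlock still held, as usual */
    }
}
else if (--trycounter == 0)
{
    UnlockBufHdr(buf);
    elog(ERROR, "no unpinned buffers available");
}
UnlockBufHdr(buf);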

regards, tom lane
