subject:"Re\: \[HACKERS\] Turning off HOT\/Cleanup sometimes"

On Tue, Apr 21, 2015 at 04:36:53PM -0400, Robert Haas wrote:
  Keep in mind there's a disconnect between dirtying a page and writing it
  to storage.  A page could remain dirty for a long time in the buffer
  cache.  This writing of sequential pages would occur at checkpoint time
  only, which seems the wrong thing to optimize.  If some other process
  needs to evict pages to make room to read some other page in, surely
  it's going to try one page at a time, not write many sequential dirty
  pages.
 
 Well, for a big sequential scan, we use a ring buffer, so we will
 typically be evicting the pages that we ourselves read in moments
 before.  So in this case we would do a lot of sequential writes of
 dirty pages.

Ah, yes, this again supports the prune-then-skip approach, rather than
doing the first X% pruneable pages seen.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Jim Nasby


On 4/21/15 4:07 PM, Peter Eisentraut wrote:

On 4/21/15 4:45 PM, Jim Nasby wrote:
In order for a background worker to keep up with some of the workloads
that have been presented as counterexamples, you'd need multiple
background workers operating in parallel and preferring to work on
certain parts of a table.  That would require a lot more sophisticated
job management than we currently have for, say, autovacuum.


My thought was that the foreground queries would send page IDs to the 
bgworker via a shmq. If the queries have to do much waiting at all on IO 
then I'd expect the bgworker to be able to keep pace with a bunch of 
them since it's just grabbing buffers that are already in the pool (and 
only those in the pool; it wouldn't make sense for it to pull it back 
from the kernel, let alone disk).


We'd need to code this so that if a queue fills up the query doesn't 
block; we just skip that opportunity to prune. I think that'd be fine.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Greg Stark

On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian br...@momjian.us wrote:

 But if the entire table is very hot, I think that that is just another of way
 of saying that autovacuum is horribly misconfigured.  I think the purpose of

 Well, we have to assume there are many misconfigured configurations ---
 autovacuum isn't super-easy to configure, so we can't just blame the
 user if this makes things worse.  In fact, page pruning was designed
 spefically for cases where autovacuum wasn't running our couldn't keep
 up.

Well autovacuum isn't currently considering HOT pruning part of its
job at all. It's hard to call it misconfigured when there's
literally *no* way to configure it correctly.

If you update less than autovacuum_vacuum_scale_factor fraction of the
table and then never update another row autovacuum will never run.
Ever. Every select will forevermore need to follow hot chains on that
table. Until eventually transaction wraparound forces a vacuum on that
table if that ever happens.

Possibly autovacuum could be adjusted to count how many selects are
happening on the table and decide to vacuum it when the cost of the
selects following the dead tuples is balanced by the cost of doing a
vacuum. But that's not something included in the design of autovacuum
today.

The original design of tuple storage was aimed at optimizing the
steady state where most tuples were not recently updated. It
guaranteed that except for tuples that were in the process of being
updated or were recently updated a tuple read didn't have to read the
CLOG, didn't have to follow any chains, didn't have to do any I/O or
other work other than to read the bits on the tuple itself. When a
tuple is updated it's put into a state where everyone who comes along
has to do extra work but as soon as practical the hint bits get set
and that extra work stops.

We had similar discussions about setting hint bits in the past. I'm
not sure why HOT pruning is the focus now because I actually think
hint bit setting is a larger source of I/O in innocent looking selects
even today. And it's a major headache, people are always being
surprised that their selects cause lots of I/O and slow down
dramatically after a big update or data load has finished. It's
characterized as why is the database writing everything twice (and
saying it's actually writing everything three times doesn't make
people feel better). In the new age of checksums with hint bit logging
I wonder if it's even a bigger issue.

It occurs to me that generating these dirty pages isn't really that
expensive individually. It's only that there's a sudden influx of a
large number of dirty pages that causes them to get translated
immediately into filesystem I/O. Perhaps we should dirty pages on hint
bit updates and do HOT pruning only to the extent it can be done
without causing I/O. Of course it's hard to tell that in advance  but
maybe something like if the current buffer had to be fetched and
caused a dirty buffer to be evicted then skip hot pruning and don't
dirty it for any hint bit updates would at least mean that once the
select fills up its share of buffers with dirty buffers it stops
dirtying more. It would dirty pages only as fast as bgwriter or
checkpoints manage to write them out.

That sounds a bit weird but I think the right solution should have
that combination of properties. It should guarantee that hint bits get
set and hot chains pruned within some length of time but that no one
select causes a storm of dirty buffers that then need to be flushed to
disk.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Peter Eisentraut

On 4/22/15 11:37 AM, Jim Nasby wrote:
 On 4/21/15 4:07 PM, Peter Eisentraut wrote:
 On 4/21/15 4:45 PM, Jim Nasby wrote:
 In order for a background worker to keep up with some of the workloads
 that have been presented as counterexamples, you'd need multiple
 background workers operating in parallel and preferring to work on
 certain parts of a table.  That would require a lot more sophisticated
 job management than we currently have for, say, autovacuum.
 
 My thought was that the foreground queries would send page IDs to the
 bgworker via a shmq. If the queries have to do much waiting at all on IO
 then I'd expect the bgworker to be able to keep pace with a bunch of
 them since it's just grabbing buffers that are already in the pool (and
 only those in the pool; it wouldn't make sense for it to pull it back
 from the kernel, let alone disk).
 
 We'd need to code this so that if a queue fills up the query doesn't
 block; we just skip that opportunity to prune. I think that'd be fine.

I think a to-clean-up map would work better.  But basically we need a
way to remember where to clean up later if we're not going to do it in
the foreground.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Tue, Apr 21, 2015 at 05:07:52PM -0400, Peter Eisentraut wrote:
 On 4/21/15 4:45 PM, Jim Nasby wrote:
  This comment made me wonder... has anyone considered handing the pruning
  work off to a bgworker, at least for SELECTs? That means the selects
  themselves wouldn't be burdened by the actual prune work, only in
  notifying the bgworker. While that's not going to be free, presumably
  it's a lot cheaper...
 
 The nice thing about having foreground queries to the light cleanup is
 that they can work in parallel and naturally hit the interesting parts
 of the table first.
 
 In order for a background worker to keep up with some of the workloads
 that have been presented as counterexamples, you'd need multiple
 background workers operating in parallel and preferring to work on
 certain parts of a table.  That would require a lot more sophisticated
 job management than we currently have for, say, autovacuum.

Well, the visibility map tells us where _not_ to clean up, so using
another map to tell use _where_ to cleanup might make sense.  However,
the density of the map might be low enough that a list makes more sense,
as you suggested.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Kevin Grittner

Greg Stark st...@mit.edu wrote:

 And it's a major headache, people are always being surprised that
 their selects cause lots of I/O and slow down dramatically after
 a big update or data load has finished. It's characterized as
 why is the database writing everything twice (and saying it's
 actually writing everything three times doesn't make people feel
 better).

When I looked at the life-cycle of a heap tuple in a database I was
using, I found that (ignoring related index access and ignoring
WAL-file copying, etc., for our backups), each tuple that existed
long enough to freeze and be eventually deleted caused a lot of
writes.

(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

If I was lucky some of the writes could be combined in cache
because they happened close enough together.  Also, one could hope
that not too much of the WAL-logging involved full page writes to
the WAL -- again, keeping steps close together in time helps with
that.  If all of (1) through (5) are done in quick succession, you
save two physical writes of the heap page and save one full page

write to WAL.  If steps (7) through (9) are done in quick
succession, you save two more physical writes to the heap.  This is
part of what makes the aggressive incremental freezing being
discussed on a nearby thread appealing -- at least for some
workloads.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Alvaro Herrera

Bruce Momjian wrote:
 On Wed, Apr 22, 2015 at 04:36:17PM +0100, Greg Stark wrote:
  On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian br...@momjian.us wrote:
   Well, we have to assume there are many misconfigured configurations ---
   autovacuum isn't super-easy to configure, so we can't just blame the
   user if this makes things worse.  In fact, page pruning was designed
   spefically for cases where autovacuum wasn't running our couldn't keep
   up.
  
  Well autovacuum isn't currently considering HOT pruning part of its
  job at all. It's hard to call it misconfigured when there's
  literally *no* way to configure it correctly.
 
 Good point, but doesn't vacuum remove the need for pruning as it removes
 all the old rows?

Sure.  The point, I think, is to make autovacuum runs of some sort that
don't actually vacuum but only do HOT-pruning.  Maybe this is a
reasonable solution to the problem that queries don't prune anymore
after Simon's patch.  If we made autovac HOT-prune periodically, we
could have read-only queries prune only already-dirty pages.  Of course,
that would need further adjustments to default number of autovac
workers, I/O allocation, etc.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Wed, Apr 22, 2015 at 06:07:00PM -0300, Alvaro Herrera wrote:
  Good point, but doesn't vacuum remove the need for pruning as it removes
  all the old rows?
 
 Sure.  The point, I think, is to make autovacuum runs of some sort that
 don't actually vacuum but only do HOT-pruning.  Maybe this is a
 reasonable solution to the problem that queries don't prune anymore
 after Simon's patch.  If we made autovac HOT-prune periodically, we
 could have read-only queries prune only already-dirty pages.  Of course,
 that would need further adjustments to default number of autovac
 workers, I/O allocation, etc.

Do we really want to make vacuum more complex for this?  vacuum does
have the delay settings we would need though.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-22 Thread Jim Nasby


On 4/22/15 1:51 PM, Kevin Grittner wrote:

(1) WAL log the insert.
(2) Write the tuple.
(3) Hint and rewrite the tuple.
(4) WAL log the freeze of the tuple.
(5) Rewrite the frozen tuple.
(6) WAL-log the delete.
(7) Rewrite the deleted tuple.
(8) Prune and rewrite the page.
(9) Free line pointers and rewrite the page.

If I was lucky some of the writes could be combined in cache
because they happened close enough together. Also, one could hope
that not too much of the WAL-logging involved full page writes to
the WAL -- again, keeping steps close together in time helps with
that.


This is why I like the idea of methods that tell us where we need to do 
cleanup... they provide us with a rough ability to track what tuples are 
in what part of their lifecycle. The VM helps with this a small amount, 
but really it only applies after 1 and 6; it doesn't help us with any 
other portions.


Having a way to track recently created tuples would allow us to be much 
more efficient with 1-3, and with aggressive freezing, 1-5. A way to 
track recently deleted tuples would help with 6-7, possibly 6-9 if no 
indexes.


If we doubled the size of the VM, that would let us track 4 states for 
each page:


- Page has newly inserted tuples
- Page has newly deleted tuples
- Page is all visible
- Page is frozen

though as discussed elsewhere, we could probably combine all visible and 
frozen.


The win from doing this would be easily knowing what pages need hinting 
(newly inserted) and pruning (newly deleted). Unfortunately we still 
wouldn't know whether we could do real work without visiting the page 
itself, but I suspect that for many workloads just having newly 
inserted/deleted would be a serious win.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Wed, Apr 22, 2015 at 04:36:17PM +0100, Greg Stark wrote:
 On Mon, Apr 20, 2015 at 8:48 PM, Bruce Momjian br...@momjian.us wrote:
  Well, we have to assume there are many misconfigured configurations ---
  autovacuum isn't super-easy to configure, so we can't just blame the
  user if this makes things worse.  In fact, page pruning was designed
  spefically for cases where autovacuum wasn't running our couldn't keep
  up.
 
 Well autovacuum isn't currently considering HOT pruning part of its
 job at all. It's hard to call it misconfigured when there's
 literally *no* way to configure it correctly.

Good point, but doesn't vacuum remove the need for pruning as it removes
all the old rows?

 If you update less than autovacuum_vacuum_scale_factor fraction of the
 table and then never update another row autovacuum will never run.
 Ever. Every select will forevermore need to follow hot chains on that
 table. Until eventually transaction wraparound forces a vacuum on that
 table if that ever happens.

Yes, that is a very good point, and it matches my concerns.  Of course,
Simon's concern is to avoid overly-aggressive pruning where the row is
being pruned but will soon be modified, making the prune, and its WAL
volume, undesirable.  We have to consider both cases in any final
solution.

 Possibly autovacuum could be adjusted to count how many selects are
 happening on the table and decide to vacuum it when the cost of the
 selects following the dead tuples is balanced by the cost of doing a
 vacuum. But that's not something included in the design of autovacuum
 today.

Well, autovacuum is also going to clean indexes, which seem like
overkill for pruning HOT updates.

 The original design of tuple storage was aimed at optimizing the
 steady state where most tuples were not recently updated. It
 guaranteed that except for tuples that were in the process of being
 updated or were recently updated a tuple read didn't have to read the
 CLOG, didn't have to follow any chains, didn't have to do any I/O or
 other work other than to read the bits on the tuple itself. When a
 tuple is updated it's put into a state where everyone who comes along
 has to do extra work but as soon as practical the hint bits get set
 and that extra work stops.

Yes, Simon is right that doing everything as-soon-as-possible is not
optimal.  I think the trick is knowing when we should give up waiting
for something else to dirty the page and prune it.

 We had similar discussions about setting hint bits in the past. I'm
 not sure why HOT pruning is the focus now because I actually think
 hint bit setting is a larger source of I/O in innocent looking selects
 even today. And it's a major headache, people are always being
 surprised that their selects cause lots of I/O and slow down
 dramatically after a big update or data load has finished. It's
 characterized as why is the database writing everything twice (and
 saying it's actually writing everything three times doesn't make
 people feel better). In the new age of checksums with hint bit logging
 I wonder if it's even a bigger issue.

What would be the downside of only doing pruning during SELECT hint bit
setting?  Hinting is delayed by long-running transactions, but so is
pruning.  I assume you can do more pruning than setting all_visible
hints because the old prunable rows are older by definition, but I am
unclear how much older they are.

FYI, while hint bit setting causes page writes, it does not cause WAL
writes unless you have wal_log_hints set or page-level checksums are
enabled.  By doing pruning at the same time as hint bit setting, you are
sharing the same page write, but are generating more WAL.  Of course, if
you are setting all-visible, then you are by definition waiting longer
to prune than before, and this might be enough to make it a win for all
use cases.  You wouldn't never-prune in a read-only workload because
your hint bits would eventually cause the pruning.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-21 Thread Bruce Momjian

On Mon, Apr 20, 2015 at 07:13:38PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
   Bruce Momjian wrote:
   
   This seems simple to implement: keep two counters, where the second one
   is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
   reset the first counter so that further 5 pages will get HOT pruned.  5%
   seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
   essentially +infinity.)
  
  This would tend to dirty non-sequential heap pages --- it seems best to
  just clean as many as we are supposed to, then skip the rest, so we can
  write sequential dirty pages to storage.
 
 Keep in mind there's a disconnect between dirtying a page and writing it
 to storage.  A page could remain dirty for a long time in the buffer
 cache.  This writing of sequential pages would occur at checkpoint time
 only, which seems the wrong thing to optimize.  If some other process
 needs to evict pages to make room to read some other page in, surely
 it's going to try one page at a time, not write many sequential dirty
 pages.

Yes, it might be too much optimization to try to get the checkpoint to
flush all those pages sequentially, but I was thinking of our current
behavior where, after an update of all rows, we effectively write out
the entire table because we have dirtied every page.  I guess with later
prune-based writes, we aren't really writing all the pages as we have
the pattern where pages with prunable content is kind of random. I guess
I was just wondering what value there is to your write-then-skip idea,
vs just writing the first X% of pages we find?  Your idea certainly
spreads out the pruning, and doesn't require knowing the size of the
table, though I though that information was easily determined.

One thing to consider is how we handle pruning of index scans that hit
multiple heap pages.  Do we still write X% of the pages in the table, or
%X of the heap pages we actually access via SELECT?  With the
write-then-skip approach, we would do X% of the pages we access, while
with the first-X% approach, we would probably prune all of them as we
would not be accessing most of the table.  I don't think we can do the
first first-X% of pages and have the percentage based on the number of
pages accessed as we have no way to know how many heap pages we will
access from the index.  (We would know for bitmap scans, but that
complexity doesn't seem worth it.)  That would argue, for consistency
with sequential and index-based heap access, that your approach is best.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-21 Thread Peter Eisentraut

On 4/21/15 4:45 PM, Jim Nasby wrote:
 This comment made me wonder... has anyone considered handing the pruning
 work off to a bgworker, at least for SELECTs? That means the selects
 themselves wouldn't be burdened by the actual prune work, only in
 notifying the bgworker. While that's not going to be free, presumably
 it's a lot cheaper...

The nice thing about having foreground queries to the light cleanup is
that they can work in parallel and naturally hit the interesting parts
of the table first.

In order for a background worker to keep up with some of the workloads
that have been presented as counterexamples, you'd need multiple
background workers operating in parallel and preferring to work on
certain parts of a table.  That would require a lot more sophisticated
job management than we currently have for, say, autovacuum.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-21 Thread Robert Haas

On Mon, Apr 20, 2015 at 6:13 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Bruce Momjian wrote:
 On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
  Bruce Momjian wrote:
  This seems simple to implement: keep two counters, where the second one
  is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
  reset the first counter so that further 5 pages will get HOT pruned.  5%
  seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
  essentially +infinity.)

 This would tend to dirty non-sequential heap pages --- it seems best to
 just clean as many as we are supposed to, then skip the rest, so we can
 write sequential dirty pages to storage.

 Keep in mind there's a disconnect between dirtying a page and writing it
 to storage.  A page could remain dirty for a long time in the buffer
 cache.  This writing of sequential pages would occur at checkpoint time
 only, which seems the wrong thing to optimize.  If some other process
 needs to evict pages to make room to read some other page in, surely
 it's going to try one page at a time, not write many sequential dirty
 pages.

Well, for a big sequential scan, we use a ring buffer, so we will
typically be evicting the pages that we ourselves read in moments
before.  So in this case we would do a lot of sequential writes of
dirty pages.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-21 Thread Jim Nasby


On 4/21/15 10:04 AM, Bruce Momjian wrote:

One thing to consider is how we handle pruning of index scans that hit
multiple heap pages.  Do we still write X% of the pages in the table, or
%X of the heap pages we actually access via SELECT?  With the
write-then-skip approach, we would do X% of the pages we access, while
with the first-X% approach, we would probably prune all of them as we
would not be accessing most of the table.  I don't think we can do the
first first-X% of pages and have the percentage based on the number of
pages accessed as we have no way to know how many heap pages we will
access from the index.


This comment made me wonder... has anyone considered handing the pruning 
work off to a bgworker, at least for SELECTs? That means the selects 
themselves wouldn't be burdened by the actual prune work, only in 
notifying the bgworker. While that's not going to be free, presumably 
it's a lot cheaper...

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-21 Thread Robert Haas

On Tue, Apr 21, 2015 at 11:04 AM, Bruce Momjian br...@momjian.us wrote:
Yes, it might be too much optimization to try to get the checkpoint to
flush all those pages sequentially, but I was thinking of our current
behavior where, after an update of all rows, we effectively write out
the entire table because we have dirtied every page. I guess with later
prune-based writes, we aren't really writing all the pages as we have
the pattern where pages with prunable content is kind of random. I guess
I was just wondering what value there is to your write-then-skip idea,
vs just writing the first X% of pages we find? Your idea certainly
spreads out the pruning, and doesn't require knowing the size of the
table, though I though that information was easily determined.

One thing to consider is how we handle pruning of index scans that hit
multiple heap pages. Do we still write X% of the pages in the table, or
%X of the heap pages we actually access via SELECT? With the
write-then-skip approach, we would do X% of the pages we access, while
with the first-X% approach, we would probably prune all of them as we
would not be accessing most of the table. I don't think we can do the
first first-X% of pages and have the percentage based on the number of
pages accessed as we have no way to know how many heap pages we will
access from the index. (We would know for bitmap scans, but that
complexity doesn't seem worth it.) That would argue, for consistency
with sequential and index-based heap access, that your approach is best.

I actually implemented something like this for setting hint bits a few
years ago:

http://www.postgresql.org/message-id/aanlktik5qzr8wts0mqcwwmnp-qhgrdky5av5aob7w...@mail.gmail.com
http://www.postgresql.org/message-id/aanlktimgkag7wdu-x77gnv2gh6_qo5ss1u5b6q1ms...@mail.gmail.com

At least in later versions, the patch writes a certain number of
hinted pages, then skips writing a run of pages, then writes another
run of hinted pages. The basic problem here is that, after the fsync
queue compaction patch went in, the benefits on my tests were pretty
modest. Yeah, it costs something to write out lots of dirty pages,
but before the fsync queue compaction stuff, the initial scan of an
unhinted table took like 6x the time on the machine I tested on, but
after that, it was like 1.5x the time. Blunting that spike just
wasn't exciting enough.

It strikes me that it would be better to have an integrated strategy
for this problem. It doesn't make sense to have one strategy for
deciding whether to set hint bits and a separate strategy for deciding
whether to HOT-prune. And if we decide to set hint bits and
HOT-prune, it might be smart to try to mark the page all-visible, too,
if it is and we're not about to update it. I believe we're losing a
lot of performance on OLTP workloads by re-dirtying the same pages
over and over again. We've probably all hit cases where there is an
obvious loss of performance because of this sort of thing, but I'm
starting to think it's hurting us in a lot of less-obvious ways.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Alvaro Herrera

Bruce Momjian wrote:
 On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
  Bruce Momjian wrote:
  
  This seems simple to implement: keep two counters, where the second one
  is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
  reset the first counter so that further 5 pages will get HOT pruned.  5%
  seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
  essentially +infinity.)
 
 This would tend to dirty non-sequential heap pages --- it seems best to
 just clean as many as we are supposed to, then skip the rest, so we can
 write sequential dirty pages to storage.

Keep in mind there's a disconnect between dirtying a page and writing it
to storage.  A page could remain dirty for a long time in the buffer
cache.  This writing of sequential pages would occur at checkpoint time
only, which seems the wrong thing to optimize.  If some other process
needs to evict pages to make room to read some other page in, surely
it's going to try one page at a time, not write many sequential dirty
pages.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  I think the limit has to be in terms of a percentage of the table size. 
  For example, if we do one SELECT on a table with all non-dirty pages, it
  would be good to know that 5% of the pages were pruned --- that tells me
  that another 19 SELECTs will totally prune the table, assuming no future
  writes.
 
 This seems simple to implement: keep two counters, where the second one
 is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
 reset the first counter so that further 5 pages will get HOT pruned.  5%
 seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
 essentially +infinity.)

This would tend to dirty non-sequential heap pages --- it seems best to
just clean as many as we are supposed to, then skip the rest, so we can
write sequential dirty pages to storage.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Robert Haas

On Mon, Apr 20, 2015 at 3:28 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while
 the user waits, which is fundamentally VACUUM's duty to do in the
 background?  If there are a handful of very hot pages, then it makes sense
 not to wait for vacuum to get to them.  And that is what a block-count limit
 does.

I think that's a fundamental mischaracterization of the problem.  As
soon as you define this as vacuum's problem, then of course it makes
no sense to prune in the foreground, ever.  But if you define the
problem as get the best overall system performance, then it clearly
DOES sometimes make sense to prune in the foreground, as benchmark
results upthread demonstrate.

The fact is that on a workload like pgbench - and it doesn't have to
be exactly pgbench, just any kind of workload where there are lots of
changes to the table - vacuum can at any given time be pruning at most
one page of the table.  That is because only one vacuum process can be
running in a given table at one time, and it can't be doing two things
at once.  But there can be many processes doing inserts, updates, or
deletes on that table, as many as whatever you have max_connections
set to.  There can easily be dozens even on a well-configured system;
on a poorly configured system, there could be hundreds.  It seems
obvious that if you can have dozens or hundreds of processes creating
garbage and at most one process cleaning it up, there will be cases
where you get further and further behind.

Now, it might well be that the right solution to that problem is to
allow multiple vacuum processes in the same database, or add
background workers to help with opportunistic HOT-pruning of pages so
it doesn't get done in the foreground.  Fine.  But as of today, on a
heavily-modified table, the ONLY way that we can possibly remove junk
from the table as fast as we're creating junk is if the backends
touching the table do some of the work.  Now, Simon is making the
argument that it should be good enough to have people *modifying* the
table help with the cleanup rather than imposing that load on the
people who are only *reading* it, and that's not a dumb argument, but
there are still cases where that strategy loses - specifically, where
the table churn has stopped or paused, by autovacuum hasn't run yet.
If you're going to do 1 sequential scan of the table and then go home
for the day, HOT-pruning is dumb even in that case.  If you're going
to do 1000 sequential scans of that table in a row, HOT-pruning may
very well be smart.  There's no guarantee that the table has met the
autovacuum threshold, but HOT-pruning it could well be a win anyway.
Or it might be a loss.  You can make any policy here look smart or
dumb by picking a particular workload, and you don't even have to
invent crazy things that will never happen in real life to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Mon, Apr 20, 2015 at 09:56:20PM +0100, Simon Riggs wrote:
 On 20 April 2015 at 20:28, Jeff Janes jeff.ja...@gmail.com wrote:
  
 
 But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job,
 while the user waits, which is fundamentally VACUUM's duty to do in the
 background? 
 
 
 Agreed. I don't see a % as giving us anything at all.
 
 The idea is that we want to turn an O(N) problem for one query into an O(1)
 task.
  
 
 The use case I see for this is when there is a mixed workload.  There is
 one select which reads the entire table, and hundreds of thousands of
 selects/updates/insert that don't, and of course vacuum comes along every
 now and then and does it thing.  Why should the one massive SELECT have
 horrible performance just because it was run right before autovacuum would
 have kicked in instead of right after if finished?
 
 
 +1

You can +1 all you want, but if you ignore the specific workloads I
mentioned, you are not going to get much traction.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Simon Riggs

On 20 April 2015 at 20:28, Jeff Janes jeff.ja...@gmail.com wrote:


 But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job,
 while the user waits, which is fundamentally VACUUM's duty to do in the
 background?


Agreed. I don't see a % as giving us anything at all.

The idea is that we want to turn an O(N) problem for one query into an O(1)
task.


 The use case I see for this is when there is a mixed workload.  There is
 one select which reads the entire table, and hundreds of thousands of
 selects/updates/insert that don't, and of course vacuum comes along every
 now and then and does it thing.  Why should the one massive SELECT have
 horrible performance just because it was run right before autovacuum would
 have kicked in instead of right after if finished?


+1

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Jeff Janes

On Mon, Sep 29, 2014 at 2:13 AM, Andres Freund and...@anarazel.de wrote:

 On 2014-09-28 19:51:36 +0100, Simon Riggs wrote:
  On 27 September 2014 09:29, Andres Freund and...@anarazel.de wrote:
   On 2014-09-27 10:23:33 +0300, Heikki Linnakangas wrote:
   This patch has gotten a fair amount of review, and has been
rewritten once
   during the commitfest. I think it's pretty close to being
committable, the
   only remaining question seems to be what to do with system catalogs.
I'm
   marking this as Returned with feedback, I take it that Simon can
proceed
   from here, outside the commitfest.
  
   FWIW, I don't think it is, even with that. As is it seems very likely
   that it's going to regress a fair share of workloads. At the very
least
   it needs a fair amount of benchmarking beforehand.
 
  There is some doubt there. We've not seen a workload that does
  actually exhibit a negative behaviour.

 Neither is there much data about the magnitude of positive effect the
 patch has...

  I'm not saying one doesn't exist, but it does matter how common/likely
  it is. If anyone can present a performance test case that demonstrates
  a regression, I think it will make it easier to discuss how wide that
  case is and what we should do about it. Discussing whether to do
  various kinds of limited pruning are moot until that is clear.

 I doubt it'll be hard to construct a case where it'll show. My first try
 of using a pgbench scale 100, -M prepared, -cj8 with a custom file with
 1 write and 5 read transaction yielded the following on my laptop:

 Baseline:
  relname| pgbench_tellers
  pg_total_relation_size | 458752
  relname| pgbench_accounts
  pg_total_relation_size | 1590337536
  relname| pgbench_branches
  pg_total_relation_size | 286720
  relname| pgbench_history
  pg_total_relation_size | 49979392
 Patched:
  relname| pgbench_tellers
  pg_total_relation_size | 516096
  relname| pgbench_accounts
  pg_total_relation_size | 1590337536
  relname| pgbench_branches
  pg_total_relation_size | 360448
  relname| pgbench_history
  pg_total_relation_size | 49528832

 So, there's a noticeable increase in size. Mostly on the smaller tables,
 so probably HOT cleanup was sometimes skipped during UPDATEs due to
 locks.

 Baseline was:
 tps = 9655.486532 (excluding connections establishing)
 Patched was:
 tps = 9466.158701 (including connections establishing)

Was this reproducible?  I've run your custom sql file with 4 clients (that
is how many CPUs I have) on a machine
with a BBU.  I had wal_level = hot_standby, but the archive_command just
returned true without archiving anything. And using the latest patch.

The size of the pgbench_tellers and pgbench_branches relations were
surprisingly variable in both patched and unpatched, but there was no
reliable difference between them, just within them.

On the TPS front, there was a hint that patched one was slightly slower but
the within sample variation was also high, and the p-val for difference was
only 0.214 on n of 66.

test case attached.

 That's not a unrealistic testcase.

 I'm pretty sure this could be made quite a bit more pronounced by not
 using a uniform distribution in the pgbench runs. And selecting a test
 that's more vulnerable to the change (e.g. using a wider distribution
 for the read only statements than the modifying ones) would make the the
 CPU overhead of the additional heap_hot_search_buffer() overhead
 heavier.

Sorry I don't understand this description.  Why would queries selecting
data that is not changing have any extra overhead?

Is the idea that the hot part of the table for updates would move around
over time, but the hot part for selects would be even throughout?  I'm not
sure how to put that to the test.

Cheers,

Jeff

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Andres Freund

On 2015-04-20 01:04:18 -0700, Jeff Janes wrote:
 Was this reproducible?

Yes, at least with an old version of the patch.

I don't think you could see a difference using exactly that with the
newer versions which have the 5 page limit. After all it'll pretty much
never reach it.

  That's not a unrealistic testcase.
 
  I'm pretty sure this could be made quite a bit more pronounced by not
  using a uniform distribution in the pgbench runs. And selecting a test
  that's more vulnerable to the change (e.g. using a wider distribution
  for the read only statements than the modifying ones) would make the the
  CPU overhead of the additional heap_hot_search_buffer() overhead
  heavier.

 Sorry I don't understand this description.  Why would queries selecting
 data that is not changing have any extra overhead?

The idea, I think, was that by having a uniform (or just wider)
distribution of the reads they'd be more likely to land on values that
have been updated at some point, but not been pruned since (because at
that point the patch IIRC didn't prune during reads at all). I.e. ones
wer

 Is the idea that the hot part of the table for updates would move around
 over time, but the hot part for selects would be even throughout?

Pretty much.

 I'm not sure how to put that to the test.

That pretty much was what I'd tried to model, yea. I guess it'd be
possible to model this by inserting NOW()/updating values NOW() - 5 and
selecting values up to NOW() - 60. That'd roughly model some realistic
insert/update/select patterns I've seen.

To possibly see any difference with the new patch this would have to be
done in a way that regularly a couple of pages would be touched, with
not that many selected tuples on each.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Simon Riggs

On 20 April 2015 at 18:33, Bruce Momjian br...@momjian.us wrote:


 Also, I am also not sure we should be designing features at this stage
 in our release process.


I see this more as a process of gaining approval. I don't think patches at
the back of the queue should get the its too late treatment just because
they are at the back of the queue. They are logically all at the same
stage. There should be a way to allow people that show patience and respect
for the process to get the same timeshare as those that push their patches
daily.

Anyway, in this case, the patch conflicts with other things going in now,
so changing things isn't really sensible for this release, in terms of my
time.

We clearly need to do a better job of piggybacking actions on a dirty
block. The discussion has been about whether to do early cleanup, but we
should also consider post-access cleanup if we set hint bits or dirtied the
block in other ways.

Since we have many votes in favour of change in this area I'll post a new
version and look for an early review/commit for next release.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Thu, Apr 16, 2015 at 03:41:54PM +0100, Simon Riggs wrote:
 That is how we arrive at the idea of a cleanup limit, further enhanced by a
 limit that applies only to dirtying clean blocks, which we have 4? recent 
 votes
 in favour of.
 
 I would personally be in favour of a parameter to control the limit, since
 whatever we chose is right/wrong depending upon circumstances. I am however
 comfortable with not having a parameter if people think it is hard to tune
 that, which I agree it would be, hence no parameter in the patch.

I think the limit has to be in terms of a percentage of the table size. 
For example, if we do one SELECT on a table with all non-dirty pages, it
would be good to know that 5% of the pages were pruned --- that tells me
that another 19 SELECTs will totally prune the table, assuming no future
writes.  If there are future writes, they would dirty the pages and
cause even more pruning, but the 5% gives me the maximum pruning number
of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
is pruned or not?  Probably not.  Measuring in page count doesn't do
that, and a large table could receive millions of selects before being
fully cleaned.

Also, I am also not sure we should be designing features at this stage
in our release process.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Alvaro Herrera

Bruce Momjian wrote:

 I think the limit has to be in terms of a percentage of the table size. 
 For example, if we do one SELECT on a table with all non-dirty pages, it
 would be good to know that 5% of the pages were pruned --- that tells me
 that another 19 SELECTs will totally prune the table, assuming no future
 writes.

This seems simple to implement: keep two counters, where the second one
is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
reset the first counter so that further 5 pages will get HOT pruned.  5%
seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
essentially +infinity.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Mon, Apr 20, 2015 at 04:19:22PM -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  I think the limit has to be in terms of a percentage of the table size. 
  For example, if we do one SELECT on a table with all non-dirty pages, it
  would be good to know that 5% of the pages were pruned --- that tells me
  that another 19 SELECTs will totally prune the table, assuming no future
  writes.
 
 This seems simple to implement: keep two counters, where the second one
 is pages we skipped cleanup in.  Once that counter hits SOME_MAX_VALUE,
 reset the first counter so that further 5 pages will get HOT pruned.  5%
 seems a bit high though.  (In Simon's design, SOME_MAX_VALUE is
 essentially +infinity.)

Oh, I pulled 5% out of the air.  Thinking of a SELECT-only workload,
which would be our worse case, I was thinking how many SELECTS running
through HOT update chains would it take to be slower than generating the
WAL to prune the page.  I see the percentage as something that we could
reasonably balance, while a fixed page count couldn't be analyzed in
that way.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-20 Thread Jeff Janes

On Mon, Apr 20, 2015 at 10:33 AM, Bruce Momjian br...@momjian.us wrote:

 On Thu, Apr 16, 2015 at 03:41:54PM +0100, Simon Riggs wrote:
  That is how we arrive at the idea of a cleanup limit, further enhanced
 by a
  limit that applies only to dirtying clean blocks, which we have 4?
 recent votes
  in favour of.
 
  I would personally be in favour of a parameter to control the limit,
 since
  whatever we chose is right/wrong depending upon circumstances. I am
 however
  comfortable with not having a parameter if people think it is hard to
 tune
  that, which I agree it would be, hence no parameter in the patch.

 I think the limit has to be in terms of a percentage of the table size.
 For example, if we do one SELECT on a table with all non-dirty pages, it
 would be good to know that 5% of the pages were pruned --- that tells me
 that another 19 SELECTs will totally prune the table, assuming no future
 writes.


But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job,
while the user waits, which is fundamentally VACUUM's duty to do in the
background?  If there are a handful of very hot pages, then it makes sense
not to wait for vacuum to get to them.  And that is what a block-count
limit does.

But if the entire table is very hot, I think that that is just another of
way of saying that autovacuum is horribly misconfigured.  I think the
purpose of this patch is to fix something that can't be fixed through
configuration alone.


 If there are future writes, they would dirty the pages and
 cause even more pruning, but the 5% gives me the maximum pruning number
 of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
 is pruned or not?


The use case I see for this is when there is a mixed workload.  There is
one select which reads the entire table, and hundreds of thousands of
selects/updates/insert that don't, and of course vacuum comes along every
now and then and does it thing.  Why should the one massive SELECT have
horrible performance just because it was run right before autovacuum would
have kicked in instead of right after if finished?

Cheers,

Jeff

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On Mon, Apr 20, 2015 at 12:28:11PM -0700, Jeff Janes wrote:
 But why should 1 SELECT or 20 SELECTs or 200 SELECTs have to do a job, while
 the user waits, which is fundamentally VACUUM's duty to do in the background? 
 If there are a handful of very hot pages, then it makes sense not to wait for
 vacuum to get to them.  And that is what a block-count limit does.  
 
 But if the entire table is very hot, I think that that is just another of way
 of saying that autovacuum is horribly misconfigured.  I think the purpose of

Well, we have to assume there are many misconfigured configurations ---
autovacuum isn't super-easy to configure, so we can't just blame the
user if this makes things worse.  In fact, page pruning was designed
spefically for cases where autovacuum wasn't running our couldn't keep
up.

 this patch is to fix something that can't be fixed through configuration 
 alone.
  
 
 If there are future writes, they would dirty the pages and
 cause even more pruning, but the 5% gives me the maximum pruning number
 of SELECTs.  If there aren't another 19 SELECTs, do I care if the table
 is pruned or not? 
 
 
 The use case I see for this is when there is a mixed workload.  There is one
 select which reads the entire table, and hundreds of thousands of selects/
 updates/insert that don't, and of course vacuum comes along every now and then
 and does it thing.  Why should the one massive SELECT have horrible 
 performance
 just because it was run right before autovacuum would have kicked in instead 
 of
 right after if finished?

I see your point, but what about the read-only workload after a big
update?  Do we leave large tables to be non-pruned for a long time? 
Also, consider cases where you did a big update, the autovacuum
thresh-hold was not met, so autovacuum doesn't run on that table ---
again, do we keep those non-pruned rows around for millions of scans?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Greg Stark

On 15 Apr 2015 15:43, Simon Riggs si...@2ndquadrant.com wrote:

 It all depends upon who is being selfish. Why is a user selfish for
 not wanting to clean every single block they scan, when the people
 that made the mess do nothing and go faster 10 minutes from now?
 Randomly and massively penalising large SELECTs makes no sense. Some
 cleanup is OK, with reasonable limits, which is why that is proposed.

I don't think it's productive to think of a query as a different actor with
only an interest in its own performance and no interest in overall system
performance.

From a holistic point of view the question is how many times is a given hit
chain going to need to be followed before it's pruned. Or to put it another
way, how expensive is creating a hot chain. Does it cause a single prune? a
fixed number of chain readers followed by a prune? Does the amount of work
depend on the workload or is it consistent?

My intuition is that a fixed cutoff like five pages is dangerous because
if you update many pages there's no limit to the number of times they'll be
read before they're all pruned. The steady state could easily be that every
query is having to read hot chains forever.

My intuition, again, is that what we need is a percentage such as do 10
prunes then ignore the next 1000 clean pages with hot chains. That
guarantees that after 100 selects the hot chains will all be pruned but
each select will only prune 1% of the clean pages it sees.

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Pavan Deolasee

On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark st...@mit.edu wrote:


 On 15 Apr 2015 15:43, Simon Riggs si...@2ndquadrant.com wrote:
 
  It all depends upon who is being selfish. Why is a user selfish for
  not wanting to clean every single block they scan, when the people
  that made the mess do nothing and go faster 10 minutes from now?
  Randomly and massively penalising large SELECTs makes no sense. Some
  cleanup is OK, with reasonable limits, which is why that is proposed.

 I don't think it's productive to think of a query as a different actor
 with only an interest in its own performance and no interest in overall
 system performance.

 From a holistic point of view the question is how many times is a given
 hit chain going to need to be followed before it's pruned. Or to put it
 another way, how expensive is creating a hot chain. Does it cause a single
 prune? a fixed number of chain readers followed by a prune? Does the amount
 of work depend on the workload or is it consistent?


IMO the size or traversal of the HOT chain is not that expensive compared
to the cost of either pruning too frequently, which generates WAL as well
as makes buffers dirty. OTOH cost of less frequent pruning could also be
very high. It can cause severe table bloat which may just stay for a very
long time. Even if dead space is recovered within a page, truncating a
bloated heap is not always possible. In such cases, even SELECTs would be
slowed down just because they need to read/scan far more pages than they
otherwise would have. IOW its probably wrong to assume that not-pruning
quickly enough will have impact only on the non-SELECT queries.

I also concur with arguments upthread that this change needs to be
carefully calibrated because it can lead to significant degradation for
certain workloads.

 My intuition, again, is that what we need is a percentage such as do 10
 prunes then ignore the next 1000 clean pages with hot chains. That
 guarantees that after 100 selects the hot chains will all be pruned but
 each select will only prune 1% of the clean pages it sees.

I think some such proposal was made in the last. There could be knob to
control how much a read-only query (or may be a read-only transaction)
should do HOT cleanup, say as a percentage of pages it looks at. The
default can be left at 100% in the first release so that the current
behaviour is not suddenly disrupted. But it will allow others to play with
the percentages and then based on field reports, we can change defaults in
the next releases.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Alvaro Herrera

Pavan Deolasee wrote:
 On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark st...@mit.edu wrote:

  From a holistic point of view the question is how many times is a given
  hit chain going to need to be followed before it's pruned. Or to put it
  another way, how expensive is creating a hot chain. Does it cause a single
  prune? a fixed number of chain readers followed by a prune? Does the amount
  of work depend on the workload or is it consistent?
 
 IMO the size or traversal of the HOT chain is not that expensive compared
 to the cost of either pruning too frequently, which generates WAL as well
 as makes buffers dirty. OTOH cost of less frequent pruning could also be
 very high. It can cause severe table bloat which may just stay for a very
 long time. Even if dead space is recovered within a page, truncating a
 bloated heap is not always possible.

I think you're failing to consider that in the patch there is a
distinction between read-only page accesses and page updates.  During a
page update, HOT cleanup is always done even with the patch, so there
won't be any additional bloat that would not be there without the patch.
It's only the read-only accesses to the patch that skip the HOT pruning.

Of course, as Greg says there will be some additional scans of the HOT
chain by read-only processes.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Andres Freund

On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
 I think you're failing to consider that in the patch there is a
 distinction between read-only page accesses and page updates.  During a
 page update, HOT cleanup is always done even with the patch, so there
 won't be any additional bloat that would not be there without the
 patch.

That's not really true (and my benchmark upthread proves it). The fact
that hot pruning only happens when we can get a cleanup lock means that
we can end up with more pages that are full, if we prune on select less
often. Especially if SELECTs are more frequent than write accesses -
pretty darn common - the likelihood of SELECTs getting the lock is
correspondingly higher.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Pavan Deolasee

On Thu, Apr 16, 2015 at 6:50 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Pavan Deolasee wrote:
  On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark st...@mit.edu wrote:

   From a holistic point of view the question is how many times is a given
   hit chain going to need to be followed before it's pruned. Or to put it
   another way, how expensive is creating a hot chain. Does it cause a
 single
   prune? a fixed number of chain readers followed by a prune? Does the
 amount
   of work depend on the workload or is it consistent?
 
  IMO the size or traversal of the HOT chain is not that expensive compared
  to the cost of either pruning too frequently, which generates WAL as well
  as makes buffers dirty. OTOH cost of less frequent pruning could also be
  very high. It can cause severe table bloat which may just stay for a very
  long time. Even if dead space is recovered within a page, truncating a
  bloated heap is not always possible.

 I think you're failing to consider that in the patch there is a
 distinction between read-only page accesses and page updates.  During a
 page update, HOT cleanup is always done even with the patch, so there
 won't be any additional bloat that would not be there without the patch.
 It's only the read-only accesses to the patch that skip the HOT pruning.


Ah, Ok. I'd not read the patch. But now that I do, I feel much more
comfortable with the change. In fact, I wonder if its just enough to either
do full HOT prune for target relations and not at all for all other
relations involved in the query. My apologies if this is done based on
discussions upthread. I haven't read the entire thread yet.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Alvaro Herrera

Andres Freund wrote:
 On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
  I think you're failing to consider that in the patch there is a
  distinction between read-only page accesses and page updates.  During a
  page update, HOT cleanup is always done even with the patch, so there
  won't be any additional bloat that would not be there without the
  patch.
 
 That's not really true (and my benchmark upthread proves it). The fact
 that hot pruning only happens when we can get a cleanup lock means that
 we can end up with more pages that are full, if we prune on select less
 often. Especially if SELECTs are more frequent than write accesses -
 pretty darn common - the likelihood of SELECTs getting the lock is
 correspondingly higher.

Interesting point.  Of course, this code should count HOT cleanups
against the total limit when they are effectively carried out, and
ignore those that are skipped because of inability to acquire the
cleanup lock.  Not sure whether the submitted code does that.

Can we keep stats on how many pages we don't clean in the updating
process due to failure to acquire cleanup lock?  My intuition says that
it should be similar to the number of backends running concurrently,
but that might be wrong.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-16 Thread Simon Riggs

On 16 April 2015 at 15:21, Andres Freund and...@anarazel.de wrote:

 On 2015-04-16 10:20:20 -0300, Alvaro Herrera wrote:
  I think you're failing to consider that in the patch there is a
  distinction between read-only page accesses and page updates.  During a
  page update, HOT cleanup is always done even with the patch, so there
  won't be any additional bloat that would not be there without the
  patch.

 That's not really true (and my benchmark upthread proves it). The fact
 that hot pruning only happens when we can get a cleanup lock means that
 we can end up with more pages that are full, if we prune on select less
 often. Especially if SELECTs are more frequent than write accesses -
 pretty darn common - the likelihood of SELECTs getting the lock is
 correspondingly higher.


Your point that we *must* do *some* HOT cleanup on SELECTs is proven beyond
question. Alvaro has not disputed that, ISTM you misread that. Pavan has
questioned that point but the results upthread are there, he explains he
hasn't read that yet.

The only question is how much cleanup on SELECT? Having one SELECT hit
10,000 cleanups while another hits 0 creates an unfairness and
unpredictability in the way we work. Maybe some people running a backup
actually like the fact it cleans the database; others think that is a bad
thing. Few people issuing large queries think it is good behaviour. Anybody
running replication also knows that this causes a huge slam of WAL which
can increase replication delay, which is a concern for HA.

That is how we arrive at the idea of a cleanup limit, further enhanced by a
limit that applies only to dirtying clean blocks, which we have 4? recent
votes in favour of.

I would personally be in favour of a parameter to control the limit, since
whatever we chose is right/wrong depending upon circumstances. I am however
comfortable with not having a parameter if people think it is hard to tune
that, which I agree it would be, hence no parameter in the patch.

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On 15 April 2015 at 08:04, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 14 April 2015 at 21:53, Robert Haas robertmh...@gmail.com wrote:
 Peter commented previously that README.HOT should get an update.  The
 relevant section seems to be When can/should we prune or
 defragment?.

 That's easy enough to change once we agree to commit.

 I wonder if it would be a useful heuristic to still prune pages if
 those pages are already dirty.

 Useful for who? This is about responsibility. Why should someone
 performing a large SELECT take the responsibility for cleaning pages?

 Because it makes it subsequent accesses to the page cheaper.

Cheaper for whom?

 Of
 course, that applies in all cases, but when the page is already dirty,
 the cost of pruning it is probably quite small - we're going to have
 to write the page anyway, and pruning it before it gets evicted
 (perhaps even by our scan) will be cheaper than writing it now and
 writing it again after it's pruned.  When the page is clean, the cost
 of pruning is significantly higher.

We aren't going to have to write the page, but someone will.

In a single workload, the mix of actions can be useful. In separate
workloads, where some guy just wants to run a report or a backup, its
not right that we slow them down because of someone else's actions.

 I won't take responsibility for paying my neighbor's tax bill, but I
 might take responsibility for picking up his mail while he's on
 holiday.

That makes it sound like this is an occasional, non-annoying thing.

It's more like, whoever fetches the mail needs to fetch it for
everybody. So we are slowing down one person disproportionately, while
others fly through without penalty. There is no argument that one
workload necessarily needs to perform that on behalf of the other
workload.

The actions you suggest are reasonable and should ideally be the role
of a background process. But that doesn't mean in the absence of that
we should pay the cost in the foreground.

Let me apply this patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote:
 On 2015-04-15 08:42:33 -0400, Simon Riggs wrote:
  Because it makes it subsequent accesses to the page cheaper.

 Cheaper for whom?

 Everyone.

I think what you mean is Everyone else. It is demonstrably quicker
and more consistent for a process when it limits the amount of pruning
it does, as well as the fact that it causes additional WAL traffic
when it does so, causing replication lag.

I love it when someone cleans up for me. I just don't think they'll
accept the argument that they should clean up for me because it makes
their life easier.   Certainly doesn't work with my kids.


 I don't really see the downside to this suggestion.

The suggestion makes things better than they are now but is still less
than I have proposed.

If what you both mean is IMHO this is an acceptable compromise, I
can accept it also, at this point in the CF.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Robert Haas

On Wed, Apr 15, 2015 at 3:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 14 April 2015 at 21:53, Robert Haas robertmh...@gmail.com wrote:
 Peter commented previously that README.HOT should get an update.  The
 relevant section seems to be When can/should we prune or
 defragment?.

 That's easy enough to change once we agree to commit.

 I wonder if it would be a useful heuristic to still prune pages if
 those pages are already dirty.

 Useful for who? This is about responsibility. Why should someone
 performing a large SELECT take the responsibility for cleaning pages?

Because it makes it subsequent accesses to the page cheaper.  Of
course, that applies in all cases, but when the page is already dirty,
the cost of pruning it is probably quite small - we're going to have
to write the page anyway, and pruning it before it gets evicted
(perhaps even by our scan) will be cheaper than writing it now and
writing it again after it's pruned.  When the page is clean, the cost
of pruning is significantly higher.

I won't take responsibility for paying my neighbor's tax bill, but I
might take responsibility for picking up his mail while he's on
holiday.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Andres Freund

On 2015-04-15 08:42:33 -0400, Simon Riggs wrote:
  Because it makes it subsequent accesses to the page cheaper.
 
 Cheaper for whom?

Everyone. Including further readers. Following HOT chains in read mostly
workloads can be really expensive. If you have workloads with a 'hot'
value range that's frequently updated, but that range moves you can
easily end up with heavily chained tuples which won't soon be touched by
a writer again.

And writers will often not yet be able to prune the page because there's
still live readers for the older versions (like other updaters).

  Of
  course, that applies in all cases, but when the page is already dirty,
  the cost of pruning it is probably quite small - we're going to have
  to write the page anyway, and pruning it before it gets evicted
  (perhaps even by our scan) will be cheaper than writing it now and
  writing it again after it's pruned.  When the page is clean, the cost
  of pruning is significantly higher.
 
 We aren't going to have to write the page, but someone will.

If it's already dirty that doesn't change at all. *Not* pruning in that
moment actually will often *increase* the total amount of writes to the
OS. Because now the pruning will happen on the next write access or
vacuum - when the page already might have been undirtied.

I don't really see the downside to this suggestion.

 The actions you suggest are reasonable and should ideally be the role
 of a background process. But that doesn't mean in the absence of that
 we should pay the cost in the foreground.

I'm not sure that's true. A background process will either cause
additional read IO to find worthwhile pages, or it'll not find
worthwhile pages because they're already paged out.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Alvaro Herrera

Simon Riggs wrote:
 On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote:

  I don't really see the downside to this suggestion.
 
 The suggestion makes things better than they are now but is still less
 than I have proposed.
 
 If what you both mean is IMHO this is an acceptable compromise, I
 can accept it also, at this point in the CF.

Let me see if I understand things.

What we have now is: when reading a page, we also HOT-clean it.  This
runs HOT-cleanup a large number of times, and causes many pages to
become dirty.

Your patch is when reading a page, HOT-clean it, but only 5 times in
each scan.  This runs HOT-cleanup at most 5 times, and causes at most 5
pages to become dirty.

Robert's proposal is when reading a page, if dirty HOT-clean it; if not
dirty, also HOT-clean it but only 5 times in each scan.  This runs
HOT-cleanup some number of times (as many as there are dirty), and
causes at most 5 pages to become dirty.


Am I right in thinking that HOT-clean in a dirty page is something that
runs completely within CPU cache?  If so, it would be damn fast and
would have benefits for future readers, for very little cost.

Dirtying a page is very different; if buffer reads are common, the
system is later bogged down trying to find clean pages to read uncached
buffers (including the read-only scan itself, so it becomes slower.)


If I have understood things correctly, then I stand behind Robert's
suggestion.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Heikki Linnakangas


On 04/15/2015 05:44 PM, Alvaro Herrera wrote:

Simon Riggs wrote:

On 15 April 2015 at 09:10, Andres Freund and...@anarazel.de wrote:



I don't really see the downside to this suggestion.


The suggestion makes things better than they are now but is still less
than I have proposed.

If what you both mean is IMHO this is an acceptable compromise, I
can accept it also, at this point in the CF.


Let me see if I understand things.

What we have now is: when reading a page, we also HOT-clean it.  This
runs HOT-cleanup a large number of times, and causes many pages to
become dirty.

Your patch is when reading a page, HOT-clean it, but only 5 times in
each scan.  This runs HOT-cleanup at most 5 times, and causes at most 5
pages to become dirty.

Robert's proposal is when reading a page, if dirty HOT-clean it; if not
dirty, also HOT-clean it but only 5 times in each scan.  This runs
HOT-cleanup some number of times (as many as there are dirty), and
causes at most 5 pages to become dirty.


Am I right in thinking that HOT-clean in a dirty page is something that
runs completely within CPU cache?  If so, it would be damn fast and
would have benefits for future readers, for very little cost.


If there are many tuples on the page, it takes some CPU effort to scan 
all the HOT chains and move tuples around. Also, it creates a WAL 
record, which isn't free.


Another question is whether the patch can reliably detect whether it's 
doing a read-only scan or not. I haven't tested, but I suspect it'd 
not do pruning when you do something like INSERT INTO foo SELECT * FROM 
foo WHERE blah. I.e. when the target relation is referenced twice in 
the same statement: once as the target, and second time as a source. 
Maybe that's OK, though.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Alvaro Herrera

Heikki Linnakangas wrote:
 On 04/15/2015 05:44 PM, Alvaro Herrera wrote:

 Robert's proposal is when reading a page, if dirty HOT-clean it; if not
 dirty, also HOT-clean it but only 5 times in each scan.  This runs
 HOT-cleanup some number of times (as many as there are dirty), and
 causes at most 5 pages to become dirty.
 
 
 Am I right in thinking that HOT-clean in a dirty page is something that
 runs completely within CPU cache?  If so, it would be damn fast and
 would have benefits for future readers, for very little cost.
 
 If there are many tuples on the page, it takes some CPU effort to scan all
 the HOT chains and move tuples around. Also, it creates a WAL record, which
 isn't free.

But if the page is in CPU cache, the CPU effort shouldn't be all that
noticeable, should it?  That's my point, but then maybe I'm wrong.  Now,
the WAL logging is annoying, so let's limit that too -- do it at most
for, say, 20 dirty pages and at most 5 clean pages.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Robert Haas

On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I won't take responsibility for paying my neighbor's tax bill, but I
 might take responsibility for picking up his mail while he's on
 holiday.

 That makes it sound like this is an occasional, non-annoying thing.

 It's more like, whoever fetches the mail needs to fetch it for
 everybody. So we are slowing down one person disproportionately, while
 others fly through without penalty. There is no argument that one
 workload necessarily needs to perform that on behalf of the other
 workload.

Sure there is.  It's called a tragedy of the commons - everybody acts
in their own selfish interest (it's not *my* responsibility to limit
grazing on public land, or prune this page that I'm not modifying) and
as a result some resource that everybody cares about (grass,
system-wide I/O) gets trashed to everyone's detriment.  Purely selfish
behavior can only be justified here if we assume that the selfish
actor intends to participate in the system only once: I'm going to run
one big reporting query which must run as fast as possible, and then
I'm getting on a space ship to Mars.  So if my refusal to do any
pruning during that reporting query causes lots of extra I/O on the
system ten minutes from now, I don't care, because I'll have left the
playing field forever at that point.

As Heikki points out, any HOT pruning operation generates WAL and has
a CPU cost.  However, pruning a page that is currently dirty
*decreases* the total volume of writes to the data files, whereas
pruning a page that is currently clean *increases* the total volume of
writes to the data files.  In the first case, if we prune the page
right now while it's still dirty, we can't possibly cause any
additional data-file writes, and we may save one, because it's
possible that someone else would later have pruned it when it was
clean and there was no other reason to dirty it.  In the second case,
if we prune the page that is currently clean, it will become dirty.
That will cost us no additional I/O if the page is again modified
before it's written out, but otherwise it costs an additional data
file write.  I think there's a big difference between those two cases.
Sure, from the narrow point of view of how much work it takes this
scan to process this page, it's always better not to prune.  But if
you make the more realistic assumption that you will keep on issuing
queries on the system, then what you're doing to the overall system
I/O load is pretty important.

By the way, was anything ever done about this:

http://www.postgresql.org/message-id/20140929091343.ga4...@alap3.anarazel.de

That's just a workload that is 5/6th pgbench -S and 1/6th pgbench,
which is in no way an unrealistic workload, and showed a significant
regression with an earlier version of the patch.  You seem very eager
to commit this patch after four months of inactivity, but I think this
is a pretty massive behavior change that deserves careful scrutiny
before it goes in.  If we push something that changes longstanding
behavior and can't even be turned off, and it regresses behavior for a
use case that common, our users are going to come after us with
pitchforks.  That's not to say some people won't be happy, but in my
experience it takes a lot of happy users to make up for getting
stabbed with even one pitchfork.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Heikki Linnakangas


On 04/15/2015 07:11 PM, Alvaro Herrera wrote:

Heikki Linnakangas wrote:

On 04/15/2015 05:44 PM, Alvaro Herrera wrote:



Robert's proposal is when reading a page, if dirty HOT-clean it; if not
dirty, also HOT-clean it but only 5 times in each scan.  This runs
HOT-cleanup some number of times (as many as there are dirty), and
causes at most 5 pages to become dirty.


Am I right in thinking that HOT-clean in a dirty page is something that
runs completely within CPU cache?  If so, it would be damn fast and
would have benefits for future readers, for very little cost.


If there are many tuples on the page, it takes some CPU effort to scan all
the HOT chains and move tuples around. Also, it creates a WAL record, which
isn't free.


But if the page is in CPU cache, the CPU effort shouldn't be all that
noticeable, should it?  That's my point, but then maybe I'm wrong.  Now,
the WAL logging is annoying, so let's limit that too -- do it at most
for, say, 20 dirty pages and at most 5 clean pages.


There isn't much difference between that and just doing it on first 5 
pages. Both of those numbers were pulled out of thin air, anyway. I'd 
rather just keep it simple.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Apr 15, 2015 at 8:42 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I won't take responsibility for paying my neighbor's tax bill, but I
 might take responsibility for picking up his mail while he's on
 holiday.

 That makes it sound like this is an occasional, non-annoying thing.

 It's more like, whoever fetches the mail needs to fetch it for
 everybody. So we are slowing down one person disproportionately, while
 others fly through without penalty. There is no argument that one
 workload necessarily needs to perform that on behalf of the other
 workload.

 Sure there is.  It's called a tragedy of the commons - everybody acts
 in their own selfish interest (it's not *my* responsibility to limit
 grazing on public land, or prune this page that I'm not modifying) and
 as a result some resource that everybody cares about (grass,
 system-wide I/O) gets trashed to everyone's detriment.  Purely selfish
 behavior can only be justified here if we assume that the selfish
 actor intends to participate in the system only once: I'm going to run
 one big reporting query which must run as fast as possible, and then
 I'm getting on a space ship to Mars.  So if my refusal to do any
 pruning during that reporting query causes lots of extra I/O on the
 system ten minutes from now, I don't care, because I'll have left the
 playing field forever at that point.

It all depends upon who is being selfish. Why is a user selfish for
not wanting to clean every single block they scan, when the people
that made the mess do nothing and go faster 10 minutes from now?
Randomly and massively penalising large SELECTs makes no sense. Some
cleanup is OK, with reasonable limits, which is why that is proposed.

On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
 Robert's proposal is when reading a page, if dirty HOT-clean it; if not
 dirty, also HOT-clean it but only 5 times in each scan.  This runs
 HOT-cleanup some number of times (as many as there are dirty), and
 causes at most 5 pages to become dirty.

My understanding of Robert's proposal was when reading a page,
HOT-clean it, but only do this up to 5 times on clean pages, but
continue to do this indefinitely when the page is already dirty..
Andres said that was the only way and I have agreed to it.

Are you now saying not to commit your proposal at all?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Alvaro Herrera

Simon Riggs wrote:
 On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote:

 On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
  Robert's proposal is when reading a page, if dirty HOT-clean it; if not
  dirty, also HOT-clean it but only 5 times in each scan.  This runs
  HOT-cleanup some number of times (as many as there are dirty), and
  causes at most 5 pages to become dirty.
 
 My understanding of Robert's proposal was when reading a page,
 HOT-clean it, but only do this up to 5 times on clean pages, but
 continue to do this indefinitely when the page is already dirty..

To me, both statements look identical.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

On 15 April 2015 at 16:01, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Simon Riggs wrote:
 On 15 April 2015 at 12:39, Robert Haas robertmh...@gmail.com wrote:

 On 04/15/2015 05:44 PM, Alvaro Herrera wrote:
  Robert's proposal is when reading a page, if dirty HOT-clean it; if not
  dirty, also HOT-clean it but only 5 times in each scan.  This runs
  HOT-cleanup some number of times (as many as there are dirty), and
  causes at most 5 pages to become dirty.

 My understanding of Robert's proposal was when reading a page,
 HOT-clean it, but only do this up to 5 times on clean pages, but
 continue to do this indefinitely when the page is already dirty..

 To me, both statements look identical.

I didn't read it that way, apologies for any confusion. But that is good news.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, RemoteDBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes

2015-04-15 Thread Peter Geoghegan

On Wed, Apr 15, 2015 at 6:10 AM, Andres Freund and...@anarazel.de wrote:
  Of
  course, that applies in all cases, but when the page is already dirty,
  the cost of pruning it is probably quite small - we're going to have
  to write the page anyway, and pruning it before it gets evicted
  (perhaps even by our scan) will be cheaper than writing it now and
  writing it again after it's pruned.  When the page is clean, the cost
  of pruning is significantly higher.

 We aren't going to have to write the page, but someone will.

 If it's already dirty that doesn't change at all. *Not* pruning in that
 moment actually will often *increase* the total amount of writes to the
 OS. Because now the pruning will happen on the next write access or
 vacuum - when the page already might have been undirtied.

 I don't really see the downside to this suggestion.

+1. I think, in general, the opportunity cost of not pruning when a
page is already dirty is likely to be rather high. In general, it's
likely to be worth it.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Turning off HOT/Cleanup sometimes