Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-12 Thread Robert Treat
On Friday 10 November 2006 08:53, Simon Riggs wrote:
 On Fri, 2006-11-10 at 12:32 +0100, Zeugswetter Andreas ADI SD wrote:
  4. although at first it might seem so I see no advantage for vacuum with
  overflow

 No need to VACUUM the indexes, which is the most expensive part. The
 more indexes you have, the more VACUUM costs, not so with HOT.


This isn't exactly true though right?  Since the more indexes you have, the 
more likely it is that your updating an indexed column, which means HOT isn't 
going to work for you.  One common use case that seems problematic is the 
indexed, frequently updated timestamp field.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Simon Riggs
On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  As more UPDATEs take place these tuple chains would grow, making
  locating the latest tuple take progressively longer.
 
 This is the part that bothers me --- particularly the random-access
 nature of the search.  I wonder whether you couldn't do something
 involving an initial table fill-factor of less than 50%, and having
 the first updated version living on the same heap page as its parent.
 Only when the active chain length was more than one (which you
 hypothesize is rare) would it actually be necessary to do a random
 access into the overflow table.

Thats appropriate sometimes, not others, but I'll investigate this
further so that its possible to take advantage of non-zero fillfactors
when they exist.

There's a number of distinct use-cases here:

If you have a very small, heavily updated table it makes a lot of sense
to use lower fillfactors as well.

If you have a larger table, using fillfactor 50% immediately doubles the
size of the table. If the updates are uneven, as they mostly are because
of the Benfold distribution/Pareto principle, then it has been found
that leaving space on block doesn't help the heavily updated portions of
a table, whereas it hinders the lightly updated portions of a table.

TPC-C and TPC-B both have uniformly distributed UPDATEs, so its easy to
use the fillfactor to great advantage there.

 More generally, do we need an overflow table at all, rather than having
 these overflow tuples living in the same file as the root tuples?  As
 long as there's a bit available to mark a tuple as being this special
 not-separately-indexed type, you don't need a special location to know
 what it is.  This might break down in the presence of seqscans though.

HOT currently attempts to place a subsequent UPDATE on the same page of
the overflow relation, but this doesn't happen (yet) for placing
multiple versions on same page. IMHO it could, but will think about it.

  This allows the length of a typical tuple chain to be extremely short in
  practice. For a single connection issuing a stream of UPDATEs the chain
  length will no more than 1 at any time.
 
 Only if there are no other transactions being held open, which makes
 this claim a lot weaker.

True, but Nikhil has run tests that clearly show HOT outperforming
current situation in the case of long running transactions. The need to
optimise HeapTupleSatisfiesVacuum() and avoid long chains does still
remain a difficulty for both HOT and the current situation.

  HOT can only work in cases where a tuple does not modify one of the
  columns defined in an index on the table, and when we do not alter the
  row length of the tuple.
 
 Seems like altering the row length isn't the issue, it's just is
 there room on the page for the new version.  Again, a generous
 fillfactor would give you more flexibility.

The copy-back operation can only work if the tuple fits in the same
space as the root tuple. If it doesn't you end up with a tuple
permanently in the overflow relation. That might not worry us, I guess.

Also, my understanding was that an overwrite operation could not vary
the length of a tuple (at least according to code comments).

  [We'll be able to do that more efficiently when
  we have plan invalidation]
 
 Uh, what's that got to do with it?

Currently the HOT code dynamically tests to see if the index columns
have been touched. If we had plan invalidation that would be able to be
assessed more easily at planning time, in cases where there is no BEFORE
trigger.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Heikki Linnakangas

Simon Riggs wrote:

On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

HOT can only work in cases where a tuple does not modify one of the
columns defined in an index on the table, and when we do not alter the
row length of the tuple.

Seems like altering the row length isn't the issue, it's just is
there room on the page for the new version.  Again, a generous
fillfactor would give you more flexibility.


The copy-back operation can only work if the tuple fits in the same
space as the root tuple. If it doesn't you end up with a tuple
permanently in the overflow relation. That might not worry us, I guess.

Also, my understanding was that an overwrite operation could not vary
the length of a tuple (at least according to code comments).


You can't If someone else has the page pinned,




[We'll be able to do that more efficiently when
we have plan invalidation]

Uh, what's that got to do with it?


Currently the HOT code dynamically tests to see if the index columns
have been touched. If we had plan invalidation that would be able to be
assessed more easily at planning time, in cases where there is no BEFORE
trigger.




--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Heikki Linnakangas

Oops, pressed send too early. Ignore the one-line reply I just sent...

Simon Riggs wrote:

On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

HOT can only work in cases where a tuple does not modify one of the
columns defined in an index on the table, and when we do not alter the
row length of the tuple.

Seems like altering the row length isn't the issue, it's just is
there room on the page for the new version.  Again, a generous
fillfactor would give you more flexibility.


The copy-back operation can only work if the tuple fits in the same
space as the root tuple. If it doesn't you end up with a tuple
permanently in the overflow relation. That might not worry us, I guess.


You can't move tuples around in a page without holding a Vacuum lock on 
the page. Some other backend might have the page pinned and have a 
pointer to a tuple on the page that would get bogus if the tuple is 
moved. Maybe you could try to get a vacuum lock when doing the update 
and prereserve space for the new version if you can get one.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 Seems like altering the row length isn't the issue, it's just is
 there room on the page for the new version.  Again, a generous
 fillfactor would give you more flexibility.

 The copy-back operation can only work if the tuple fits in the same
 space as the root tuple. If it doesn't you end up with a tuple
 permanently in the overflow relation. That might not worry us, I guess.

I think he's suggesting that you can put the new version in the available
space rather than use the space from the existing tuple. You can keep the same
line pointer so index entries still refer to the correct tuple.

The only problem I see is that if you determine that there's space available
when you do the update that space may have disappeared by the table you come
along to do the move-back. 

Perhaps you can do something clever with reserving the space at that time for
the later move-back but I fear that'll complicate vacuum and open up risks if
the system crashes in that state.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread NikhilS
Hi, 
  This allows the length of a typical tuple chain to be extremely short in  practice. For a single connection issuing a stream of UPDATEs the chain  length will no more than 1 at any time.
 Only if there are no other transactions being held open, which makes this claim a lot weaker.True, but Nikhil has run tests that clearly show HOT outperformingcurrent situation in the case of long running transactions. The need to
optimise HeapTupleSatisfiesVacuum() and avoid long chains does stillremain a difficulty for both HOT and the current situation.Yes, I carried out some pgbench runs comparing our current HOT update
patch with PG82BETA2 sources for the long running transaction case. For
an apples to apples comparison we got roughly 170% improvement with the
HOT update patch over BETA2.

In case of BETA2, since all versions are in the main heap, we end up
doing multiple index scans for them. In case of HOT updates, we have a
single index entry with the chains getting traversed from the overflow
relation. So as Simon has mentioned the need to avoid long chains remains
a difficulty for both the situations.

Regards,
Nikhils


-- EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Simon Riggs
On Fri, 2006-11-10 at 12:32 +0100, Zeugswetter Andreas ADI SD wrote:
   As more UPDATEs take place these tuple chains would grow, making 
   locating the latest tuple take progressively longer.
  
  More generally, do we need an overflow table at all, rather 
  than having these overflow tuples living in the same file as 
  the root tuples?  As long as there's a bit available to mark 
  a tuple as being this special not-separately-indexed type, 
  you don't need a special location to know what it is.
 
 Yes, I have that same impression.

I think this might work, I'll think about it.

If you can come up with a test case where we would need this
optimisation then I'm sure that can be prototyped.

In many cases though an overflow relation is desirable, so ISTM that we
might want to be able to do both: have an updated tuple with not-indexed
bit set in the heap if on same page, else go to overflow relation.

 1. It doubles the IO (original page + hot page), if the new row would 
   have fit into the original page.

Yes, but thats a big if, and once it is full that optimisation goes
away. Some thoughts:

First, it presumes that the second block is not in cache (see later).

Second, my observation is that this can happen for some part of the
time, but under heavy updates this window of opportunity is not the
common case and so this optimisation would not make much difference.
However, in some cases, it would be appropriate, so I'll investigate.

HOT optimises the case where a tuple in the overflow relation is
UPDATEd, so it can place the subsequent tuple version on the same page.
So if you perform 100 UPDATEs on a single tuple, the first will write to
the overflow relation in a new block, then the further 99 will attempt
to write to the same block, if they can. So in many cases we would do
only 101 block accesses and no real I/O.

 2. locking should be easier if only the original heap page is involved.

Yes, but multi-page update already happens now, so HOT is not different
on that point.

 3. It makes the overflow pages really hot because all concurrent updates
 compete
   for the few overflow pages.

Which ensures they are in shared_buffers, rather than causing I/O. The
way FSM works, it will cause concurrent updaters to spread out their
writes to many blocks. So in the case of a single high frequency updater
all of the updates go into the same block of the overflow relation, so
the optimisation you referred to in (1) does take effect strongly in
that case, yet without causing contention with other updaters. The FSM
doesn't change with HOT, but the effects of having inserted additional
tuples into the main heap are much harder to undo afterwards.

The overflow relation varies in size according to the number of updates,
not the actual number of tuples, as does the main heap, so VACUUMing
will focus on the hotspots and be more efficient, especially since no
indexes need be scanned. [Sure we can use heapblock-need-vacuum bitmaps,
but there will still be a mix of updated/not-updated tuples in there, so
VACUUM would still be less efficient than with HOT]. So VACUUM can
operate more frequently on the overflow relation and keep the size
reasonable for more of the time, avoiding I/O.

Contention is and will remain a HOT topic ;-)
I understand your concerns and we should continue to monitor this on the
various performance tests that will be run.

 4. although at first it might seem so I see no advantage for vacuum with
 overflow 

No need to VACUUM the indexes, which is the most expensive part. The
more indexes you have, the more VACUUM costs, not so with HOT.

 5. the size reduction of heap is imho moot because you trade it for a
 growing overflow
   (size reduction only comes from reusing dead tuples and not
 adding index tuples -- SITC)

HOT doesn't claim to reduce the size of the heap. In the presence of a
long running transaction, SizeOfHOT(heap + overflow) =
SizeOfCurrent(Heap).

VACUUM is still required in both cases to manage total heap size. If we
have solely UPDATEs and no deletes, then only the overflow relation need
be VACUUMed.

 Could you try to explain the reasoning behind separate overflow storage
 ?

I think the answers above cover the main points, which seem to make the
case clearly enough from a design rationale perspective, even without
the performance test results to confirm them.

 What has been stated so far was not really conclusive to me in this
 regard.

Personally, I understand. I argued against them for about a month after
I first heard of the idea, but they make sense for me now. HOT has
evolved considerably from the various ideas of each of the original idea
team (Jonah, Bruce, Jan, myself) and will continue to do so as better
ideas replace poor ones, based on performance tests. All of the ideas
within it need to be strongly challenged to ensure we arrive at the best
solution.

 e.g. a different header seems no easier in overflow than in heap 

True. The idea there is that we 

Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Pavan Deolasee
On 11/10/06, Simon Riggs [EMAIL PROTECTED] wrote:
On Fri, 2006-11-10 at 12:32 +0100, Zeugswetter Andreas ADI SD wrote: e.g. a different header seems no easier in overflow than in heap
True. The idea there is that we can turn frequent update on/off fairlyeasily for normal tables since there are no tuple format changes in themain heap. It also allows HOT to avoid wasting space when a table is
heavily updated in certain places only.I general though, it would make implementation a bit simpler when tuples withdifferent header are isolated in a different relation. Regards,
Pavan


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Nikhil S

Hi,
 True, but Nikhil has run tests that clearly show HOT outperforming
 current situation in the case of long running transactions. The need to
 optimise HeapTupleSatisfiesVacuum() and avoid long chains does still
 remain a difficulty for both HOT and the current situation.


Yes, I carried out some pgbench runs comparing our current HOT update
patch with PG82BETA2 sources for the long running transaction case. For
an apples to apples comparison we got roughly 170% improvement with the
HOT update patch over BETA2.

In case of BETA2, since all versions are in the main heap, we end up
doing multiple index scans for them. In case of HOT updates, we have a
single index entry with the chains getting traversed from the overflow
relation. So as Simon has mentioned the need to avoid long chains remains
a difficulty for both the situations.

Regards,
Nikhils


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Zeugswetter Andreas ADI SD

  2. locking should be easier if only the original heap page is
involved.
 
 Yes, but multi-page update already happens now, so HOT is not 
 different on that point.

I was thinking about the case when you pull back a tuple, which seems
to be more
difficult than what we have now.

Andreas

PS: I think it is great that you are doing all this work and explaining
it for us. Thanks.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Zeugswetter Andreas ADI SD

  True, but Nikhil has run tests that clearly show HOT outperforming 
  current situation in the case of long running transactions. The need

  to optimise HeapTupleSatisfiesVacuum() and avoid long chains does 
  still remain a difficulty for both HOT and the current situation.
 
 
 Yes, I carried out some pgbench runs comparing our current 
 HOT update patch with PG82BETA2 sources for the long running 
 transaction case. For an apples to apples comparison we got

Vaccuums every 5 minutes, or no vaccuums ?
 
 roughly 170% improvement with the HOT update patch over BETA2.

Wow, must be smaller indexes and generally less index maintenance. 
What this also states imho, is that following tuple chains
is not so expensive as maintaining indexes (at least in a heavy update 
scenario like pgbench).

Maybe we should try a version, where the only difference to now is,
that when the index keys stay the same the indexes are not updated, and
the tuple
chain is followed instead when selecting with index. (Maybe like the
current alive flag the index pointer can even be refreshed to the oldest
visible
tuple by readers)

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread NikhilS
Hi,
On 11/10/06, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
  True, but Nikhil has run tests that clearly show HOT outperforming  current situation in the case of long running transactions. The need
  to optimise HeapTupleSatisfiesVacuum() and avoid long chains does  still remain a difficulty for both HOT and the current situation. Yes, I carried out some pgbench runs comparing our current
 HOT update patch with PG82BETA2 sources for the long running transaction case. For an apples to apples comparison we gotVaccuums every 5 minutes, or no vaccuums ?


We tried with both. Vacuumseems to dolittle to help in a long running transaction case. Generally in most of the pgbench runs that we carried out, autovacuum did not seem to be of much help even to PG82.

Regards,
Nikhils-- EnterpriseDB http://www.enterprisedb.com 


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-10 Thread Simon Riggs
On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:

  I'm not sure this really solves that problem because there 
  are still DELETEs to consider but it does remove one factor 
  that exacerbates it unnecessarily.
 
 Yea, so you still need to vaccum the large table regularly.

HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum since deletes and wraparound issues would be
the only remaining reasons to do this.

[I have some ideas for how to optimize tables with heavy INSERT/DELETE
activity, but that case is much less prevalent than heavy UPDATEs.]

  I think the vision is that the overflow table would never be 
  very large because it can be vacuumed very aggressively. It 
  has only tuples that are busy and will need vacuuming as soon 
  as a transaction ends. Unlike the main table which is mostly 
  tuples that don't need vacuuming. 
 
 Ok, but you have to provide an extra vacuum that does only that then
 (and it randomly touches heap pages, and only does partial work there).

Sure, HOT needs a specially optimised VACUUM.

  So a heap that's double in size necessary takes twice as 
  long as necessary to scan. The fact that the overflow tables 
  are taking up space isn't interesting if they don't have to 
  be scanned.
 
 The overflow does have to be read for each seq scan. And it was stated
 that it would
 be accessed with random access (follow tuple chain).
 But maybe we can read the overflow same as if it where an additional
 segment file ?

Not without taking a write-avoiding lock on the table, unfortunately.

  Hitting the overflow tables should be quite rare, it only 
  comes into play when looking at concurrently updated tuples. 
  It certainly happens but most tuples in the table will be 
  committed and not being concurrently updated by anyone else.
 
 The first update moves the row to overflow, only the 2nd next might be
 able to pull it back.
 So on average you would have at least 66% of all updated rows after last
 vacuum in the overflow.
 
 The problem with needing very frequent vacuums is, that you might not be
 able to do any work because of long transactions.

HOT doesn't need more frequent VACUUMs, it is just more efficient and so
can allow them, when needed to avoid I/O. Space usage in the overflow
relation is at its worst in the case of an enormous table with low
volume random updates, but note that it is *never* worse than current
space usage. In the best case, which is actually fairly common in
practice: a small number of rows of a large table are being updated by a
steady stream of concurrent updates, we find the overflow relation needs
only a few 100 tuples, so regular vacuuming will be both easy and
effective.

As an aside, note that HOT works best in real-world situations, not
benchmarks such as TPC where the I/Os are deliberately randomised to
test the scalability of the RDBMS. But even then, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-09 Thread Simon Riggs
On Thu, 2006-11-09 at 18:49 +0100, Martijn van Oosterhout wrote:
 Nice idea, just one question:

 It seems to me that bitmap index scans will get these same
 characteristics also, right? The bitmap scan will have to follow the
 chain of any possibly matching tuple in any of the blocks that are in
 the bitmap, right?

Yes, they would identify the root tuples. The whole chain has matching
index values, by definition, so the re-evaluation for lossy bitmaps will
work just the same before the actual tuple is retrieved by walking the
chain (if required).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-09 Thread Simon Riggs
On Thu, 2006-11-09 at 13:21 -0800, Josh Berkus wrote:
 Simon,
 
  If we perform an update that meets the HOT criteria then we put the
  new version into the overflow relation; we describe this as a HOT
  UPDATE. If we perform an update that does not meet the criteria, then we
  carry on with the existing/old MVCC behaviour; we describe this as a
  non-HOT UPDATE.
 
 Making the essential performance analysis question, Am I HOT or Not?  

Very good. ;-)

Well, we had Overflow Update CHaining as an alternative name... :-)

The naming sounds silly, but we had a few alternate designs, so we
needed to be able to tell them apart sensibly. We've had TVR, SITC, UIP
and now HOT. Software research...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 6: explain analyze is your friend