Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread NikhilS
Hi, 
On 11/10/06, Pavan Deolasee <[EMAIL PROTECTED]> wrote:

> On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > > > (2) Isn't this full of race conditions?
> > > I agree, there could be race conditions. But IMO we can handle those.
> >
> > Doubtless you can prevent races by introducing a bunch of additional
> > locking.  The question was really directed to how much concurrent
> > performance is left, once you get done locking it down.
>
> I understand your point and I can clearly see a chance to improve upon
> the current locking implementation in the prototype, even though we are
> seeing a good performance boost for 50 clients and a scaling factor of
> 50 with pgbench runs, as mentioned by Nikhil.
>
> Regards,
> Pavan
Yes, we have done a number of runs with and without autovacuum, with
parameters like 50 clients, a scaling factor of 50, and 25000 transactions
per client. 50 clients should introduce a decent amount of concurrency.
The tps observed with the HOT update patch (850 tps) was approximately
200+% better than the PG 8.2 sources (270 tps).

 
Runs with 25 clients, a scaling factor of 25, and 25000 transactions per
client produce similar percentage increases with the HOT update patch.
 
Regards,
Nikhils
 
 
 
-- EnterpriseDB   http://www.enterprisedb.com 


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Zeugswetter Andreas ADI SD

> > 1. It doubles the IO (original page + hot page), if the new row
> > would have fit into the original page.
>
> That's an awfully big IF there. Even if you use a fillfactor
> of 50% in which case you're paying a 100% performance penalty

I don't see where the 50% comes from. That's only needed if you update
all rows on the page, and within a timeframe that does not allow reuse
of other outdated tuples.
 
> > 4. although at first it might seem so I see no advantage for vacuum 
> > with overflow
> 
> The main problem with vacuum now is that it must scan the 
> entire table (and the entire index) even if only a few 
> records are garbage. If we isolate the garbage in a separate 
> area then vacuum doesn't have to scan unrelated tuples.
> 
> I'm not sure this really solves that problem because there 
> are still DELETEs to consider but it does remove one factor 
> that exacerbates it unnecessarily.

Yea, so you still need to vacuum the large table regularly.

> I think the vision is that the overflow table would never be 
> very large because it can be vacuumed very aggressively. It 
> has only tuples that are busy and will need vacuuming as soon 
> as a transaction ends. Unlike the main table which is mostly 
> tuples that don't need vacuuming. 

Ok, but you have to provide an extra vacuum that does only that then
(and it randomly touches heap pages, and only does partial work there).

> > 5. the size reduction of heap is imho moot because you trade it for
> > a growing overflow
> >    (size reduction only comes from reusing dead tuples and not
> >    adding index tuples --> SITC)
> 
> I think you're comparing the wrong thing.

I mean: unless you do individually vacuum the overflow more frequently.

> Size isn't a problem in itself, size is a problem because it causes
> extra i/o.

Yes, and I state that at all possible occasions :-) On-disk size is a
problem, really.

> So a heap that's double the necessary size takes twice as
> long as necessary to scan. The fact that the overflow tables
> are taking up space isn't interesting if they don't have to
> be scanned.

The overflow does have to be read for each seq scan, and it was stated
that it would be accessed with random access (following the tuple
chain). But maybe we can read the overflow the same as if it were an
additional segment file?

> Hitting the overflow tables should be quite rare, it only 
> comes into play when looking at concurrently updated tuples. 
> It certainly happens but most tuples in the table will be 
> committed and not being concurrently updated by anyone else.

The first update moves the row to overflow; only the second one might be
able to pull it back. So on average you would have at least 66% of all
rows updated since the last vacuum in the overflow.

The problem with needing very frequent vacuums is that you might not be
able to do any work because of long transactions.

Andreas

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Pavan Deolasee
On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:> On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:> (2) Isn't this full of race conditions?
> I agree, there  could be race  conditions. But IMO we can handle those.Doubtless you can prevent races by introducing a bunch of additionallocking.  The question was really directed to how much concurrent
performance is left, once you get done locking it down.I understand your point and I can clearly see a chance to improve upon the currentlocking implementation in the prototype even though we are seeing a good performance
boost for 50 clients and 50 scaling factor with pgbench runs as mentioned by Nikhil.Regards,Pavan


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Zeugswetter Andreas ADI SD

> > I think the vision is that the overflow table would never be very
> > large because it can be vacuumed very aggressively. It has only
> > tuples that are busy and will need vacuuming as soon as a
> > transaction ends. Unlike the main table which is mostly tuples
> > that don't need vacuuming.

Except when deleted :-)

> That's right. Vacuum, if it gets a chance to work on the
> overflow relation, seems to be doing a decent job in our runs.
> If autovacuum/vacuum gets to run optimally, the FSM
> information generated for the overflow relations will be able
> to serve a lot of new tuple requests, avoiding undue/large
> bloat in the overflow relations.

It seems like we would want to create a chain into overflow for deleted
rows also (header + all cols null), so we can vacuum those too by
looking only at the overflow, at least optionally.

I think the overflow would really need to handle deletes too, or else
the bitmap idea is more generally useful for vacuum.

Generally, for a clear distinction, I think we are talking about two
things here:
1. reduce index bloat and maintenance work
2. allow vacuum a cheaper focus on what needs to be done

Andreas



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread NikhilS
Hi,

> I think the vision is that the overflow table would never be very large
> because it can be vacuumed very aggressively. It has only tuples that
> are busy and will need vacuuming as soon as a transaction ends. Unlike
> the main table which is mostly tuples that don't need vacuuming.

That's right. Vacuum, if it gets a chance to work on the overflow
relation, seems to be doing a decent job in our runs. If
autovacuum/vacuum gets to run optimally, the FSM information generated
for the overflow relations will be able to serve a lot of new tuple
requests, avoiding undue/large bloat in the overflow relations.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Gregory Stark
"Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]> writes:

> 1. It doubles the IO (original page + hot page), if the new row would 
>   have fit into the original page.

That's an awfully big IF there. Even if you use a fillfactor of 50% (in
which case you're paying a 100% performance penalty *all* the time, not
just when dealing with a table that's been bloated by multiple
versions), you still have no guarantee the extra versions will fit on
the same page.

> 4. although at first it might seem so I see no advantage for vacuum with
> overflow 

The main problem with vacuum now is that it must scan the entire table (and
the entire index) even if only a few records are garbage. If we isolate the
garbage in a separate area then vacuum doesn't have to scan unrelated tuples.

I'm not sure this really solves that problem because there are still DELETEs
to consider but it does remove one factor that exacerbates it unnecessarily.

I think the vision is that the overflow table would never be very large
because it can be vacuumed very aggressively. It has only tuples that are busy
and will need vacuuming as soon as a transaction ends. Unlike the main table
which is mostly tuples that don't need vacuuming. 

> 5. the size reduction of heap is imho moot because you trade it for a
> growing overflow
>    (size reduction only comes from reusing dead tuples and not
>    adding index tuples --> SITC)

I think you're comparing the wrong thing. Size isn't a problem in
itself; size is a problem because it causes extra i/o. So a heap that's
double the necessary size takes twice as long as necessary to scan. The
fact that the overflow tables are taking up space isn't interesting if
they don't have to be scanned.
Hitting the overflow tables should be quite rare, it only comes into play when
looking at concurrently updated tuples. It certainly happens but most tuples
in the table will be committed and not being concurrently updated by anyone
else.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Pavan Deolasee
On 11/10/06, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote:
> > (Actually, the assumption that you can throw an additional back-pointer
> > into overflow tuple headers is the worst feature of this proposal in
> > that regard --- it's really not that easy to support multiple header
> > formats.)
>
> Well, we already have a variable length null bitmap in the header. It
> seems quite straightforward to me to add the new field before the null
> bitmap. It certainly requires some changes, in particular to places that
> access the null bitmap, but it's not an insurmountable effort. Or am I
> missing some less obvious consequences?

We have added the overflow header (which right now contains a single
entry, i.e. the back pointer) on very similar lines to the optional Oid
field in the tuple header. A flag (the last free bit in the t_infomask)
is used to check if there is an additional overflow header, and if so,
t_hoff is adjusted appropriately.

So in the current prototype, the overflow header is after the null
bitmap and before the Oid, if it exists.

Regards,
Pavan
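To make that layout concrete, here is a toy C model of the t_hoff computation (the 23-byte fixed header portion and 8-byte MAXALIGN match PostgreSQL, but the flag values, the 6-byte back pointer, and the function itself are illustrative assumptions, not the prototype's actual code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real t_infomask values differ. */
#define HEAP_HASOVERFLOW 0x8000   /* "last free bit" used by the prototype */
#define HEAP_HASOID      0x0010   /* presence of the optional Oid */

/*
 * Toy model: t_hoff is the offset from the start of the tuple header to
 * the user data.  Optional pieces follow the null bitmap in the order
 * described above: overflow back pointer, then Oid.
 */
static size_t compute_t_hoff(uint16_t infomask, int natts)
{
    size_t hoff = 23;                 /* fixed-size header portion */
    hoff += (natts + 7) / 8;          /* variable-length null bitmap */
    if (infomask & HEAP_HASOVERFLOW)
        hoff += 6;                    /* back pointer to the root tuple */
    if (infomask & HEAP_HASOID)
        hoff += 4;                    /* optional Oid */
    return (hoff + 7) & ~(size_t)7;   /* MAXALIGN to an 8-byte boundary */
}
```

In this model an 8-attribute overflow tuple's aligned header grows from 24 to 32 bytes: the 6-byte back pointer plus alignment padding, and only overflow tuples pay it.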


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Zeugswetter Andreas ADI SD

> > As more UPDATEs take place these tuple chains would grow, making 
> > locating the latest tuple take progressively longer.
 
> More generally, do we need an overflow table at all, rather 
> than having these overflow tuples living in the same file as 
> the root tuples?  As long as there's a bit available to mark 
> a tuple as being this special not-separately-indexed type, 
> you don't need a special location to know what it is.

Yes, I have that same impression.

1. It doubles the IO (original page + hot page), if the new row would 
have fit into the original page.

2. locking should be easier if only the original heap page is involved.

3. It makes the overflow pages really hot because all concurrent updates
compete
for the few overflow pages.

4. although at first it might seem so I see no advantage for vacuum with
overflow 

5. the size reduction of heap is imho moot because you trade it for a
growing overflow
(size reduction only comes from reusing dead tuples and not
adding index tuples --> SITC)

Could you try to explain the reasoning behind separate overflow storage
?
What has been stated so far was not really conclusive to me in this
regard.
e.g. a different header seems no easier in overflow than in heap 

Andreas



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Heikki Linnakangas

Tom Lane wrote:

"Pavan Deolasee" <[EMAIL PROTECTED]> writes:

Yes. The last bit in the t_infomask is used up to mark presence of overflow
tuple header. But I believe there are few more bits that can be reused.
There are three bits available in the t_ctid field as well (since ip_posid
needs maximum 13 bits).


> No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
> if it were, we will not hold still for sticking flag bits into an
> unrelated datatype.
>
> You can probably fix this by inventing multiple context-dependent
> interpretations of t_infomask bits, but that strikes me as a mighty
> ugly and fragile way to proceed.


Yeah, that seems ugly.

I think the best place to grab more bits is the t_natts field. The max 
value for that is MaxTupleAttributeNumber == 1664, which fits in 11 
bits. That leaves 5 bits for other use. We'll have to replace all direct 
access to that field with an accessor macro, but according to grep there 
isn't that many files that need to be changed.
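A sketch of what those accessor macros might look like (the macro names and flag layout are hypothetical; MaxTupleAttributeNumber == 1664 is as quoted):

```c
#include <assert.h>
#include <stdint.h>

/*
 * 1664 fits in 11 bits, so the top 5 bits of a 16-bit t_natts field are
 * available for flags, as long as every access goes through macros.
 */
#define NATTS_MASK        0x07FF     /* low 11 bits: attribute count */
#define NATTS_FLAGS_SHIFT 11         /* flags live in the top 5 bits */

#define HeapTupleHeaderGetNatts(raw)    ((uint16_t)((raw) & NATTS_MASK))
#define HeapTupleHeaderGetFlags(raw)    ((uint16_t)((raw) >> NATTS_FLAGS_SHIFT))
#define HeapTupleHeaderSetNatts(raw, n) \
    ((uint16_t)(((raw) & ~NATTS_MASK) | ((n) & NATTS_MASK)))
```

The point of funneling all reads through the Get macro is that existing code never sees the flag bits, so the on-disk field can be reinterpreted without a wider header change.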



> (Actually, the assumption that you can throw an additional back-pointer
> into overflow tuple headers is the worst feature of this proposal in
> that regard --- it's really not that easy to support multiple header
> formats.)


Well, we already have a variable length null bitmap in the header. It 
seems quite straightforward to me to add the new field before the null 
bitmap. It certainly requires some changes, in particular to places that 
access the null bitmap, but it's not an insurmountable effort. Or am I 
missing some less obvious consequences?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-10 Thread Pavan Deolasee
On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > Yes. The last bit in the t_infomask is used up to mark presence of overflow
> > tuple header. But I believe there are few more bits that can be reused.
> > There are three bits available in the t_ctid field as well (since ip_posid
> > needs maximum 13 bits).
>
> No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
> if it were, we will not hold still for sticking flag bits into an
> unrelated datatype.

BLCKSZ is not limited to 8K, but it is limited to 32K because of lp_off
and lp_len, which are 15 bits in size. OffsetNumber is limited to
(BLCKSZ / sizeof(ItemIdData)). Since sizeof(ItemIdData) is 4 bytes,
MaxOffsetNumber is 8192. So we need only 13 bits to represent
MaxOffsetNumber.

Regards,
Pavan
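The arithmetic above can be checked with a small standalone calculation (constants mirror the values quoted in the message; this is an illustrative model, not the actual page-layout code):

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLCKSZ 32768          /* lp_off/lp_len are 15 bits: 2^15 bytes */
#define SIZEOF_ITEMIDDATA 4       /* one line pointer is 4 bytes */

/* The largest number of line pointers a page can hold. */
enum { MAX_OFFSET_NUMBER = MAX_BLCKSZ / SIZEOF_ITEMIDDATA };  /* 8192 */

/* Number of bits needed to represent values 0..n. */
static int bits_needed(uint32_t n)
{
    int bits = 0;
    while (n > 0) { bits++; n >>= 1; }
    return bits;
}
```

8192 distinct offsets fit in 13 bits when stored zero-based, which is where the three spare bits of the 16-bit ip_posid field come from.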




Re: [HACKERS] Frequent Update Project: Design Overview of HOT

2006-11-10 Thread Simon Riggs
On Fri, 2006-11-10 at 09:51 +0200, Hannu Krosing wrote:

> > What are the advantages of HOT over SITC (other than cool name) ?
> 
> still wondering this, is it just the ability to span multiple pages?

Multiple page spanning, copy back/VACUUM support, separate overflow
relation to prevent heap growth were the main ones. 

There are ideas in HOT from lots of people and places; SITC was just one
of those. There wasn't a dust-off between different designs, just an
attempt to take the best designs from wherever/whoever they came from,
so HOT isn't one person's design.

If anybody wants to change the name, we can, to whatever you like.

> > Maybe just make HOT an extended SITC which can span pages.

There is a strong willingness to get the design optimal, which will
require change. 
 
> > > http://www.postgresql.org/about/donate

That's exactly what HOT is all about

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Frequent Update Project: Design Overview of HOT

2006-11-09 Thread Hannu Krosing
On Fri, 2006-11-10 at 09:06, Hannu Krosing wrote:
> On Thu, 2006-11-09 at 18:28, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > As more UPDATEs take place these tuple chains would grow, making
> > > locating the latest tuple take progressively longer.
> > 
> > This is the part that bothers me --- particularly the random-access
> > nature of the search.  I wonder whether you couldn't do something
> > involving an initial table fill-factor of less than 50%, and having
> > the first updated version living on the same heap page as its parent.
> > Only when the active chain length was more than one (which you
> > hypothesize is rare) would it actually be necessary to do a random
> > access into the overflow table.
> > 
> > More generally, do we need an overflow table at all, rather than having
> > these overflow tuples living in the same file as the root tuples?  As
> > long as there's a bit available to mark a tuple as being this special
> > not-separately-indexed type, you don't need a special location to know
> > what it is.  This might break down in the presence of seqscans though.
> 
> And why do you need to mark it as not-separately-indexed at all ?
> 
> We already cope with missing index pointers in VACUUM and I can't see
> any other reason to have it.

Ok, now I see it - we can't VACUUM a tuple if the next versions of it
are accessible by the t_ctid chain only. That is, vacuum must not free
tuples which have t_ctid pointing to a tuple that has the
not-separately-indexed bit set. This seems to make vacuum quite
complicated, as it has to examine t_ctid chains to detect whether it can
free a tuple, and what's even worse, it has to examine these chains
backwards.

> What are the advantages of HOT over SITC (other than cool name) ?

still wondering this, is it just the ability to span multiple pages?

> Maybe just make HOT an extended SITC which can span pages.
> 
> In case of HOT together with reusing index tuples with DELETED bit set
> we don't actually need copyback, but the same index pointer will follow
> the head of live data automatically, maybe lagging only a small number
> of versions.

> > Actually, you omitted to mention the locking aspects of moving tuples
> > around --- exactly how are you going to make that work without breaking
> > concurrent scans?
> > 
> > > This allows the length of a typical tuple chain to be extremely short in
> > > practice. For a single connection issuing a stream of UPDATEs the chain
> > > length will no more than 1 at any time.
> > 
> > Only if there are no other transactions being held open, which makes
> > this claim a lot weaker.
> > 
> > > HOT can only work in cases where a tuple does not modify one of the
> > > columns defined in an index on the table, and when we do not alter the
> > > row length of the tuple.
> > 
> > Seems like "altering the row length" isn't the issue, it's just "is
> > there room on the page for the new version".  Again, a generous
> > fillfactor would give you more flexibility.
> 
> Maybe they hoped to take very light locks when the new chain head is
> copied over the old one in the same-length case.
> 
> > http://www.postgresql.org/about/donate
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

NOTICE: This communication contains privileged or other confidential
information. If you have received it in error, please advise the sender
by reply email and immediately delete the message and any attachments
without copying or disclosing the contents.




Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> (2) Isn't this full of race conditions?

> I agree, there  could be race  conditions. But IMO we can handle those.

Doubtless you can prevent races by introducing a bunch of additional
locking.  The question was really directed to how much concurrent
performance is left, once you get done locking it down.

> (3) I thought you already used up the one remaining t_infomask bit.

> Yes. The last bit in the t_infomask is used up to mark presence of overflow
> tuple header. But I believe there are few more bits that can be reused.
> There are three bits available in the t_ctid field as well (since ip_posid
> needs maximum 13 bits).

No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
if it were, we will not hold still for sticking flag bits into an
unrelated datatype.

You can probably fix this by inventing multiple context-dependent
interpretations of t_infomask bits, but that strikes me as a mighty
ugly and fragile way to proceed.

(Actually, the assumption that you can throw an additional back-pointer
into overflow tuple headers is the worst feature of this proposal in
that regard --- it's really not that easy to support multiple header
formats.)

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT

2006-11-09 Thread Hannu Krosing
On Thu, 2006-11-09 at 18:28, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > As more UPDATEs take place these tuple chains would grow, making
> > locating the latest tuple take progressively longer.
> 
> This is the part that bothers me --- particularly the random-access
> nature of the search.  I wonder whether you couldn't do something
> involving an initial table fill-factor of less than 50%, and having
> the first updated version living on the same heap page as its parent.
> Only when the active chain length was more than one (which you
> hypothesize is rare) would it actually be necessary to do a random
> access into the overflow table.
> 
> More generally, do we need an overflow table at all, rather than having
> these overflow tuples living in the same file as the root tuples?  As
> long as there's a bit available to mark a tuple as being this special
> not-separately-indexed type, you don't need a special location to know
> what it is.  This might break down in the presence of seqscans though.

And why do you need to mark it as not-separately-indexed at all ?

We already cope with missing index pointers in VACUUM and I can't see
any other reason to have it.

What are the advantages of HOT over SITC (other than cool name) ?

Maybe just make HOT an extended SITC which can span pages.

In case of HOT together with reusing index tuples with DELETED bit set
we don't actually need copyback, but the same index pointer will follow
the head of live data automatically, maybe lagging only a small number
of versions.

> Actually, you omitted to mention the locking aspects of moving tuples
> around --- exactly how are you going to make that work without breaking
> concurrent scans?
> 
> > This allows the length of a typical tuple chain to be extremely short in
> > practice. For a single connection issuing a stream of UPDATEs the chain
> > length will no more than 1 at any time.
> 
> Only if there are no other transactions being held open, which makes
> this claim a lot weaker.
> 
> > HOT can only work in cases where a tuple does not modify one of the
> > columns defined in an index on the table, and when we do not alter the
> > row length of the tuple.
> 
> Seems like "altering the row length" isn't the issue, it's just "is
> there room on the page for the new version".  Again, a generous
> fillfactor would give you more flexibility.

Maybe they hoped to take very light locks when the new chain head is
copied over the old one in the same-length case.

> http://www.postgresql.org/about/donate
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Pavan Deolasee
On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > On 11/10/06, Josh Berkus <josh@agliodbs.com> wrote:
> > > I believe that's the "unsolved technical issue" in the prototype, unless
> > > Pavan has solved it in the last two weeks.   Pavan?
> >
> > When an overflow tuple is copied back to the main heap, the overflow tuple
> > is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK). Subsequently,
> > when a backend tries to lock the overflow version of the tuple, it checks
> > the flag and jumps to the main heap if the flag is set.
>
> (1) How does it "jump to the main heap"?  The links go the other
> direction.

The overflow tuple has a special header to store the back pointer to the
main heap. This increases the tuple header size by 6 bytes, but the
overhead is restricted only to the overflow tuples.

> (2) Isn't this full of race conditions?

I agree, there could be race conditions. But IMO we can handle those.
When we follow the tuple chain, we hold a SHARE lock on the main heap
buffer. Also, when the root tuple is vacuumable and needs to be
overwritten, we acquire and keep an EXCLUSIVE lock on the main heap
buffer.

This reduces the race conditions to a great extent.

> (3) I thought you already used up the one remaining t_infomask bit.

Yes. The last bit in the t_infomask is used up to mark the presence of
the overflow tuple header. But I believe there are a few more bits that
can be reused. There are three bits available in the t_ctid field as
well (since ip_posid needs a maximum of 13 bits). One bit is used to
identify whether a given tid points to the main heap or the overflow
heap. This helps when tids are passed around in the code.

Since the back pointer from the overflow tuple always points to the main
heap, the same bit can be used to mark copied-back tuples (we are doing
it in a slightly different way in the current prototype though).

Regards,
Pavan
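One way to picture that spare-bit scheme for tids (a purely illustrative encoding; the prototype's actual bit assignment may differ):

```c
#include <assert.h>
#include <stdint.h>

/* ip_posid needs at most 13 of its 16 bits, so a spare bit can tag
 * whether a tid points into the main heap or the overflow heap. */
#define TID_OVERFLOW_BIT 0x8000
#define TID_POSID_MASK   0x1FFF

static uint16_t tid_make(uint16_t posid, int in_overflow)
{
    return (uint16_t)((posid & TID_POSID_MASK) |
                      (in_overflow ? TID_OVERFLOW_BIT : 0));
}

/* 1 if the tid points into the overflow heap, 0 for the main heap. */
static int tid_in_overflow(uint16_t raw)
{
    return (raw & TID_OVERFLOW_BIT) != 0;
}

/* Recover the plain line-pointer offset, with the tag bit masked off. */
static uint16_t tid_posid(uint16_t raw)
{
    return (uint16_t)(raw & TID_POSID_MASK);
}
```

Code that passes tids around can then test one bit instead of carrying a separate "which relation" argument, which is the convenience Pavan describes.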




Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
"Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> On 11/10/06, Josh Berkus  wrote:
>> I believe that's the "unsolved technical issue" in the prototype, unless
>> Pavan has solved it in the last two weeks.   Pavan?
>> 
> When an overflow tuple is copied back to the main heap, the overflow tuple
> is
> marked with a special flag (HEAP_OVERFLOW_MOVEDBACK). Subsequently,
> when a backend tries to lock the overflow version of the tuple, it checks
> the flag
> and jumps to the main heap if the flag is set.

(1) How does it "jump to the main heap"?  The links go the other
direction.

(2) Isn't this full of race conditions?

(3) I thought you already used up the one remaining t_infomask bit.

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Pavan Deolasee
On 11/10/06, Josh Berkus  wrote:
> Tom,
>
> > Actually, you omitted to mention the locking aspects of moving tuples
> > around --- exactly how are you going to make that work without breaking
> > concurrent scans?
>
> I believe that's the "unsolved technical issue" in the prototype, unless
> Pavan has solved it in the last two weeks.   Pavan?

When an overflow tuple is copied back to the main heap, the overflow
tuple is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK).
Subsequently, when a backend tries to lock the overflow version of the
tuple, it checks the flag and jumps to the main heap if the flag is set.

We use the same technique to update the correct version of a tuple when
a tuple is moved back to the main heap.

Regards,
Pavan


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Josh Berkus
Tom,

> Actually, you omitted to mention the locking aspects of moving tuples
> around --- exactly how are you going to make that work without breaking
> concurrent scans?

I believe that's the "unsolved technical issue" in the prototype, unless 
Pavan has solved it in the last two weeks.   Pavan?

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> As more UPDATEs take place these tuple chains would grow, making
> locating the latest tuple take progressively longer.

This is the part that bothers me --- particularly the random-access
nature of the search.  I wonder whether you couldn't do something
involving an initial table fill-factor of less than 50%, and having
the first updated version living on the same heap page as its parent.
Only when the active chain length was more than one (which you
hypothesize is rare) would it actually be necessary to do a random
access into the overflow table.

More generally, do we need an overflow table at all, rather than having
these overflow tuples living in the same file as the root tuples?  As
long as there's a bit available to mark a tuple as being this special
not-separately-indexed type, you don't need a special location to know
what it is.  This might break down in the presence of seqscans though.

Actually, you omitted to mention the locking aspects of moving tuples
around --- exactly how are you going to make that work without breaking
concurrent scans?

> This allows the length of a typical tuple chain to be extremely short in
> practice. For a single connection issuing a stream of UPDATEs the chain
> length will no more than 1 at any time.

Only if there are no other transactions being held open, which makes
this claim a lot weaker.

> HOT can only work in cases where a tuple does not modify one of the
> columns defined in an index on the table, and when we do not alter the
> row length of the tuple.

Seems like "altering the row length" isn't the issue, it's just "is
there room on the page for the new version".  Again, a generous
fillfactor would give you more flexibility.
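The quoted HOT criteria reduce to two checks; here is a minimal sketch that models columns as bits in a mask (all names are hypothetical, not the prototype's code). Per Tom's point above, the length test could instead become an "is there room on the page" test:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the HOT-applicability test as stated: the update must not
 * modify any indexed column and must not change the tuple length.
 */
static int update_is_hot(uint64_t modified_cols, uint64_t indexed_cols,
                         size_t old_len, size_t new_len)
{
    if (modified_cols & indexed_cols)
        return 0;      /* an indexed column changed: fall back to non-HOT */
    if (old_len != new_len)
        return 0;      /* row length changed: fall back to non-HOT */
    return 1;
}
```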

> [We'll be able to do that more efficiently when
> we have plan invalidation]

Uh, what's that got to do with it?

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Josh Berkus
Simon,

> If we perform an update that meets the HOT criteria then we put the
> new version into the overflow relation; we describe this as a HOT
> UPDATE. If we perform an update that does not meet the criteria, then we
> carry on with the existing/old MVCC behaviour; we describe this as a
> non-HOT UPDATE.

Making the essential performance analysis question, "Am I HOT or Not?"  

Sorry, but someone had to say it.  ;-)

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Martijn van Oosterhout
Nice idea, just one question:

On Thu, Nov 09, 2006 at 05:13:16PM +, Simon Riggs wrote:
> Behavioural Characteristics
> ---
> 
> With HOT, it is easily possible that the chain of prior versions spans
> many blocks. The chain always starts with the block of the root tuple
> but possibly includes many overflow relation blocks.
> 
> A SeqScan of a HOT table will turn into a large number of
> random accesses to the overflow relation, which could be considerably
> more expensive than sequential I/O. Clearly very large tables would not
> be able to be HOT enabled without additional cost, so we make HOT a
> table-level option: WITH (freqUPDATE = on)   [discuss...]

It seems to me that bitmap index scans will get these same
characteristics also, right? The bitmap scan will have to follow the
chain of any possibly matching tuple in any of the blocks that are in
the bitmap, right?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.

