Re: [HACKERS] HOT WIP Patch - version 1

2007-02-19 Thread Simon Riggs
On Fri, 2007-02-16 at 09:36 +0200, Hannu Krosing wrote:
 Ühel kenal päeval, N, 2007-02-15 kell 10:49, kirjutas Heikki
 Linnakangas:
 
  We already log tuple removals by normal vacuums. We can't use that wal 
  entry as it is: if a dead tuple is in the middle of an update chain, it 
  needs to be unlinked from the chain. But I don't see any particular 
  problem with that, it just needs to be wal logged like every other data 
  changing operation.
  
  Do we actually ever want to remove dead tuples from the middle of the 
  chain? If a tuple in the middle of the chain is dead, surely every tuple 
  before it in the chain is dead as well, and we want to remove them as 
  well. I'm thinking, removing tuples from the middle of the chain can be 
  problematic, because we'd need to fiddle with the xmin/xmax of the other 
  tuples to make them match. Or change the tuple-following logic to not do 
  the xmin=xmax check, but it's a nice robustness feature.
 
 What kind of robustness does it provide ? In other words - what failure
 scenario does this guard against ?
 
 I can't see the case where the xmin=xmax check can not succeed, at least
 not for same page tuples.

The xmin == xmax case was required to support a complex VACUUM FULL
failure case discovered some time ago. I'll post the details.

With HOT, ISTM that testing this becomes even more important in some
form because of the extra complexity we're putting into that part of the
code. We need some way of detecting a corruption introduced by some
weird use case we haven't thought of.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-19 Thread Simon Riggs
On Thu, 2007-02-15 at 10:49 +, Heikki Linnakangas wrote:

 Do we actually ever want to remove dead tuples from the middle of the 
 chain? If a tuple in the middle of the chain is dead, surely every tuple 
 before it in the chain is dead as well, and we want to remove them as 
 well. I'm thinking, removing tuples from the middle of the chain can be 
 problematic, because we'd need to fiddle with the xmin/xmax of the other 
 tuples to make them match. Or change the tuple-following logic to not do 
 the xmin=xmax check, but it's a nice robustness feature.

I think the phrase middle of the chain is being used in two ways here.

If a tuple in the chain is dead, then all prior tuples are dead also.
That gives us a live portion of the chain and a dead portion of the
chain.

We will only ever need to follow the chain along the live portion, so
breaking the live portion of the chain is not allowable. Breaking the
chain in the dead portion *is* allowable and that is what is being
proposed here. We break the chain rather than simply remove all of it
because there may still be index pointers to some of the dead rows. 

Pavan's code does this now.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-16 Thread Hannu Krosing
Ühel kenal päeval, N, 2007-02-15 kell 10:49, kirjutas Heikki
Linnakangas:

 We already log tuple removals by normal vacuums. We can't use that wal 
 entry as it is: if a dead tuple is in the middle of an update chain, it 
 needs to be unlinked from the chain. But I don't see any particular 
 problem with that, it just needs to be wal logged like every other data 
 changing operation.
 
 Do we actually ever want to remove dead tuples from the middle of the 
 chain? If a tuple in the middle of the chain is dead, surely every tuple 
 before it in the chain is dead as well, and we want to remove them as 
 well. I'm thinking, removing tuples from the middle of the chain can be 
 problematic, because we'd need to fiddle with the xmin/xmax of the other 
 tuples to make them match. Or change the tuple-following logic to not do 
 the xmin=xmax check, but it's a nice robustness feature.

What kind of robustness does it provide ? In other words - what failure
scenario does this guard against ?

I can't see the case where the xmin=xmax check can not succeed, at least
not for same page tuples.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-15 Thread Zeugswetter Andreas ADI SD
 
Bruce Momjian wrote:
 Just to summarize:
 
   o  Every tuple gets a heap ctid
   o  Only the root tuple gets an index entry

   o  We can easily remove dead tuples that aren't the root because
  by definition, nothing points to them, including backends and
  indexes

I am still wondering about the easily here. Basically this
needs some kind of wal entry to be crash safe. 

Else some later tx might reuse the slot:
- some update on page produces page image in wal
- slot removed
- slot reused and comitted
- page not written
- crash
- wal fullpage restores the page to the version before slot
removed
(- would need a wal replay for slot removed from hot chain here)
- wal restores slot reuse, but the slot is now part of a wrong
hot chain
  and the chain is broken (unless we have the above step)

Do we have this wal entry ?

 The problem is that a dead root tuple has to stay around 
 because while no backends can see it, the index does.  We 
 could move a live tuple into root ctid slot, but if we do 
 that, the live tuple changes its ctid while it is visible.
 
 Could we insert index tuples for the live tuple and then 
 remove the root tuple, perhaps later?  So basically we break 
 the chain at that time. 
 The problem there is that we basically have nothing better 
 than what we have now --- we are just delaying the index 
 insert, and I don't see what that buys us.

yes, not really promising.

 Could a _new_ tuple take over the root tuple slot?  It is 
 new, so it doesn't have a ctid yet to change.  I think that 
 means the index walking
 could go forward or backward, but on the same page.   To illustrate,
 with ctid slot numbers:
   
   [1] root INSERT
   [2]
   [3]
   
   [1] root INSERT
   [2] UPDATE
   [3]
   
   [1] root INSERT (dead)
   [2] UPDATE 1
   [3]
   
   [1] root UPDATE 2
   [2] UPDATE 1
   [3]

Imho no, because that would need a back pointer in [1] to [2] so that
readers arriving at [1] (e.g. through index) can see [2] until [1] is
visible.
I think we don't want backpointers. (rather go the tuple swapping route)

Andreas

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-15 Thread Heikki Linnakangas

Zeugswetter Andreas ADI SD wrote:

I am still wondering about the easily here. Basically this
needs some kind of wal entry to be crash safe. 


Else some later tx might reuse the slot:
- some update on page produces page image in wal
- slot removed
- slot reused and comitted
- page not written
- crash
- wal fullpage restores the page to the version before slot
removed
(- would need a wal replay for slot removed from hot chain here)
- wal restores slot reuse, but the slot is now part of a wrong
hot chain
  and the chain is broken (unless we have the above step)

Do we have this wal entry ?


We already log tuple removals by normal vacuums. We can't use that wal 
entry as it is: if a dead tuple is in the middle of an update chain, it 
needs to be unlinked from the chain. But I don't see any particular 
problem with that, it just needs to be wal logged like every other data 
changing operation.


Do we actually ever want to remove dead tuples from the middle of the 
chain? If a tuple in the middle of the chain is dead, surely every tuple 
before it in the chain is dead as well, and we want to remove them as 
well. I'm thinking, removing tuples from the middle of the chain can be 
problematic, because we'd need to fiddle with the xmin/xmax of the other 
tuples to make them match. Or change the tuple-following logic to not do 
the xmin=xmax check, but it's a nice robustness feature.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-15 Thread Pavan Deolasee

On 2/15/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:



Do we actually ever want to remove dead tuples from the middle of the
chain? If a tuple in the middle of the chain is dead, surely every tuple
before it in the chain is dead as well, and we want to remove them as
well. I'm thinking, removing tuples from the middle of the chain can be
problematic, because we'd need to fiddle with the xmin/xmax of the other
tuples to make them match. Or change the tuple-following logic to not do
the xmin=xmax check, but it's a nice robustness feature.



I am precisely working on this right now. In the next patch version that I
intend to send shortly, I am thinking of removing the dead tuples in the
middle of the chain. We don't have agreement on how to deal with the
root tuple, but we can safely remove the intermediate dead tuples and
vacuum them. Also when all the tuples in the chain are dead because the
last tuple is either deleted or COLD updated, the entire chain along with
the root tuple and the index entry can be vacuumed.

The operation must be WAL logged and you caught the xmin/xmax
problem very rightly. One option is to change the xmax of root tuple
to the xmin of the first live/recently-dead tuple, if we remove a set of
intermediate dead tuples. This xmin of the first live/recently-dead tuple
is also the xmax of the last dead tuple we removed and hence must
be older than the oldtestXmin. So assigning that to the root tuple
should not break any visibility rules for the root tuple (it would still
be dead).

Do we see any problem with this ?

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Heikki Linnakangas

Pavan Deolasee wrote:
- We need to find a way to handle DEAD root tuples, either convert them 
into

stubs or overwrite them with a new version. We can also perform pointer
swinging from the index. Again there are concerns about crash-safety and
concurrent index-scans working properly. We don't have a community
consensus on any of the suggestions in this regard. But hopefully we
would converge on some design soon.


This seems to be the most fundamental problem we have at the moment. If 
we stick to the basic rule we have now that a live tuple's ctid doesn't 
change, the only way to get rid of a dead tuple at the root of the 
update chain is by changing the index pointer. The backward-pointers 
Hannu suggested or the scan the whole page to find the previous tuple 
would allow reuse of those dead tuples for new tuples in the chain, but 
even those methods wouldn't completely eliminate the problem.


We could say that in some scenarios we just leave behind some dead 
tuples/stubs that can never be reclaimed. What do you guys think, if we 
can bring it down to just an extra line pointer, would that be 
acceptable? We could also do that for now and implement the 
pointer-swinging later if it turns out to be a problem in practice.


What's the verdict on relaxing the live tuple's ctid doesn't change 
rule? If we did allow that within a page, what would we need to change? 
  Inside the backend we'd have to make sure that whenever a ctid is 
used, the page is kept pinned. How likely is it that it would brake any 
external projects, and how difficult would it be to detect the broken 
usage pattern? It would mean that the ctid system column could change 
within a transaction, unless we change it so that it returns the ctid of 
the root tuple + xmin + cmin, but it'd be a user-visible change anyhow.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 What's the verdict on relaxing the live tuple's ctid doesn't change 
 rule?

I think that's unacceptable; it is known that that will break the ODBC
and JDBC drivers, as well as any other programs that make use of the
ctid for re-finding a tuple they read earlier in the same transaction.
We have not only never deprecated client-side use of ctid for this, but
actively encouraged it, for instance by going out of our way to support
fast access for queries WHERE ctid = 'constant'.

What's more, your proposal would break plain old UPDATE and DELETE,
as well as SELECT FOR UPDATE, none of which promise to hold a pin
continuously on every page containing a tuple they might decide to
revisit (by ctid) later.  Are you prepared to disallow hash join and
sort/merge join in all such queries?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
What's the verdict on relaxing the live tuple's ctid doesn't change 
rule?


I think that's unacceptable; it is known that that will break the ODBC
and JDBC drivers, as well as any other programs that make use of the
ctid for re-finding a tuple they read earlier in the same transaction.


AFAIK the JDBC driver doesn't use ctid. But ODBC and other programs do.


We have not only never deprecated client-side use of ctid for this, but
actively encouraged it, for instance by going out of our way to support
fast access for queries WHERE ctid = 'constant'.


The idea I had was to change what the ctid system column returns to 
root ctid + xmin + cmin. As long as programs treat the ctid as an opaque 
string, it should work. Tid scan would use that to locate the original 
tuple.



What's more, your proposal would break plain old UPDATE and DELETE,
as well as SELECT FOR UPDATE, none of which promise to hold a pin
continuously on every page containing a tuple they might decide to
revisit (by ctid) later.  Are you prepared to disallow hash join and
sort/merge join in all such queries?


No, of course not. We'd have to do the same thing here; use root tid + 
xmin + cmin instead of just ctid.


But now that I think of it, how do we get the root tid of a tuple? I 
suppose we'd be back to having backpointers or scanning the whole 
page... I guess pointer-swinging it is, then.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Pavan Deolasee

On 2/14/07, Tom Lane [EMAIL PROTECTED] wrote:


Heikki Linnakangas [EMAIL PROTECTED] writes:
 What's the verdict on relaxing the live tuple's ctid doesn't change
 rule?

I think that's unacceptable; it is known that that will break the ODBC
and JDBC drivers, as well as any other programs that make use of the
ctid for re-finding a tuple they read earlier in the same transaction.
We have not only never deprecated client-side use of ctid for this, but
actively encouraged it, for instance by going out of our way to support
fast access for queries WHERE ctid = 'constant'.

What's more, your proposal would break plain old UPDATE and DELETE,
as well as SELECT FOR UPDATE, none of which promise to hold a pin
continuously on every page containing a tuple they might decide to
revisit (by ctid) later.  Are you prepared to disallow hash join and
sort/merge join in all such queries?




Not that I am suggesting we do this, but I believe we had some solution
to this problem in the earlier version of HOT. The live tuple when
copied-back
to the root tuple, the tuple is marked with a HEAP_COPIED_BACK flag.

HeapTupleSatisfiesUpdate() checks for this flag and if set returns a new
return code, HeapTupleCopiedBack. heap_update() returns the same
to ExecUpdate along with the ctid of the root tuple. The UPDATE/DELETE
operation then retried on the root tuple, very similar to read-committed
update/delete. The xmax of the copied-back tuple is set so that its
not vacuumed away until all the current transactions are completed.

Though I have tested this patch several times and it seems to work fine,
I probably don''t have insight into the code as much others on this list
has. So if someone wants to take a look and see if it would work fine,
I would be more than happy to post the latest HOT patch (older design).

Thanks,
Pavan



--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Zeugswetter Andreas ADI SD

 What's the verdict on relaxing the live tuple's ctid doesn't 
 change rule? If we did allow that within a page, what would 
 we need to change?

I already said this, but why would this need to be visible from the
outside ?

A few assumptions:
no back pointers
indexes only point at slots marked as roots (and non hot tuples)

During vacuum, you swap the tuples and keep a stub at the slot that the
user's ctid might be pointing at. You mark the stub to detect this
situation.
When a select/update by ctid comes along it needs to do one step to the
root
and use that tuple instead.

It needs a second vacuum (or a per page vacuum during update) to remove
the 
extra stub when it is dead and not recently dead. 

I fail to see the hole.

Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Zeugswetter Andreas ADI SD

 But now that I think of it, how do we get the root tid of a 
 tuple? I suppose we'd be back to having backpointers or 
 scanning the whole page... I guess pointer-swinging it is, then.

During vacuum you see a root [stub] not recently dead. You follow 
the chain to detect if you find a live tuple that can replace
the root. You replace the root. You replace the original with a stub
that points at the root and mark it recently dead (and HEAP_COPIED_BACK
aka Pavan). ... (see prev post)
No need for anyone but vacuum to find a root.

Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Bruce Momjian

Just to summarize:

o  Every tuple gets a heap ctid
o  Only the root tuple gets an index entry
o  We can easily remove dead tuples that aren't the root because
   by definition, nothing points to them, including backends and
   indexes

The problem is that a dead root tuple has to stay around because while
no backends can see it, the index does.  We could move a live tuple into
root ctid slot, but if we do that, the live tuple changes its ctid while
it is visible.

Could we insert index tuples for the live tuple and then remove the root
tuple, perhaps later?  So basically we break the chain at that time. 
The problem there is that we basically have nothing better than what we
have now --- we are just delaying the index insert, and I don't see what
that buys us.

Could a _new_ tuple take over the root tuple slot?  It is new, so it
doesn't have a ctid yet to change.  I think that means the index walking
could go forward or backward, but on the same page.   To illustrate,
with ctid slot numbers:

[1] root INSERT
[2]
[3]

[1] root INSERT
[2] UPDATE
[3]

[1] root INSERT (dead)
[2] UPDATE 1
[3]

[1] root UPDATE 2
[2] UPDATE 1
[3]

---

Heikki Linnakangas wrote:
 Pavan Deolasee wrote:
  - We need to find a way to handle DEAD root tuples, either convert them 
  into
  stubs or overwrite them with a new version. We can also perform pointer
  swinging from the index. Again there are concerns about crash-safety and
  concurrent index-scans working properly. We don't have a community
  consensus on any of the suggestions in this regard. But hopefully we
  would converge on some design soon.
 
 This seems to be the most fundamental problem we have at the moment. If 
 we stick to the basic rule we have now that a live tuple's ctid doesn't 
 change, the only way to get rid of a dead tuple at the root of the 
 update chain is by changing the index pointer. The backward-pointers 
 Hannu suggested or the scan the whole page to find the previous tuple 
 would allow reuse of those dead tuples for new tuples in the chain, but 
 even those methods wouldn't completely eliminate the problem.
 
 We could say that in some scenarios we just leave behind some dead 
 tuples/stubs that can never be reclaimed. What do you guys think, if we 
 can bring it down to just an extra line pointer, would that be 
 acceptable? We could also do that for now and implement the 
 pointer-swinging later if it turns out to be a problem in practice.
 
 What's the verdict on relaxing the live tuple's ctid doesn't change 
 rule? If we did allow that within a page, what would we need to change? 
Inside the backend we'd have to make sure that whenever a ctid is 
 used, the page is kept pinned. How likely is it that it would brake any 
 external projects, and how difficult would it be to detect the broken 
 usage pattern? It would mean that the ctid system column could change 
 within a transaction, unless we change it so that it returns the ctid of 
 the root tuple + xmin + cmin, but it'd be a user-visible change anyhow.
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread mark
On Wed, Feb 14, 2007 at 01:56:03PM -0500, Bruce Momjian wrote:
 Could we insert index tuples for the live tuple and then remove the root
 tuple, perhaps later?  So basically we break the chain at that time. 
 The problem there is that we basically have nothing better than what we
 have now --- we are just delaying the index insert, and I don't see what
 that buys us.

At some point - inserting into the block would not be possible, as
there is no free space. Would that be a good time to do the index
insert? Then, a later vacuum would eventually prune out the whole old
chain. As long as vacuuming the intermediate entries in the chain
keeps the block with free space, there is no need to remove the root
tuple. If space ever runs out (vacuum not running frequently enough -
many updates performed in the same interval) - fall back to the
mechanism that is being used today.

I see it buying increased performance for rows that are frequently
updated.  If it can delay modifying the indices to only once every 10
or more updates, it seems to me that the improvement should be
significant. Perhaps PostgreSQL could be used for page hit counters
again... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] HOT WIP Patch - version 1

2007-02-14 Thread Heikki Linnakangas

Zeugswetter Andreas ADI SD wrote:

A few assumptions:
no back pointers
indexes only point at slots marked as roots (and non hot tuples)

During vacuum, you swap the tuples and keep a stub at the slot that the
user's ctid might be pointing at. You mark the stub to detect this
situation.
When a select/update by ctid comes along it needs to do one step to the
root
and use that tuple instead.


As Pavan pointed out, that's more or less what he ended up doing 
originally. You need to mark the stub with the current most recent xid, 
and wait until that's no longer running. Only after that you can remove 
the stub.



It needs a second vacuum (or a per page vacuum during update) to remove
the 
extra stub when it is dead and not recently dead. 


Requiring two vacuums to remove the tuple sounds bad at first, but it's 
actually not so bad since both steps could by done by retail vacuum, or 
even normal scans while.



I fail to see the hole.


The only potential problem I can see is how to make sure that a heap 
scan or a bitmap heap scan doesn't visit the tuple twice. If we make 
sure that the page is scanned in one go while keeping the buffer pinned, 
we're good. We already do that except for system catalogs, so I believe 
we'd have to forbid hot updates on system tables, like we forbid bitmap 
scans.


To me this sounds like the best idea this far.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match