Re: [HACKERS] Proposal: Snapshot cloning

2007-02-02 Thread Jim Nasby

On Jan 29, 2007, at 11:28 PM, Tom Lane wrote:

Jim Nasby [EMAIL PROTECTED] writes:

On Jan 26, 2007, at 4:48 PM, Tom Lane wrote:
I don't actually see that it buys you a darn thing ... you still won't
be able to delete dead updated tuples because of the possibility of
the LRT deciding to chase ctid chains up from the tuples it can see.



Well, Simon was talking about a serialized LRT, which ISTM shouldn't
be hunting down ctid chains past the point it serialized at.


How you figure that?  If the LRT wants to update a tuple, it's got to
chase the ctid chain to see whether the head update committed or not.
It's not an error for a serializable transaction to update a tuple that
was tentatively updated by a transaction that rolled back.


Nuts. :(


Even if that's not the case, there is also the possibility of a LRT
publishing information about what tables it will hit.


I think we already bought 99% of the possible win there by fixing
vacuum.  Most ordinary transactions aren't going to be able to predict
which other tables the user might try to touch.


Presumably a single-statement transaction could do that in most (if  
not all) cases.


But even if we didn't support automatically detecting what tables a
transaction is hitting, we could allow the user to specify the list and
then bomb out if the transaction tried to hit anything that wasn't in
it. That would allow users who create LRTs to limit their impact on
vacuum. The safe way to perform that check would be to check each
buffer before accessing it, but I'm unsure how large a performance
impact that would entail; I don't know how much code we run through to
pull a tuple out of a page and do something with it, compared to the
cost of checking whether that buffer belongs to a relation/file that's
in the approved list.


Perhaps a better way would be to allow users to mark vacuum-critical
tables for restricted access. To access a restricted table, the user
would need to provide a list of restricted tables that the transaction
is going to hit (or maybe just lump all restricted tables into one
group), and that transaction would log its XID somewhere that vacuum
can look at. If a transaction that hasn't specified it will touch the
restricted tables tries to do so, it errors out. We might want some
way to flag buffers as belonging to a restricted table (or one of its
indexes) so that transactions that aren't hitting restricted tables
wouldn't have to pay a large performance penalty to figure that out.
But you'd only have to mark those buffers when they're read in from
the OS, and presumably a restricted table will be small enough that
its buffers should stay put. Logging the XID could prove to be a
serialization point, but we could possibly avoid that by using
per-relation locks.
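
To make the idea concrete, here is a purely hypothetical sketch of what the
user-visible side might look like; none of this syntax exists and the names
are invented solely to illustrate the proposal:

-- mark a vacuum-critical table as restricted (hypothetical DDL)
alter table web_sessions set restricted;

-- a transaction that wants to touch restricted tables has to declare that
-- up front, which also records its XID somewhere vacuum can look at
begin;
select pg_declare_restricted_tables('web_sessions');
select count(*) from web_sessions;         -- allowed
select count(*) from other_restricted;     -- would error out: not declared
commit;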

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-29 Thread Jim Nasby

On Jan 26, 2007, at 4:48 PM, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:

You got me. My description was too loose, but you also got the rough
picture. We'll save the detail for another day, but we all know it's a
bridge we will have to cross one day, soon. I wasn't meaning to raise
this specific discussion now, just to say that publishing snapshots for
known LRTs is one way by which we can solve the LRT/VACUUMing issue.


I don't actually see that it buys you a darn thing ... you still won't
be able to delete dead updated tuples because of the possibility of the
LRT deciding to chase ctid chains up from the tuples it can see.  You
also seem to be assuming that a transaction can have only one snapshot,
which is not something we can enforce in enough cases to make it a very
useful restriction.


Well, Simon was talking about a serialized LRT, which ISTM shouldn't  
be hunting down ctid chains past the point it serialized at.


Even if that's not the case, there is also the possibility of a LRT
publishing information about what tables it will hit. Any tables not  
being touched by a LRT could be vacuumed past the global minxid. It  
would be up to the user to do that in many cases, but that's likely  
to be well worth it if you have LRTs that are only hitting a few  
tables yet you have other tables that really, really need to stay  
vacuumed. Believe me, that is a very common use case in the real  
world (think queue table, or web session table).

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-29 Thread Tom Lane
Jim Nasby [EMAIL PROTECTED] writes:
 On Jan 26, 2007, at 4:48 PM, Tom Lane wrote:
 I don't actually see that it buys you a darn thing ... you still won't
 be able to delete dead updated tuples because of the possibility of  
 the LRT deciding to chase ctid chains up from the tuples it can see.

 Well, Simon was talking about a serialized LRT, which ISTM shouldn't  
 be hunting down ctid chains past the point it serialized at.

How you figure that?  If the LRT wants to update a tuple, it's got to
chase the ctid chain to see whether the head update committed or not.
It's not an error for a serializable transaction to update a tuple that
was tentatively updated by a transaction that rolled back.

 Even if that's not the case, there is also the possibility of a LRT
 publishing information about what tables it will hit.

I think we already bought 99% of the possible win there by fixing
vacuum.  Most ordinary transactions aren't going to be able to predict
which other tables the user might try to touch.

regards, tom lane


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Thu, 2007-01-25 at 22:19 -0500, Jan Wieck wrote:

 The idea is to clone an existing serializable transaction's snapshot 
 visibility information from one backend to another. The semantics would 
 be like this:
 
  backend1: start transaction;
  backend1: set transaction isolation level serializable;
  backend1: select pg_backend_pid();
  backend1: select publish_snapshot(); -- will block

Great idea. It can also be used by pg_dump to publish its snapshot so
that we can make VACUUM continue to process effectively while pg_dump
is running.

Two questions:
- why does it have to block? I don't see any reason - the first process
can begin doing useful work. The second process might fail or itself be
blocked by something.

- why just serializable snapshots?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Gregory Stark
Jan Wieck [EMAIL PROTECTED] writes:

 backend1: select publish_snapshot(); -- will block

 backend2: start transaction;
 backend2: set transaction isolation level serializable;
 backend2: select clone_snapshot(pid); -- will unblock backend1

It seems simpler to have a current_snapshot() function that returns a bytea
or a new snapshot data type, which set_current_snapshot(bytea) would take to change
your snapshot. Then you could use tables or out-of-band communication to pass
around your snapshots however you please. 

set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 

That could be handy for debugging purposes too. 
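
For illustration, a session sketch in the style of the other examples in this
thread; both functions are only proposed here, and the textual snapshot
encoding shown (xmin:xmax:in-progress xids) is just an assumption for
readability, since the value could equally be an opaque bytea:

backend1: start transaction;
backend1: set transaction isolation level serializable;
backend1: select current_snapshot();
          -- suppose it returns '1377591608:1377591612:1377591608,1377591610';
          -- the client passes that value around out-of-band however it likes

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select set_current_snapshot('1377591608:1377591612:1377591608,1377591610');
          -- errors out if the snapshot's xmin is already older than GlobalOldestXmin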

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 8:06 AM, Gregory Stark wrote:

Jan Wieck [EMAIL PROTECTED] writes:


backend1: select publish_snapshot(); -- will block

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select clone_snapshot(pid); -- will unblock backend1


It seems simpler to have a current_snapshot() function that returns a bytea
or a new snapshot data type, which set_current_snapshot(bytea) would take to change
your snapshot. Then you could use tables or out-of-band communication to pass
around your snapshots however you please. 


set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 


That would solve the backend to backend IPC problem nicely.


Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Hannu Krosing
On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:
 On Thu, 2007-01-25 at 22:19 -0500, Jan Wieck wrote:
 
  The idea is to clone an existing serializable transaction's snapshot 
  visibility information from one backend to another. The semantics would 
  be like this:
  
   backend1: start transaction;
   backend1: set transaction isolation level serializable;
   backend1: select pg_backend_pid();
   backend1: select publish_snapshot(); -- will block
 
 Great idea. It can also be used by pg_dump to publish its snapshot so
 that we can make VACUUM continue to process effectively while pg_dump
 is running.

Do you mean that vacuum would clean up tuples still visible to
pg_dump?

 Two questions:
 - why does it have to block? I don't see any reason - the first process
 can begin doing useful work. The second process might fail or itself be
 blocked by something.

As I see it, it has to block so that its transaction would not end, so
that the system knows that it can't yet remove tuples in that snapshot.

And it should block until all its consumers have ended their use of the
published snapshot.

 - why just serializable snapshots?

There's probably no point in acquiring it in a read-committed transaction
when the next command will revert to its own snapshot anyway.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 16:09 +0200, Hannu Krosing wrote:
 On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:

  Two questions:
  - why does it have to block? I don't see any reason - the first process
  can begin doing useful work. The second process might fail or itself be
  blocked by something.
 
 As I see it, it has to block so that its transaction would not end, so
 that the system knows that it can't yet remove tuples in that snapshot.
 
 And it should block until all its consumers have ended their use of the
 published snapshot.

Agreed that the Snapshot must be visible to all, but that's no reason why
the original call has to block, just that we must do something to
prevent the Snapshot from disappearing from view.

  - why just serializable snapshots?
 
 There's probably no point in acquiring it in a read-committed transaction
 when the next command will revert to its own snapshot anyway.

But the stated use case was to share snapshots, which seems valid
whatever the type of Snapshot. One of the stated cases was parallel
query...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 16:09 +0200, Hannu Krosing wrote:
 On Fri, 2007-01-26 at 12:25, Simon Riggs wrote:

  Great idea. It can also be used by pg_dump to publish its snapshot so
  that we can make VACUUM continue to process effectively while pg_dump
  is running.
 
 Do you mean that vacuum would clean up tuples still visible to
 pg_dump?

No, that would break MVCC. But we may have done lots of updates/deletes
that are *not* visible to any Snapshot, yet are not yet removable
because they are higher than OldestXmin but we don't know that because
previously the Snapshot details were not available. ISTM that this
proposal is a way of making the Snapshot limits publicly available so
that they can be used by VACUUM. Sure it isn't every backend, but the
details may be useful. So this is an additional benefit to this
proposal. (There's a hole in the above idea, so don't jump on my back to
explain it - I see it and am trying to work out a way around it...)

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Chris Browne
[EMAIL PROTECTED] (Gregory Stark) writes:
 Jan Wieck [EMAIL PROTECTED] writes:

 backend1: select publish_snapshot(); -- will block

 backend2: start transaction;
 backend2: set transaction isolation level serializable;
 backend2: select clone_snapshot(pid); -- will unblock backend1

 It seems simpler to have a current_snapshot() function that returns a bytea
 or a new snapshot data type, which set_current_snapshot(bytea) would take to change
 your snapshot. Then you could use tables or out-of-band communication to pass
 around your snapshots however you please. 

 set_current_snapshot() would have to sanity check that the xmin of the new
 snapshot isn't older than the current globaloldestxmin. 

 That could be handy for debugging purposes too. 

Here's a wild thought...  

Would there be any sense in setting up the ability to declare
expressly a transaction's visibility parameters?

Consider that the Slony-I sl_event table records:
  ev_minxid, ev_maxxid, ev_xip

Grabbing a sample from an instance...
 [ ev_minxid | ev_maxxid | ev_xip ] = [ 1377591608 | 1377591612 | '1377591608','1377591610' ]

Would it be plausible to, in effect, assert these things?

To say:
start transaction;
set transaction isolation level serializable;
select set_transaction_visibility(1377591608, 1377591612, [1377591608, 
1377591610]);

And thus assert the visibility that was recorded at that point in
time?

I may very well have the parameters characterized in a wrong way;
please assume an appropriate way instead as needed :-).

This would permit, if I am seeing this right, a way that you could, in
effect, get a form of time travel via this where you'd be able to
arbitrarily point at different forms of data visibility.  The wild
part being that you could assert data visibility declarations that a
normal connection couldn't naturally obtain...
-- 
let name=cbbrowne and tld=linuxdatabases.info in name ^ @ ^ tld;;
http://linuxfinances.info/info/multiplexor.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 No, that would break MVCC. But we may have done lots of updates/deletes
 that are *not* visible to any Snapshot, yet are not yet removable
 because they are higher than OldestXmin but we don't know that because
 previously the Snapshot details were not available. ISTM that this
 proposal is a way of making the Snapshot limits publicly available so
 that they can be used by VACUUM.

Certainly not, unless you intend that *every* snapshot *must* be
published, which is an overhead up with which we will not put.

One pretty serious problem with the proposal as written is the part
about the sender blocking until the receiver takes the snap; that means
it's not really a publish in the sense that you can make it available
without worrying about exactly how many readers there might or might not
be.  That alone is sufficient to kill any thought of VACUUM making use
of the info.  I'd feel happier with an implementation more like prepared
transactions: you stuff the information into shared memory and it sits
there, readable by anyone, until such time as you take it down again.
Like prepared xacts, GlobalXmin calculations would need to include these
snapshots (and hence they'd limit vacuums).

A shared-memory area would have to be fixed size, but perhaps backing
files, like those used by prepared xacts, could handle the overflow for
very large xip lists.  Presumably crash safety is not an issue so this
wouldn't require any complicated mechanism.
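
As a sketch of how that might look at the SQL level, in the style of the
examples elsewhere in this thread (the function names below are invented
purely for illustration):

backend1: start transaction;
backend1: set transaction isolation level serializable;
backend1: select pg_publish_snapshot('nightly_dump');
          -- snapshot copied into shared memory, where it just sits, readable by anyone

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select pg_adopt_snapshot('nightly_dump');
          -- any number of readers, no blocking handshake with the publisher

backend1: select pg_remove_snapshot('nightly_dump');
          -- explicit take-down, like finishing a prepared transaction; until then
          -- GlobalXmin calculations include this snapshot's xmin, limiting vacuum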

regards, tom lane


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 On Thu, 2007-01-25 at 22:19, Jan Wieck wrote:
 The cloning process needs to make sure that the clone_snapshot() call is 
 made by the same DB user in the same database as the corresponding 
 publish_snapshot() call.

 Why? A snapshot is universal and the same for the whole db instance, so why
 limit it to the same user/database?

Yeah.  Use-case: pg_dumpall could guarantee that it produces consistent
snapshots across multiple databases.  (Not sure I actually want that,
but it's at least arguably useful to someone.)

I think you would want to mark a snapshot with an owner, but that would
be for the purpose of restricting who could take it down, not who could
copy it.

regards, tom lane


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 11:36 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  No, that would break MVCC. But we may have done lots of updates/deletes
  that are *not* visible to any Snapshot, yet are not yet removable
  because they are higher than OldestXmin but we don't know that because
  previously the Snapshot details were not available. ISTM that this
  proposal is a way of making the Snapshot limits publicly available so
  that they can be used by VACUUM.
 
 Certainly not, unless you intend that *every* snapshot *must* be
 published, which is an overhead up with which we will not put.

Agreed, but that's the general case problem.

What I was hoping was that this would provide a mechanism for long
running transactions (LRTs) to publish their min/max Xids. Then if all
backends publish the minimum Xid of any Snapshot they have generated in
the proc array, we'd be able to decide if there are any large holes in
the global set of Snapshots. As a general case that's hard to evaluate,
but in the common case of a lone LRT with all the rest being short-duration
transactions you can end up with a gap of 250,000+ transactions opening
up between the two. It would be fairly easy to have VACUUM check for
large visibility gaps between groups of transactions and then use that
to improve its effectiveness in the presence of LRTs.

Theoretically we have to keep the chain of intermediate updates around
so it can be traversed by the old transaction, but in practical terms
traversing a long chain of updates isn't sensible. Serializable LRTs
will never traverse the chain anyway (that's a serializability error),
but there are some special cases to consider, hence my mentioning an
unresolved problem previously.

We'd need to be much more careful about the way Snapshots are managed,
so we can be certain that we take them all into account.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 set_current_snapshot() would have to sanity check that the xmin of the new
 snapshot isn't older than the current globaloldestxmin. 

 That would solve the backend to backend IPC problem nicely.

 But it fails on the count of making sure that globaloldestxmin doesn't
 advance past the snap you want to use.  And exactly how will you pass
 a snap through a table?  It won't become visible until you commit ...
 whereupon your own xmin isn't blocking the advance of globaloldestxmin.

Hm, good point. You could always do it in a separate connection, but that
starts to get annoying. I was more envisioning passing it around out-of-band
though, something like:


$db->execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
$snap = $db->execute("select current_snapshot()");

  for each db {
      if (fork()) {
          $slave[i] = $db->connect();
          $slave[i]->execute("select set_snapshot($snap)");
          $slave[i]->execute("copy table[i] to file[i]");
      }
  }


I'm also wondering about something like:

  $db->execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");
  $snap = $db->execute("select current_snapshot()");

  if (fork()) {
      $slave = $db->connect();
      $slave->execute("select set_snapshot($snap)");
      $slave->execute("copy tab from hugefile");
      signal parent
  } else {
      while (no signal yet) {
          $rows_loaded_so_far = $db->execute("select count(*) from tab");
          display_progress($rows_loaded_so_far);
          sleep(60);
      }
  }


Sorry for the vaguely perlish pseudocode but it's the clearest way I can think
to write it. I don't think it would make much sense to try to do anything like
this in plpgsql; I think you really do want to be doing it in a language
outside the database where it's easier to open multiple connections and handle
IPC.

I realize the second idea might take more hackery than just setting the
snapshot... In particular as written above it wouldn't work because the slave
would be writing with a new xid that isn't actually in the snapshot.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 12:22 PM, Simon Riggs wrote:

On Fri, 2007-01-26 at 11:36 -0500, Tom Lane wrote:

Simon Riggs [EMAIL PROTECTED] writes:
 No, that would break MVCC. But we may have done lots of updates/deletes
 that are *not* visible to any Snapshot, yet are not yet removable
 because they are higher than OldestXmin but we don't know that because
 previously the Snapshot details were not available. ISTM that this
 proposal is a way of making the Snapshot limits publicly available so
 that they can be used by VACUUM.

Certainly not, unless you intend that *every* snapshot *must* be
published, which is an overhead up with which we will not put.


Agreed, but that's the general case problem.

What I was hoping was that this would provide a mechanism for long
running transactions (LRTs) to publish their min/max Xids. Then if all
backends publish the minimum Xid of any Snapshot they have generated in
the proc array, we'd be able to decide if there are any large holes in
the global set of Snapshots. As a general case that's hard to evaluate,
but in the common case of a lone LRT with all the rest being short-duration
transactions you can end up with a gap of 250,000+ transactions opening
up between the two. It would be fairly easy to have VACUUM check for
large visibility gaps between groups of transactions and then use that
to improve its effectiveness in the presence of LRTs.


There is a flaw in that theory. If you have a single LRT, then each 
subsequent transaction's xmin will be exactly that one, no?



Jan



Theoretically we have to keep the chain of intermediate updates around
so it can be traversed by the old transaction, but in practical terms
traversing a long chain of updates isn't sensible. Serializable LRTs
will never traverse the chain anyway (that's a serializability error),
but there are some special cases to consider, hence my mentioning an
unresolved problem previously.

We'd need to be much more careful about the way Snapshots are managed,
so we can be certain that we take them all into account.




--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Jan Wieck

On 1/26/2007 11:58 AM, Tom Lane wrote:

Jan Wieck [EMAIL PROTECTED] writes:

On 1/26/2007 8:06 AM, Gregory Stark wrote:

It seems simpler to have a current_snapshot() function that returns a bytea
or a new snapshot data type, which set_current_snapshot(bytea) would take to change
your snapshot. Then you could use tables or out-of-band communication to pass
around your snapshots however you please. 


set_current_snapshot() would have to sanity check that the xmin of the new
snapshot isn't older than the current globaloldestxmin. 



That would solve the backend to backend IPC problem nicely.


But it fails on the count of making sure that globaloldestxmin doesn't
advance past the snap you want to use.  And exactly how will you pass
a snap through a table?  It won't become visible until you commit ...
whereupon your own xmin isn't blocking the advance of globaloldestxmin.


The client receives the snapshot information as a result from the 
function call to current_snapshot(). The call to 
set_current_snapshot(snap) errors out if snap's xmin is older than 
globaloldestxmin. It is the client app that has to make sure that the 
transaction that created snap is still in progress.


I didn't say anything about passing it through a table.

Take a modified pg_dump as an example. It could write multiple files: a 
pre-load SQL file with the first part of the schema, then a post-load SQL 
file with the finalization of same (creating indexes, adding constraints). 
It then builds a list of all relations to COPY and starts n threads, each 
writing a different file. Each thread connects to the DB and adjusts its 
snapshot to that of the main transaction (which is still open). Then 
each thread grabs the next table to dump from the list and writes the 
COPY data to its output file. The threads exit when the list of tables 
is empty. The main thread waits until the last thread has joined and 
commits the main transaction.


Wouldn't be too hard to write a script that restores that split dump in 
parallel as well.
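
A rough transcript of that flow, reusing the current_snapshot()/
set_current_snapshot() interface discussed above (still only a proposal;
table names are placeholders):

main:    start transaction;
main:    set transaction isolation level serializable;
main:    select current_snapshot();        -- the client keeps the returned value, call it S

worker1: start transaction;
worker1: set transaction isolation level serializable;
worker1: select set_current_snapshot(S);   -- adopt the main transaction's visibility
worker1: copy table_a to stdout;

worker2: start transaction;
worker2: set transaction isolation level serializable;
worker2: select set_current_snapshot(S);
worker2: copy table_b to stdout;

main:    commit;                           -- only after every worker has finished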



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Simon Riggs
On Fri, 2007-01-26 at 12:43 -0500, Jan Wieck wrote:

 There is a flaw in that theory. If you have a single LRT, then each 
 subsequent transaction's xmin will be exactly that one, no?

You got me. My description was too loose, but you also got the rough
picture. We'll save the detail for another day, but we all know it's a
bridge we will have to cross one day, soon. I wasn't meaning to raise
this specific discussion now, just to say that publishing snapshots for
known LRTs is one way by which we can solve the LRT/VACUUMing issue.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] Proposal: Snapshot cloning

2007-01-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 You got me. My description was too loose, but you also got the rough
 picture. We'll save the detail for another day, but we all know it's a
 bridge we will have to cross one day, soon. I wasn't meaning to raise
 this specific discussion now, just to say that publishing snapshots for
 known LRTs is one way by which we can solve the LRT/VACUUMing issue.

I don't actually see that it buys you a darn thing ... you still won't
be able to delete dead updated tuples because of the possibility of the
LRT deciding to chase ctid chains up from the tuples it can see.   You
also seem to be assuming that a transaction can have only one snapshot,
which is not something we can enforce in enough cases to make it a very
useful restriction.

regards, tom lane


[HACKERS] Proposal: Snapshot cloning

2007-01-25 Thread Jan Wieck
Granted this one has a few open ends so far and I'd like to receive some 
constructive input on how to actually implement it.


The idea is to clone an existing serializable transaction's snapshot 
visibility information from one backend to another. The semantics would 
be like this:


backend1: start transaction;
backend1: set transaction isolation level serializable;
backend1: select pg_backend_pid();
backend1: select publish_snapshot(); -- will block

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select clone_snapshot(pid); -- will unblock backend1

backend1: select publish_snapshot();

backend3: start transaction;
backend3: set transaction isolation level serializable;
backend3: select clone_snapshot(pid);

...

This will allow a number of separate backends to assume the same MVCC 
visibility, so that they can query independently but the overall result 
will be according to one consistent snapshot of the database.


What I try to accomplish with this is to widen a bottleneck that many 
current Slony users are facing. The initial copy of a database is 
currently limited to one single reader to copy a snapshot of the data 
provider. With the above functionality, several tables could be copied 
in parallel by different client threads, feeding separate backends on 
the receiving side at the same time.


The feature could also be used by a parallel version of pg_dump as well 
as data mining tools.


The cloning process needs to make sure that the clone_snapshot() call is 
made by the same DB user in the same database as the corresponding 
publish_snapshot() call. Since publish_snapshot() only publishes 
information that it gained legally and that is visible in the PGPROC 
shared memory (xmin, xmax being the crucial parts here), there is no 
risk of creating a snapshot for which data might already have been 
removed by vacuum.


What I am not sure about yet is what IPC method would best suit the 
transfer of the arbitrarily sized xip vector. Ideas?



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] Proposal: Snapshot cloning

2007-01-25 Thread Hannu Krosing
On Thu, 2007-01-25 at 22:19, Jan Wieck wrote:
 Granted this one has a few open ends so far and I'd like to receive some 
 constructive input on how to actually implement it.
 
 The idea is to clone an existing serializable transaction's snapshot 
 visibility information from one backend to another. The semantics would 
 be like this:
 
  backend1: start transaction;
  backend1: set transaction isolation level serializable;
  backend1: select pg_backend_pid();
  backend1: select publish_snapshot(); -- will block
 
  backend2: start transaction;
  backend2: set transaction isolation level serializable;
  backend2: select clone_snapshot(pid); -- will unblock backend1
 
  backend1: select publish_snapshot();
 
  backend3: start transaction;
  backend3: set transaction isolation level serializable;
  backend3: select clone_snapshot(pid);
 
  ...
 
 This will allow a number of separate backends to assume the same MVCC 
 visibility, so that they can query independently but the overall result 
 will be according to one consistent snapshot of the database.

I see uses for this in implementing query parallelism in user level
code, like querying two child tables in two separate processes. 

 What I try to accomplish with this is to widen a bottleneck that many 
 current Slony users are facing. The initial copy of a database is 
 currently limited to one single reader to copy a snapshot of the data 
 provider. With the above functionality, several tables could be copied 
 in parallel by different client threads, feeding separate backends on 
 the receiving side at the same time.

I'm afraid that for most configurations this would make the copy slower,
as there will be more random disk I/O.

Maybe it would be better to fix Slony so that it allows initial copies in
different parallel transactions, or just do the initial copy in several
sets and merge the sets later.

 The feature could also be used by a parallel version of pg_dump as well 
 as data mining tools.
 
 The cloning process needs to make sure that the clone_snapshot() call is 
 made by the same DB user in the same database as the corresponding 
 publish_snapshot() call.

Why? A snapshot is universal and the same for the whole db instance, so why
limit it to the same user/database?

 Since publish_snapshot() only 
 publishes information that it gained legally and that is visible in the 
 PGPROC shared memory (xmin, xmax being the crucial parts here), there is 
 no risk of creating a snapshot for which data might already have been 
 removed by vacuum.
 
 What I am not sure about yet is what IPC method would best suit the 
 transfer of the arbitrarily sized xip vector. Ideas?
 
 
 Jan
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

