Re: [HACKERS] Exposing the Xact commit order to the user

2010-08-30 Thread Simon Riggs
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

This idea had support from at least 6 hackers. I'm happy to add my own.

Can I suggest it is added as a hook, rather than argue about the details
too much? The main use case is in combination with external systems, so
that way we can maintain the relevant code with the system that cares
about it.

 CommitTransaction() inside of xact.c will call a function, that inserts
 a new record into this array. The operation will for most of the time be
 nothing than taking a spinlock and adding the record to shared memory.
 All the data for the record is readily available, does not require
 further locking and can be collected locally before taking the spinlock.
 The begin_timestamp is the transactions idea of CURRENT_TIMESTAMP, the
 commit_timestamp is what CommitTransaction() just decided to write into
 the WAL commit record and the total_rowcount is the sum of inserted,
 updated and deleted heap tuples during the transaction, which should be
 easily available from the statistics collector, unless row stats are
 disabled, in which case the datum would be zero.

Does this need to be called while in a critical section? Or can we wait
until after the actual marking of the commit before calling this?

 Checkpoint handling will call a function to flush the shared buffers.
 Together with this, the information from WAL records will be sufficient
 to recover this data (except for row counts) during crash recovery.

So it would need to work identically in recovery also?

These two values are not currently stored in the commit WAL record.

timestamptz   xci_begin_timestamp
int64 xci_total_rowcount

Both of those seem optional, so I don't really want them added to WAL.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-14 Thread Alvaro Herrera
Excerpts from Marko Kreen's message of jue jun 10 18:10:50 -0400 2010:

 Jan's proposal of storing small struct into segmented files
 sounds like it could work.  Can't say anything more because
 I can't imagine it as well as Jan.  Would need to play with
 working implementation to say more...

We already have such a thing -- see pg_multixact

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-10 Thread Marko Kreen
On 6/4/10, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark gsst...@mit.edu wrote:
   A function which takes a starting xid and a number of transactions to
   return seems very tied to one particular application. I could easily
   see other systems such as a multi-master system instead only wanting
   to compare two transactions to find out which committed first. Or
   non-replication applications where you have an LSN and want to know
   whether a given transaction had committed by that time.
  
   So one possible interface would be to do something like
   xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
   an optional argument to limit the number or records returned.


 I'm imagining that the backend data storage for this would be a file
  containing, essentially, a struct for each commit repeated over and
  over again, packed tightly.  It's easy to index into such a file using
  a sequence number (give me the 1000'th commit) but searching by LSN
  would require (a) storing the LSNs and (b) binary search.  Maybe it's
  worth adding that complexity, but I'm not sure that it is.  Keeping
  the size of this file small is important for ensuring that it has
  minimal performance impact (which is also why I'm not sold on trying
  to include the tuple counters that Jan proposed - I think we can solve
  the problem he's worried about there more cleanly in other ways).

AIUI, you index the file by offset.

I think
   we should be very careful about assuming that we understand
   replication and its needs better than someone who has spent many years
   developing one of the major PostgreSQL replication solutions.
  
   Well the flip side of that is that we want an interface that's useful
   for more than just one replication system. This is something basic
   enough that I think it will be useful for more than just replication
   if we design it generally enough. It should be useful for
   backup/restore processes and monitoring as well as various forms of
   replication including master-slave trigger based systems but also
   including PITR-based replication, log-parsing systems, multi-master
   trigger based systems, 2PC-based systems, etc.


 Making it general enough to serve multiple needs is good, but we've
  got to make sure that the extra complexity is buying us something.
  Jan seems pretty confident that this could be used by Londiste also,
  though it would be nice to have some confirmation from the Londiste
  developer(s) on that.  I think it may also have applications for
  distributed transactions and multi-master replication, but I am not
  too sure it helps much for PITR-based replication or log-parsing
  systems.  We want to design something that is good, but trying to
  solve too many problems may end up solving none of them well.

The potential for single shared queue implementation, with
the additional potential for merging async replication
implementations sounds attractive.  (Merging ~ having
single one that satisfies broad range of needs.)

Unless the functionality accepted into core will be limited
to replication only and/or performs worse than current
snapshot-based grouping.  Then it is uninteresting, of course.

Jan's proposal of storing small struct into segmented files
sounds like it could work.  Can't say anything more because
I can't imagine it as well as Jan.  Would need to play with
working implementation to say more...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-04 Thread Greg Stark
On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote:
 I find the skeptical attitude on this thread altogether unwarranted.
 Jan made his case and, at least IMHO, presented it pretty clearly.

Just to be clear I think the idea of exposing commit order is a
no-brainer.  The specific interface is what I was questioning.

A function which takes a starting xid and a number of transactions to
return seems very tied to one particular application. I could easily
see other systems such as a multi-master system instead only wanting
to compare two transactions to find out which committed first. Or
non-replication applications where you have an LSN and want to know
whether a given transaction had committed by that time.

So one possible interface would be to do something like
xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
an optional argument to limit the number or records returned.

So you could do:

old := select pg_current_xlog_location();
while (1)
{
  sleep 60s;
  new := select pg_current_xlog_location()
  process(select xids_committed_between(old,new))
  old := new
}

This might be more useful for PITR recovery for example where you want
to find out what transactions committed between now and some known
point of corruption.

I could also see it being useful to have a function
pg_xlog_location_of_commit(xid). That would let you run recovery until
a particular transaction committed or test whether your replica is
caught up to a particular commit. It could be useful for monitoring
Hot Standby slaves.


  He
 then answered, multiple times, numerous questions which were already
 addressed in the original email, as well as various others.

I think I did miss some of the original description. That might have
caused some of the difficulty as I was asking questions about
something he assumed he had already answered.

  I think
 we should be very careful about assuming that we understand
 replication and its needs better than someone who has spent many years
 developing one of the major PostgreSQL replication solutions.

Well the flip side of that is that we want an interface that's useful
for more than just one replication system. This is something basic
enough that I think it will be useful for more than just replication
if we design it generally enough. It should be useful for
backup/restore processes and monitoring as well as various forms of
replication including master-slave trigger based systems but also
including PITR-based replication, log-parsing systems, multi-master
trigger based systems, 2PC-based systems, etc.



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-04 Thread Robert Haas
On Fri, Jun 4, 2010 at 10:44 AM, Greg Stark gsst...@mit.edu wrote:
 On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote:
 I find the skeptical attitude on this thread altogether unwarranted.
 Jan made his case and, at least IMHO, presented it pretty clearly.

 Just to be clear I think the idea of exposing commit order is a
 no-brainer.  The specific interface is what I was questioning.

OK, thanks for that clarification.

 A function which takes a starting xid and a number of transactions to
 return seems very tied to one particular application. I could easily
 see other systems such as a multi-master system instead only wanting
 to compare two transactions to find out which committed first. Or
 non-replication applications where you have an LSN and want to know
 whether a given transaction had committed by that time.

 So one possible interface would be to do something like
 xids_committed_between(lsn_start, lsn_end) -- and yes, possibly with
 an optional argument to limit the number or records returned.

I'm imagining that the backend data storage for this would be a file
containing, essentially, a struct for each commit repeated over and
over again, packed tightly.  It's easy to index into such a file using
a sequence number (give me the 1000'th commit) but searching by LSN
would require (a) storing the LSNs and (b) binary search.  Maybe it's
worth adding that complexity, but I'm not sure that it is.  Keeping
the size of this file small is important for ensuring that it has
minimal performance impact (which is also why I'm not sold on trying
to include the tuple counters that Jan proposed - I think we can solve
the problem he's worried about there more cleanly in other ways).

 So you could do:

 old := select pg_current_xlog_location();
 while (1)
 {
  sleep 60s;
  new := select pg_current_xlog_location()
  process(select xids_committed_between(old,new))
  old := new
 }

 This might be more useful for PITR recovery for example where you want
 to find out what transactions committed between now and some known
 point of corruption.

This could also be done by selecting the current commit sequence
number, getting the XIDs committed between the two commit sequence
numbers, etc.

 I could also see it being useful to have a function
 pg_xlog_location_of_commit(xid). That would let you run recovery until
 a particular transaction committed or test whether your replica is
 caught up to a particular commit. It could be useful for monitoring
 Hot Standby slaves.

Well, you'd need to index the commit data to make that work, I think,
so that adds a lot of complexity.  The implementation as proposed lets
you find the commits after a known point in order of occurrence, but
it doesn't let you inquire about the location of a particular commit.
If you want to run recovery until a particular transaction commits, we
could teach the recovery code to look for the commit record for that
XID and then pause at that point (or just before that point, if
someone wanted that as an alternative behavior), which would be much
simpler than using this mechanism.  And if you want to check whether
slaves are caught up, it would probably be better to use LSN rather
than commits, because you could be caught up on commits but way behind
on WAL replay.

  I think
 we should be very careful about assuming that we understand
 replication and its needs better than someone who has spent many years
 developing one of the major PostgreSQL replication solutions.

 Well the flip side of that is that we want an interface that's useful
 for more than just one replication system. This is something basic
 enough that I think it will be useful for more than just replication
 if we design it generally enough. It should be useful for
 backup/restore processes and monitoring as well as various forms of
 replication including master-slave trigger based systems but also
 including PITR-based replication, log-parsing systems, multi-master
 trigger based systems, 2PC-based systems, etc.

Making it general enough to serve multiple needs is good, but we've
got to make sure that the extra complexity is buying us something.
Jan seems pretty confident that this could be used by Londiste also,
though it would be nice to have some confirmation from the Londiste
developer(s) on that.  I think it may also have applications for
distributed transactions and multi-master replication, but I am not
too sure it helps much for PITR-based replication or log-parsing
systems.  We want to design something that is good, but trying to
solve too many problems may end up solving none of them well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-04 Thread Alvaro Herrera
Excerpts from Jan Wieck's message of jue jun 03 19:52:19 -0400 2010:
 On 6/3/2010 7:11 PM, Alvaro Herrera wrote:

  Why not send separate numbers of tuple inserts/updates/deletes, which we
  already have from pgstats?
 
 We only have them for the entire database. The purpose of this is just a 
 guesstimate about what data volume to expect if I were to select all log 
 from a particular transaction.

But we already have per table counters.  Couldn't we aggregate them per
transaction as well, if this feature is enabled?  I'm guessing that this
is going to have some uses besides Slony; vague measurements could turn
out to be unusable for some of these.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-04 Thread Jan Wieck

On 6/4/2010 10:44 AM, Greg Stark wrote:

On Fri, Jun 4, 2010 at 2:32 AM, Robert Haas robertmh...@gmail.com wrote:

I find the skeptical attitude on this thread altogether unwarranted.
Jan made his case and, at least IMHO, presented it pretty clearly.


Just to be clear I think the idea of exposing commit order is a
no-brainer.  The specific interface is what I was questioning.

A function which takes a starting xid and a number of transactions to
return seems very tied to one particular application. I could easily
see other systems such as a multi-master system instead only wanting
to compare two transactions to find out which committed first. Or
non-replication applications where you have an LSN and want to know
whether a given transaction had committed by that time.


Read the proposal again. I mean the original mail that started this 
tread. The function does NOT take an xid as argument.


Being able to compare two xid's against each other with respect to their 
commit order is eventually useful. The serial number of the data set, 
returned by the SRF as proposed, would perfectly satisfy that need. But 
not the way you envision for multimaster. Multimaster would ask did xid 
X from server A commit before or after xid Y from server B? That is a 
question completely outside the scope of this proposal.


Please keep it real.


Jan




--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-04 Thread Jan Wieck

On 6/4/2010 12:52 PM, Alvaro Herrera wrote:

Excerpts from Jan Wieck's message of jue jun 03 19:52:19 -0400 2010:

On 6/3/2010 7:11 PM, Alvaro Herrera wrote:



 Why not send separate numbers of tuple inserts/updates/deletes, which we
 already have from pgstats?

We only have them for the entire database. The purpose of this is just a 
guesstimate about what data volume to expect if I were to select all log 
from a particular transaction.


But we already have per table counters.  Couldn't we aggregate them per
transaction as well, if this feature is enabled?  I'm guessing that this
is going to have some uses besides Slony; vague measurements could turn
out to be unusable for some of these.


We have them per table and per index, summarized over all transactions. 
It is debatable if bloating this feature with detailed statistics is 
useful or not, but I'd rather not have that bloat at the beginning, 
because otherwise I know exactly what is going to happen. People will 
just come back and say zero impact my a...



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Jan Wieck

On 6/2/2010 7:49 PM, Greg Stark wrote:

On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote:

It would make it easy to conclude:

  This next transaction did 8328194 updates.  Maybe we should do
  some kind of checkpoint (e.g. - commit transaction or such) before
  working on it.

   versus

  This transaction we're thinking of working on had 7 updates.  No
  big deal...


I'm puzzled how you would define this value. How do you add 7 inserts,
7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
inserts and 7 deletes worth twice as much as the 7 updates when
they're basically the same thing? What if the inserts fired triggers
which inserted 7 more rows, is that 14? What if the 7 updates modified
2 TB of TOAST data but the 8238194 updates were all to the same record
and they were all HOT updates so all it did was change 8kB?

In any case you'll have all the actual data from your triggers or
hooks or whatever so what value does having the system keep track of
this add?


The point is not that we don't have that information now. The point is 
having a hint BEFORE wading through possibly gigabytes of WAL or log data.


If getting that information requires to read all the log data twice or 
the need to read gigabytes of otherwise useless WAL data (as per Bruce's 
suggestion), we better not get it at all and just keep doing what we are 
doing now.


I actually have a hard time understanding why people are so opposed to a 
feature that has zero impact at all unless a DBA actually turns in ON. 
What is the problem with exposing the commit order of transactions?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Bruce Momjian
Jan Wieck wrote:
 The point is not that we don't have that information now. The point is 
 having a hint BEFORE wading through possibly gigabytes of WAL or log data.
 
 If getting that information requires to read all the log data twice or 
 the need to read gigabytes of otherwise useless WAL data (as per Bruce's 
 suggestion), we better not get it at all and just keep doing what we are 
 doing now.
 
 I actually have a hard time understanding why people are so opposed to a 
 feature that has zero impact at all unless a DBA actually turns in ON. 
 What is the problem with exposing the commit order of transactions?

If you want to fork Postgres and add it, go ahead, but if the community
has to maintain the code and document it, we care.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 
 I actually have a hard time understanding why people are so
 opposed to a feature that has zero impact at all unless a DBA
 actually turns in ON.  What is the problem with exposing the
 commit order of transactions?
 
FWIW, once I came to understand the use case, it seems to me a
perfectly reasonable and useful thing to have.  It does strike me
that there may be value to add one more xid to support certain
types of integrity for some use cases, but that's certainly
something which could be added later, if at all.  Once I realized
that, I just dropped out of the discussion; perhaps I should have
bowed out with an endorsement.
 
Unless my memory is failing me worse than usual, Dan Ports, who is
working on the serializable implementation so he can use the
predicate locking with a transaction-aware caching feature, needs
the ability to track commit order of transactions by xid; so the use
cases go beyond Slony and Londiste.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Jan Wieck

On 6/3/2010 4:04 PM, Bruce Momjian wrote:

If you want to fork Postgres and add it, go ahead, but if the community
has to maintain the code and document it, we care.


That comment was rather unprofessional. I think the rest of us still try 
to find the best solution for the problem, not kill the discussion. You 
may want to rejoin that effort.


I care about an efficient, low overhead way to get a certain 
information, that is otherwise extremely difficult, expensive and 
version dependent to get.


I care about cleaning up more of the mistakes, made in the original 
development of Slony. Namely using hacks and kluges to implement 
details, not supported by a current version of PostgreSQL. Londiste and 
Slony made a good leap on that with the txid data type. Slony made 
another step like that with 2.0, switching to the (for that very purpose 
developed and contributed) native trigger configuration instead of 
hacking system catalogs. This would be another step in that direction 
and we would be able to unify Londiste's and Slony's transport mechanism 
and eliminating the tick/sync kluge.


Care to explain what exactly you care about?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Chris Browne
br...@momjian.us (Bruce Momjian) writes:
 Jan Wieck wrote:
 The point is not that we don't have that information now. The point is 
 having a hint BEFORE wading through possibly gigabytes of WAL or log data.
 
 If getting that information requires to read all the log data twice or 
 the need to read gigabytes of otherwise useless WAL data (as per Bruce's 
 suggestion), we better not get it at all and just keep doing what we are 
 doing now.
 
 I actually have a hard time understanding why people are so opposed to a 
 feature that has zero impact at all unless a DBA actually turns in ON. 
 What is the problem with exposing the commit order of transactions?

 If you want to fork Postgres and add it, go ahead, but if the community
 has to maintain the code and document it, we care.

Are you caring or opposing?  It seems rather uncharitable to imply
that Jan doesn't care.

I know *I'm* not interested in a forked Postgres for this - I would
prefer to find out what things could be done that don't involve gross
amounts of WAL file grovelling for data that mayn't necessarily even
be available.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Chris Browne
gsst...@mit.edu (Greg Stark) writes:
 On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote:
 It would make it easy to conclude:

   This next transaction did 8328194 updates.  Maybe we should do
   some kind of checkpoint (e.g. - commit transaction or such) before
   working on it.

    versus

   This transaction we're thinking of working on had 7 updates.  No
   big deal...

 I'm puzzled how you would define this value. How do you add 7 inserts,
 7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
 inserts and 7 deletes worth twice as much as the 7 updates when
 they're basically the same thing? What if the inserts fired triggers
 which inserted 7 more rows, is that 14? What if the 7 updates modified
 2 TB of TOAST data but the 8238194 updates were all to the same record
 and they were all HOT updates so all it did was change 8kB?

The presence of those questions (and their ambiguity) is the reason
why there's a little squirming as to whether this is super-useful and
super-necessary.

What this offers is *SOME* idea of how much updating work a particular
transaction did.  It's a bit worse than you suggest:

 - If replication triggers have captured tuples, those would get
   counted.

 - TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.

And if that included 5 TOAST changes, it might bump up to 26.

If there were replication triggers in place, that might bump the count
up to 45 (which I chose arbitrarily).

 In any case you'll have all the actual data from your triggers or
 hooks or whatever so what value does having the system keep track of
 this add?

This means that when we'd pull the list of transactions to consider,
we'd get something like:

 select * from next_transactions('4218:23', 50);

[list of 50 transactions returned, each with...
   - txid
   - START timestamp
   - COMMIT timestamp
   - Approximate # of updates

Then, for each of the 50, I'd pull replication log data for the
corresponding transaction.

If I have the approximate # of updates, that might lead me to stop
short, and say:

  That next update looks like a doozy!  I'm going to stop and commit
  what I've got before doing that one.

It's not strictly necessary, but would surely be useful for flow
control.
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Greg Stark
On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck janwi...@yahoo.com wrote:
 I'm puzzled how you would define this value. How do you add 7 inserts,
 7 deletes, and 7 updates? Is that 21 rows modified?

 I actually have a hard time understanding why people are so opposed to a
 feature that has zero impact at all unless a DBA actually turns in ON. What
 is the problem with exposing the commit order of transactions?

The post you were responding to was regarding the meaninglessness of
the number of records attribute you wanted. Your response is a non
sequitor.

I think the commit order of transactions would be a good thing to
expose though I've asked repeatedly what kind of interface you need
and never gotten answers to all the questions.




-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Kevin Grittner
Greg Stark gsst...@mit.edu wrote:
 
 what kind of interface you need
 
For the potential uses I can see, it would be great to have a SRF
which took two parameters: xid of last known commit and a limit how
many commits past that to return.  Perhaps a negative number could
move earlier in time, if that seems reasonable to others.  I think
that's also consistent with Jan's posts.  A GUC to enable it and
some way to specify retention (or force cleanup) are the only other
user-facing features which come to mind for me.  (Not sure what form
that last should take, but didn't Jan say something about both of
these early in the thread?)
 
Do you see a need for something else (besides, obviously, docs)?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Kevin Grittner
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Greg Stark gsst...@mit.edu wrote:
  
 what kind of interface you need
  
 For the potential uses I can see, it would be great to have a SRF
 which took two parameters: xid of last known commit and a limit
 how many commits past that to return.
 
Jan's very first post had it right; my idea was flawed:
 
| Exposing the data will be done via a set returning function. The
| SRF takes two arguments. The maximum number of rows to return and
| the last serial number processed by the reader. The advantage of
| such SRF is that the result can be used in a query that right away
| delivers audit or replication log information in transaction
| commit order. The SRF can return an empty set if no further
| transactions have committed since, or an error if data segments
| needed to answer the request have already been purged.
| 
| Purging of the data will be possible in several different ways.
| Autovacuum will call a function that drops segments of the data
| that are outside the postgresql.conf configuration with respect to
| maximum age or data volume. There will also be a function reserved
| for superusers to explicitly purge the data up to a certain serial
| number.
 
Apologies for not looking back to the start of the thread before
that last post.  It was all laid out right at the start.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Bruce Momjian
Jan Wieck wrote:
 On 6/3/2010 4:04 PM, Bruce Momjian wrote:
  If you want to fork Postgres and add it, go ahead, but if the community
  has to maintain the code and document it, we care.
 
 That comment was rather unprofessional. I think the rest of us still try 
 to find the best solution for the problem, not kill the discussion. You 
 may want to rejoin that effort.
 
 I care about an efficient, low overhead way to get a certain 
 information, that is otherwise extremely difficult, expensive and 
 version dependent to get.
 
 I care about cleaning up more of the mistakes, made in the original 
 development of Slony. Namely using hacks and kluges to implement 
 details, not supported by a current version of PostgreSQL. Londiste and 
 Slony made a good leap on that with the txid data type. Slony made 
 another step like that with 2.0, switching to the (for that very purpose 
 developed and contributed) native trigger configuration instead of 
 hacking system catalogs. This would be another step in that direction 
 and we would be able to unify Londiste's and Slony's transport mechanism 
 and eliminating the tick/sync kluge.
 
 Care to explain what exactly you care about?

Here is what I was replying to:

  I actually have a hard time understanding why people are so opposed t$
   feature that has zero impact at all unless a DBA actually turns in ON.
  What is the problem with exposing the commit order of transactions?

Jan's comment is why should others care what he wants because it has
zero impact?  I am saying the community cares because we have to
maintain the code.  I stand by my comment.

I remember a dismissive comment by Jan when 'session_replication_role'
was added, and a similar strong comment from me at that time as well. 
It seems we are doing this again.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Bruce Momjian
Bruce Momjian wrote:
 Jan Wieck wrote:
  On 6/3/2010 4:04 PM, Bruce Momjian wrote:
   If you want to fork Postgres and add it, go ahead, but if the community
   has to maintain the code and document it, we care.
  
  That comment was rather unprofessional. I think the rest of us still try 
  to find the best solution for the problem, not kill the discussion. You 
  may want to rejoin that effort.
  
  I care about an efficient, low overhead way to get a certain 
  information, that is otherwise extremely difficult, expensive and 
  version dependent to get.
  
  I care about cleaning up more of the mistakes, made in the original 
  development of Slony. Namely using hacks and kluges to implement 
  details, not supported by a current version of PostgreSQL. Londiste and 
  Slony made a good leap on that with the txid data type. Slony made 
  another step like that with 2.0, switching to the (for that very purpose 
  developed and contributed) native trigger configuration instead of 
  hacking system catalogs. This would be another step in that direction 
  and we would be able to unify Londiste's and Slony's transport mechanism 
  and eliminating the tick/sync kluge.
  
  Care to explain what exactly you care about?
 
 Here is what I was replying to:
 
   I actually have a hard time understanding why people are so opposed t$
feature that has zero impact at all unless a DBA actually turns in ON.
   What is the problem with exposing the commit order of transactions?
 
 Jan's comment is why should others care what he wants because it has
 zero impact?  I am saying the community cares because we have to
 maintain the code.  I stand by my comment.
 
 I remember a dismissive comment by Jan when 'session_replication_role'
 was added, and a similar strong comment from me at that time as well. 
 It seems we are doing this again.

Of course, if I am misintepreting what Jan said, please let me know.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Alvaro Herrera
Excerpts from Chris Browne's message of jue jun 03 16:21:35 -0400 2010:

 What this offers is *SOME* idea of how much updating work a particular
 transaction did.  It's a bit worse than you suggest:
 
  - If replication triggers have captured tuples, those would get
counted.
 
  - TOAST updates might lead to extra updates being counted.
 
 But back to where you started, I'd anticipate 7 inserts, 7 deletes,
 and 7 updates being counted as something around 21 updates.
 
 And if that included 5 TOAST changes, it might bump up to 26.
 
 If there were replication triggers in place, that might bump the count
 up to 45 (which I chose arbitrarily).

Why not send separate numbers of tuple inserts/updates/deletes, which we
already have from pgstats?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Jan Wieck

On 6/3/2010 5:58 PM, Greg Stark wrote:

On Thu, Jun 3, 2010 at 8:50 PM, Jan Wieck janwi...@yahoo.com wrote:

I'm puzzled how you would define this value. How do you add 7 inserts,
7 deletes, and 7 updates? Is that 21 rows modified?


I actually have a hard time understanding why people are so opposed to a
feature that has zero impact at all unless a DBA actually turns in ON. What
is the problem with exposing the commit order of transactions?


The post you were responding to was regarding the meaninglessness of
the number of records attribute you wanted. Your response is a non
sequitor.


I never proposed a number of records attribute. I proposed a sum of 
the row counts in the statistics collector. That row count would be a 
mix of insert, update, delete and toast operations. It's not an exact 
indicator of anything, but a good enough hint of how much data may come 
down the pipe if I were to select all replication data belonging to that 
transaction.




I think the commit order of transactions would be a good thing to
expose though I've asked repeatedly what kind of interface you need
and never gotten answers to all the questions.


In the original email that started this whole thread I wrote:


Exposing the data will be done via a set returning function. The SRF
takes two arguments. The maximum number of rows to return and the last
serial number processed by the reader. The advantage of such SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.


Did that not answer your question?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Jan Wieck

On 6/3/2010 6:24 PM, Kevin Grittner wrote:


Apologies for not looking back to the start of the thread before
that last post.  It was all laid out right at the start.


No need to apologize. Happens.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Jan Wieck

On 6/3/2010 7:11 PM, Alvaro Herrera wrote:

Excerpts from Chris Browne's message of jue jun 03 16:21:35 -0400 2010:


What this offers is *SOME* idea of how much updating work a particular
transaction did.  It's a bit worse than you suggest:

 - If replication triggers have captured tuples, those would get
   counted.

 - TOAST updates might lead to extra updates being counted.

But back to where you started, I'd anticipate 7 inserts, 7 deletes,
and 7 updates being counted as something around 21 updates.

And if that included 5 TOAST changes, it might bump up to 26.

If there were replication triggers in place, that might bump the count
up to 45 (which I chose arbitrarily).


Why not send separate numbers of tuple inserts/updates/deletes, which we
already have from pgstats?



We only have them for the entire database. The purpose of this is just a 
guesstimate about what data volume to expect if I were to select all log 
from a particular transaction.


This datum isn't critical, just handy for the overall feature to be useful.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Robert Haas
On Thu, Jun 3, 2010 at 6:29 PM, Bruce Momjian br...@momjian.us wrote:
 Jan Wieck wrote:
 On 6/3/2010 4:04 PM, Bruce Momjian wrote:
  If you want to fork Postgres and add it, go ahead, but if the community
  has to maintain the code and document it, we care.

 That comment was rather unprofessional. I think the rest of us still try
 to find the best solution for the problem, not kill the discussion. You
 may want to rejoin that effort.

 I care about an efficient, low overhead way to get a certain
 information, that is otherwise extremely difficult, expensive and
 version dependent to get.

 I care about cleaning up more of the mistakes, made in the original
 development of Slony. Namely using hacks and kluges to implement
 details, not supported by a current version of PostgreSQL. Londiste and
 Slony made a good leap on that with the txid data type. Slony made
 another step like that with 2.0, switching to the (for that very purpose
 developed and contributed) native trigger configuration instead of
 hacking system catalogs. This would be another step in that direction
 and we would be able to unify Londiste's and Slony's transport mechanism
 and eliminating the tick/sync kluge.

 Care to explain what exactly you care about?

 Here is what I was replying to:

  I actually have a hard time understanding why people are so opposed t$
   feature that has zero impact at all unless a DBA actually turns in ON.
  What is the problem with exposing the commit order of transactions?

 Jan's comment is why should others care what he wants because it has
 zero impact?  I am saying the community cares because we have to
 maintain the code.  I stand by my comment.

 I remember a dismissive comment by Jan when 'session_replication_role'
 was added, and a similar strong comment from me at that time as well.
 It seems we are doing this again.

I think it's entirely legitimate and proper for us to make a decision
about whether this feature is worth including in core PostgreSQL.  We
are obviously not in the business of adding random features solely for
the benefit of third-party applications.  That having been said, there
are several reasons why I believe that this particular feature is an
excellent candidate for inclusion in core.

1. It solves a problem for which there is no easy workaround.
Rereading all the WAL to extract the commit records is not an easy
workaround, nor is what Slony and Londiste are doing now.

2. It is usable by multiple projects, not just one.  It may well have
applications beyond replication (e.g. distributed transactions), but
at a very minimum it is usable by and useful to multiple replication
solutions.

3. It has a clear specification which can be easily understood even by
people who do not fully understand how replication solutions will make
use of it, which makes code maintenance much less burdensome.
Obviously, Jan's original email on this topic was just a sketch, but I
find it to be pretty clear.

4. We have an existing precedent of being willing to add limited
support into core to allow replication solutions to do their thing
(session_replication_role, ALTER TABLE ... ENABLE REPLICA TRIGGER,
etc).  Even though we now have built-in replication via HS and SR,
there is still a BIG use case for Slony, Londiste, and other add-on
tools.  Making those tools more successful and performant is good for
PostgreSQL.

5. It does not involve highly invasive changes to the core code.

6. It can be turned off for users who don't want it.

I find the skeptical attitude on this thread altogether unwarranted.
Jan made his case and, at least IMHO, presented it pretty clearly.  He
then answered, multiple times, numerous questions which were already
addressed in the original email, as well as various others.  I think
we should be very careful about assuming that we understand
replication and its needs better than someone who has spent many years
developing one of the major PostgreSQL replication solutions.
Independent of Jan's qualifications, there are clearly several people
on this thread who understand why this is useful and valuable,
including me.  I am obviously not in a position to insist that we
accept this feature (assuming Jan produces a patch rather than getting
discouraged and giving up) but I would like us to think very, very
carefully before rejecting it, and not to do so unless we have a DARN
good reason.  Most patches add code, and therefore require code
maintenance - that is not, by itself, a reason to reject them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-03 Thread Bruce Momjian
Bruce Momjian wrote:
  I care about cleaning up more of the mistakes, made in the original 
  development of Slony. Namely using hacks and kluges to implement 
  details, not supported by a current version of PostgreSQL. Londiste and 
  Slony made a good leap on that with the txid data type. Slony made 
  another step like that with 2.0, switching to the (for that very purpose 
  developed and contributed) native trigger configuration instead of 
  hacking system catalogs. This would be another step in that direction 
  and we would be able to unify Londiste's and Slony's transport mechanism 
  and eliminating the tick/sync kluge.
  
  Care to explain what exactly you care about?
 
 Here is what I was replying to:
 
   I actually have a hard time understanding why people are so opposed t$
feature that has zero impact at all unless a DBA actually turns in ON.
   What is the problem with exposing the commit order of transactions?
 
 Jan's comment is why should others care what he wants because it has
 zero impact?  I am saying the community cares because we have to
 maintain the code.  I stand by my comment.
 
 I remember a dismissive comment by Jan when 'session_replication_role'
 was added, and a similar strong comment from me at that time as well. 
 It seems we are doing this again.

FYI, I talked to Jan on the phone and we have resolved this issue.  :-)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-02 Thread Chris Browne
d...@csail.mit.edu (Dan Ports) writes:
 I'm not clear on why the total rowcount is useful, but perhaps I'm
 missing something obvious.

It would make it easy to conclude:

   This next transaction did 8328194 updates.  Maybe we should do
   some kind of checkpoint (e.g. - commit transaction or such) before
   working on it.

versus

   This transaction we're thinking of working on had 7 updates.  No
   big deal...
-- 
(reverse (concatenate 'string ofni.secnanifxunil @ enworbbc))
http://linuxfinances.info/info/finances.html
Rules of  the Evil Overlord #189. I  will never tell the  hero Yes I
was the one who  did it, but you'll never be able  to prove it to that
incompetent  old fool.  Chances  are, that  incompetent  old fool  is
standing behind the curtain.  http://www.eviloverlord.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-02 Thread Chris Browne
heikki.linnakan...@enterprisedb.com (Heikki Linnakangas) writes:
 On 24/05/10 19:51, Kevin Grittner wrote:
 The only thing I'm confused about is what benefit anyone expects to
 get from looking at data between commits in some way other than our
 current snapshot mechanism.  Can someone explain a use case where
 what Jan is proposing is better than snapshot isolation?  It doesn't
 provide any additional integrity guarantees that I can see.

 Right, it doesn't. What it provides is a way to reconstruct a snapshot
 at any point in time, after the fact. For example, after transactions
 A, C, D and B have committed in that order, it allows you to
 reconstruct a snapshot just like you would've gotten immediately after
 the commit of A, C, D and B respectively. That's useful replication
 tools like Slony that needs to commit the changes of those
 transactions in the slave in the same order as they were committed in
 the master.

 I don't know enough of Slony et al. to understand why that'd be better
 than the current heartbeat mechanism they use, taking a snapshot every
 few seconds, batching commits.

I see two advantages:

 a) Identifying things on a transaction-by-transaction basis means that
the snapshots (syncs) don't need to be captured, which is
presently an area of fragility.  If the slon daemon falls over on
Friday evening, and nobody notices until Monday, the snapshot
reverts to being all updates between Friday and whenever SYNCs
start to be collected again.

Exposing commit orders eliminates that fragility.  SYNCs don't
need to be captured anymore, so they can't be missed (which is
today's problem).

 b) The sequence currently used to control log application ordering is
a bottleneck, as it is a single sequence shared across all
connections.

It could be eliminated in favor of (perhaps) an in-memory variable
defined on a per-connection basis.

It's not a bottleneck that we hear a lot of complaints about, but
the sequence certainly is a bottleneck.

-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/internet.html
MS  apparently now  has a  team dedicated  to tracking  problems with
Linux  and publicizing them.   I guess  eventually they'll  figure out
this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-02 Thread Greg Stark
On Wed, Jun 2, 2010 at 6:45 PM, Chris Browne cbbro...@acm.org wrote:
 It would make it easy to conclude:

   This next transaction did 8328194 updates.  Maybe we should do
   some kind of checkpoint (e.g. - commit transaction or such) before
   working on it.

    versus

   This transaction we're thinking of working on had 7 updates.  No
   big deal...

I'm puzzled how you would define this value. How do you add 7 inserts,
7 deletes, and 7 updates? Is that 21 rows modified? Why are the 7
inserts and 7 deletes worth twice as much as the 7 updates when
they're basically the same thing? What if the inserts fired triggers
which inserted 7 more rows, is that 14? What if the 7 updates modified
2 TB of TOAST data but the 8238194 updates were all to the same record
and they were all HOT updates so all it did was change 8kB?

In any case you'll have all the actual data from your triggers or
hooks or whatever so what value does having the system keep track of
this add?



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-01 Thread Jan Wieck

On 5/28/2010 7:19 PM, Bruce Momjian wrote:

Jan Wieck wrote:
 Reading the entire WAL just to find all COMMIT records, then go back to 
 the origin database to get the actual replication log you're looking for 
 is simpler and more efficient? I don't think so.
 
 Agreed, but I think I've not explained myself well enough.
 
 I proposed two completely separate ideas; the first one was this:
 
 If you must get commit order, get it from WAL on *origin*, using exact

 same code that current WALSender provides, plus some logic to read
 through the WAL records and extract commit/aborts. That seems much
 simpler than the proposal you outlined and as SR shows, its low latency
 as well since commits write to WAL. No need to generate event ticks
 either, just use XLogRecPtrs as WALSender already does.
 
 I see no problem with integrating that into core, technically or

 philosophically.
 

Which means that if I want to allow a consumer of that commit order data 
to go offline for three days or so to replicate the 5 requested, low 
volume tables, the origin needs to hang on to the entire WAL log from 
all 100 other high volume tables?


I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.



Stripping it out from what?


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-01 Thread Bruce Momjian
Jan Wieck wrote:
   I see no problem with integrating that into core, technically or
   philosophically.
   
  
  Which means that if I want to allow a consumer of that commit order data 
  to go offline for three days or so to replicate the 5 requested, low 
  volume tables, the origin needs to hang on to the entire WAL log from 
  all 100 other high volume tables?
  
  I suggest writing an external tool that strips out what you need that
  can be run at any time, rather than creating a new data format and
  overhead for this usecase.
  
 
 Stripping it out from what?

Stripping it from the WAL.  Your system seems to require double-writes
on a commit, which is something we have avoided in the past.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + None of us is going to be here forever. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-06-01 Thread Jan Wieck

On 6/1/2010 11:09 AM, Bruce Momjian wrote:

Jan Wieck wrote:

  I see no problem with integrating that into core, technically or
  philosophically.
  
 
 Which means that if I want to allow a consumer of that commit order data 
 to go offline for three days or so to replicate the 5 requested, low 
 volume tables, the origin needs to hang on to the entire WAL log from 
 all 100 other high volume tables?
 
 I suggest writing an external tool that strips out what you need that

 can be run at any time, rather than creating a new data format and
 overhead for this usecase.
 


Stripping it out from what?


Stripping it from the WAL.  Your system seems to require double-writes
on a commit, which is something we have avoided in the past.



Your suggestion seems is based on several false assumptions. This does 
neither require additional physical writes on commit, nor is consuming 
the entire WAL just to filter out commit records anything even remotely 
desirable for systems like Londiste or Slony.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Bruce Momjian
Jan Wieck wrote:
  Reading the entire WAL just to find all COMMIT records, then go back to 
  the origin database to get the actual replication log you're looking for 
  is simpler and more efficient? I don't think so.
  
  Agreed, but I think I've not explained myself well enough.
  
  I proposed two completely separate ideas; the first one was this:
  
  If you must get commit order, get it from WAL on *origin*, using exact
  same code that current WALSender provides, plus some logic to read
  through the WAL records and extract commit/aborts. That seems much
  simpler than the proposal you outlined and as SR shows, its low latency
  as well since commits write to WAL. No need to generate event ticks
  either, just use XLogRecPtrs as WALSender already does.
  
  I see no problem with integrating that into core, technically or
  philosophically.
  
 
 Which means that if I want to allow a consumer of that commit order data 
 to go offline for three days or so to replicate the 5 requested, low 
 volume tables, the origin needs to hang on to the entire WAL log from 
 all 100 other high volume tables?

I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-28 Thread Robert Haas

On May 28, 2010, at 7:19 PM, Bruce Momjian br...@momjian.us wrote:

Jan Wieck wrote:
Reading the entire WAL just to find all COMMIT records, then go  
back to
the origin database to get the actual replication log you're  
looking for

is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using  
exact

same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low  
latency

as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order  
data

to go offline for three days or so to replicate the 5 requested, low
volume tables, the origin needs to hang on to the entire WAL log from
all 100 other high volume tables?


I suggest writing an external tool that strips out what you need that
can be run at any time, rather than creating a new data format and
overhead for this usecase.


That would be FAR more complex, less robust, and less performant -  
whereas doing what Jan has proposed is pretty straightforward and  
should have minimal impact on performance - or none when not enabled.


...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 On 5/26/2010 4:34 PM, Kevin Grittner wrote:
 
 My latest idea for handling this in WAL-based replication
 involves WAL-logging information about the transaction through
 which a the committing transaction makes it safe to view.  There
 are a few options here at the detail level that I'm still
 thinking through.  The idea would be that the xmin from read-only
 queries on the slaves might be somewhere behind where you would
 expect based on transactions committed.  (The details involve
 such things as where non-serializable transactions fall into the
 plan on both sides, and whether it's worth the effort to
 special-case read-only transactions on the master.)
  
 I can't say that I'm 100% sure that some lurking detail won't
 shoot this technique down for HS, but it seems good to me at a
 conceptual level.
 
 Without simulating multiple simultaneous transactions during
 playback, how are you going to manage that the tuples, already
 inserted on behalf of the ongoing master transactions, disappear
 when they abort on the master?
 
When do writes ever become visible to a snapshot without having been
committed?  I'm not talking about changing that in any way.  I'm
talking about deferring visibility of committed transactions until
they can be viewed without risking serialization anomalies.  This
requires, at a minimum, that any concurrent serializable
transactions which are not read-only have completed.
(Perhaps I'm not understanding your question)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Greg Stark
On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote:
 How about just API generalities? Like, where do you need this data, on
 the master or on the slave? Would PGXC like it on the transaction
 coordinator?

 What question do you need to answer, do you need to pull out sets of
 commits in certain ranges or look up specific transaction ids and find
 out when they committed? Or do you only need to answer which of two
 transaction ids committed first?


This thread has been hard to follow for me. Were any of these
questions answered?

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Jan Wieck

On 5/27/2010 9:59 AM, Greg Stark wrote:

On Wed, May 26, 2010 at 5:38 PM, Greg Stark gsst...@mit.edu wrote:

How about just API generalities? Like, where do you need this data, on
the master or on the slave? Would PGXC like it on the transaction
coordinator?

What question do you need to answer, do you need to pull out sets of
commits in certain ranges or look up specific transaction ids and find
out when they committed? Or do you only need to answer which of two
transaction ids committed first?



This thread has been hard to follow for me. Were any of these
questions answered?



Yes.

On 5/26/2010 4:49 PM, Jan Wieck wrote:

On 5/26/2010 12:38 PM, Greg Stark wrote:

 On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote:

 ... but to answer that request, actually I don't even think we should be
 discussing API specifics.

 
 How about just API generalities? Like, where do you need this data, on

 the master or on the slave? Would PGXC like it on the transaction
 coordinator?
 
 What question do you need to answer, do you need to pull out sets of

 commits in certain ranges or look up specific transaction ids and find
 out when they committed? Or do you only need to answer which of two
 transaction ids committed first?


The question I want answered is

   what was the order and xid of the next 0..n transactions, that
   committed after transaction X?

Preferably I would avoid scanning the entire available WAL just to get 
the next n xid's to process.


The proposal assigned a unique serial number (file segment and position 
driven) to each xid and used that for the ordering as well as 
identification of the last known transaction. That is certainly a 
premature implementation detail.


In this implementation it wouldn't even matter if a transaction that was 
recorded actually never made it because it crashed before the WAL flush. 
It would be reported by this commit order feature, but there would be 
no traces of whatever it did to be found inside the DB, so that anomaly 
is harmless.



Jan

-- Anyone who trades liberty for security deserves neither liberty nor 
security. -- Benjamin Franklin
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers 




--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-27 Thread Jan Wieck

On 5/27/2010 12:01 PM, Jan Wieck wrote:

On 5/27/2010 9:59 AM, Greg Stark wrote:

This thread has been hard to follow for me. Were any of these
questions answered?



Yes.


The thing missing is any sort of answer to that problem description.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/25/2010 3:18 PM, Kevin Grittner wrote:

Jan Wieck janwi...@yahoo.com wrote:
 

Have you ever looked at one of those queries, that Londiste or
Slony issue against the provider DB in order to get all the log
data that has been committed between two snapshots? Is that really
the best you can think of?
 
No, I admit I haven't.  In fact, I was thinking primarily in terms

of log-driven situations, like HS.  What would be the best place for
me to look to come up to speed on your use case?  (I'm relatively
sure that the issue isn't that there's no information to find, but
that a sequential pass over all available information would take a
*long* time.)  I've been working through the issues on WAL-based
replicas, and have some additional ideas and alternatives, but I'd
like to see the big picture, including trigger-based replication,
before posting.


In short, what both systems are doing is as follows. An AFTER ROW 
trigger records the OLD PK and all changed columns, as well as the txid 
and a global, not cached serial number. Some background process 
periodically starts a serializable transaction and records the resulting 
snapshot.


To replicate from one consistent state to the next, the replication 
system now selects all log rows between two snapshots. Between here 
means it simulates MVCC visibility in the sense of that the writing 
transaction was in progress when the first snapshot was taken and had 
committed at the second. The resulting WHERE clause looks something like


WHERE (xid  s1.xmax OR (xid = s1.xmin AND xid IN (s1.xip)))
  AND (xid  s2.xmin OR (xid = s2.xmax AND xid NOT IN (s2.xip)))

Note that xip here is a comma separated list of txid's. I think it is 
easy to see that this is not a cheap query.


Anyhow, that set of log rows is now ordered by the serial number and 
applied to the replica.


Without this logic, the replication system could not combine multiple 
origin sessions into one replication session without risking to never 
find a state, in which it can commit.


It may be possible to work with two sessions on the replica and not 
require any reordering of the original actions at all. I need to think 
about that for a little longer since this idea just occurred to me a 
second ago.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 7:03 AM, Jan Wieck wrote:
To replicate from one consistent state to the next, the replication 
system now selects all log rows between two snapshots. Between here 
means it simulates MVCC visibility in the sense of that the writing 
transaction was in progress when the first snapshot was taken and had 
committed at the second. The resulting WHERE clause looks something like


Or it entirely happened between the snapshots, obviously.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Greg Stark
On Sun, May 23, 2010 at 9:21 PM, Jan Wieck janwi...@yahoo.com wrote:
 Each record of the Transaction Commit Info consists of

     txid          xci_transaction_id
     timestamptz   xci_begin_timestamp
     timestamptz   xci_commit_timestamp
     int64         xci_total_rowcount


So I think you're going about this backwards.

Instead of discussing implementation I think you should start with the
API the replication system needs. In particular I'm not sure you
really want a server-side query at all. I'm wondering if you wouldn't
be better off with a public machine-parsable text format version of
the WAL. Ie, at the same time as writing out all the nitty gritty to
the binary wal we would write out a summary of public data to an xml
version containing just parts of the data stream that we can promise
won't change, such as transaction id, lsn, timestamp.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 10:04 AM, Greg Stark wrote:

On Sun, May 23, 2010 at 9:21 PM, Jan Wieck janwi...@yahoo.com wrote:

Each record of the Transaction Commit Info consists of

txid  xci_transaction_id
timestamptz   xci_begin_timestamp
timestamptz   xci_commit_timestamp
int64 xci_total_rowcount



So I think you're going about this backward

Instead of discussing implementation I think you should start with the
API the replication system needs. In particular I'm not sure you
really want a server-side query at all. I'm wondering if you wouldn't
be better off with a public machine-parsable text format version of
the WAL. Ie, at the same time as writing out all the nitty gritty to
the binary wal we would write out a summary of public data to an xml
version containing just parts of the data stream that we can promise
won't change, such as transaction id, lsn, timestamp.


Since the actual row level change information and other event data is 
found inside of regular tables, identified by TXID and sequence number, 
I am pretty sure I want that data in a server-side query. What you are 
proposing is to read the xid's and timestamps with an external process, 
that now forcibly needs to reside on the DB server itself (neither 
Londiste nor Slony have that requirement as of today), then bring it 
back into the DB at least inside the WHERE clause of a query.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Steve Singer

Jan Wieck wrote:

Since the actual row level change information and other event data is 
found inside of regular tables, identified by TXID and sequence number, 
I am pretty sure I want that data in a server-side query. What you are 
proposing is to read the xid's and timestamps with an external process, 
that now forcibly needs to reside on the DB server itself (neither 
Londiste nor Slony have that requirement as of today), then bring it 
back into the DB at least inside the WHERE clause of a query.



It depends on how you approach the problem.

If you had a process that could scan WAL files (or a platform/version 
independent representation of these WAL files) you could run that 
process on any server (the origin server, a replica, or some third 
server with the software installed).  Where you run it involves making 
trade-offs on the costs of storing transferring and processing the files 
and would ideally be configurable.


You could then have a process that transfers all of the data logged by 
the triggers to the replicas as soon as it is committed.  Basically 
saying 'copy any rows in sl_log from the origin to the replica that we 
haven't already sent to that replica'


You could then move the work of figuring out the commit order onto the 
replica where you would combine the output of the WAL scanning process 
with the transaction data that has been copied to the replica.






Jan




--
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Robert Haas
On Wed, May 26, 2010 at 11:43 AM, Steve Singer ssin...@ca.afilias.info wrote:
 Jan Wieck wrote:

 Since the actual row level change information and other event data is
 found inside of regular tables, identified by TXID and sequence number, I am
 pretty sure I want that data in a server-side query. What you are proposing
 is to read the xid's and timestamps with an external process, that now
 forcibly needs to reside on the DB server itself (neither Londiste nor Slony
 have that requirement as of today), then bring it back into the DB at least
 inside the WHERE clause of a query.


 It depends on how you approach the problem.

 If you had a process that could scan WAL files (or a platform/version
 independent representation of these WAL files) you could run that process on
 any server (the origin server, a replica, or some third server with the
 software installed).  Where you run it involves making trade-offs on the
 costs of storing transferring and processing the files and would ideally be
 configurable.

 You could then have a process that transfers all of the data logged by the
 triggers to the replicas as soon as it is committed.  Basically saying 'copy
 any rows in sl_log from the origin to the replica that we haven't already
 sent to that replica'

 You could then move the work of figuring out the commit order onto the
 replica where you would combine the output of the WAL scanning process with
 the transaction data that has been copied to the replica.

I'm sure it's possible to make this work however you want to do it,
but I don't really see what advantage Greg Stark's proposal has over
Jan's original proposal.  Recording the commits in one extra place at
commit time is practically free, especially compared to the overall
cost of replication.  Rescanning the WAL seems likely to be much more
expensive and potentially introduces more failure paths.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 10:04 AM, Greg Stark wrote:

Instead of discussing implementation I think you should start with the
API the replication system needs.


... but to answer that request, actually I don't even think we should be 
discussing API specifics.


During PGCon, Marco Kreen, Jim Nasby and I were discussing what the 
requirements of a unified message queue, shared by Londiste and Slony 
may look like. For some use cases of pgq, there isn't even any interest 
in user table changes. These are simply a reliable, database backed 
message passing system.


Today both systems use an agreeable order of changes selected by 
rather expensive queries based on serializable snapshot information and 
a global, non cacheable serial number.


This could be replaced with a logic based on the actual commit order of 
the transactions. This order does not need to be 100% accurate. As long 
as the order is recorded after all user actions have been performed 
(trigger queue shut down) and while the transaction is still holding 
onto its locks, that order is good enough. This will not allow a 
conflicting transaction, waiting on locks to be released, to appear 
having committed before the lock conflict winner.


It is obvious that in cases where only small portions or even none of 
the user table changes are needed, holding on to or even parsing the 
ENTIRE WAL sounds suboptimal for this use case.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Greg Stark
On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote:
 ... but to answer that request, actually I don't even think we should be
 discussing API specifics.


How about just API generalities? Like, where do you need this data, on
the master or on the slave? Would PGXC like it on the transaction
coordinator?

What question do you need to answer, do you need to pull out sets of
commits in certain ranges or look up specific transaction ids and find
out when they committed? Or do you only need to answer which of two
transaction ids committed first?



-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas
Could you generate the commit-order log by simply registering a commit 
hook (RegisterXactCallback(XACT_EVENT_COMMIT)) that writes such a log 
somewhere in the data directory? That would work with older versions 
too, no server changes required.


It would not get called during recovery, but I believe that would be 
sufficient for Slony. You could always batch commits that you don't know 
when they committed as if they committed simultaneously.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:
Could you generate the commit-order log by simply registering a commit 
hook (RegisterXactCallback(XACT_EVENT_COMMIT)) that writes such a log 
somewhere in the data directory? That would work with older versions 
too, no server changes required.




That would work, as it seems that the backend keeps holding on to its 
locks until after calling the callbacks.


It would not get called during recovery, but I believe that would be 
sufficient for Slony. You could always batch commits that you don't know 
when they committed as if they committed simultaneously.


Here you are mistaken. If the origin crashes but can recover not yet 
flushed to xlog-commit-order transactions, then the consumer has no idea 
about the order of those commits, which throws us back to the point 
where we require a non cacheable global sequence to replay the 
individual actions of those now batched transactions in an agreeable 
order.


The commit order data needs to be covered by crash recovery.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas

On 26/05/10 21:43, Jan Wieck wrote:

On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:

It would not get called during recovery, but I believe that would be
sufficient for Slony. You could always batch commits that you don't
know when they committed as if they committed simultaneously.


Here you are mistaken. If the origin crashes but can recover not yet
flushed to xlog-commit-order transactions, then the consumer has no idea
about the order of those commits, which throws us back to the point
where we require a non cacheable global sequence to replay the
individual actions of those now batched transactions in an agreeable
order.

The commit order data needs to be covered by crash recovery.


Perhaps I'm missing something, but I thought that Slony currently uses a 
heartbeat, and all transactions committed between two beats are banged 
together and committed as one in the slave so that their relative commit 
order doesn't matter. Can we not do the same for commits missing from 
the commit-order log?


I'm thinking that the commit-order log would contain two kinds of records:

a) Transaction with XID X committed
b) All transactions with XID  X committed

During normal operation we write the 1st kind of record at every commit. 
After crash recovery (perhaps at the first commit after recovery or when 
the slon daemon first polls the server, as there's no hook for 
end-of-recovery), we write the 2nd kind of record.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Perhaps I'm missing something, but I thought that Slony currently uses a
 heartbeat, and all transactions committed between two beats are banged
 together and committed as one in the slave so that their relative commit
 order doesn't matter. 

I guess Slony does the same as pgq here: all events of all those
transactions between two given ticks are batched together in the order
of the event commits. (In fact the batches are made at the consumer
request, so possibly spreading more than 2 ticks at a time).

If you skip that event ordering (within transactions), you can't
maintain foreign keys on the slaves, among other things.

The idea of this proposal is to be able to get this commit order
directly from where the information is maintained, rather than use some
sort of user sequence for that.

So even ordering the txid and txid_snapshots with respect to WAL commit
time (LSN) won't be the whole story, for any given transaction
containing more than one event we also need to have them in order. I
know Jan didn't forget about it so it must either be in the proposal or
easily derived, too tired to recheck.

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Robert Haas
On Wed, May 26, 2010 at 4:11 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Perhaps I'm missing something, but I thought that Slony currently uses a
 heartbeat, and all transactions committed between two beats are banged
 together and committed as one in the slave so that their relative commit
 order doesn't matter.

 I guess Slony does the same as pgq here: all events of all those
 transactions between two given ticks are batched together in the order
 of the event commits. (In fact the batches are made at the consumer
 request, so possibly spreading more than 2 ticks at a time).

 If you skip that event ordering (within transactions), you can't
 maintain foreign keys on the slaves, among other things.

 The idea of this proposal is to be able to get this commit order
 directly from where the information is maintained, rather than use some
 sort of user sequence for that.

Exactly.

 So even ordering the txid and txid_snapshots with respect to WAL commit
 time (LSN) won't be the whole story, for any given transaction
 containing more than one event we also need to have them in order. I
 know Jan didn't forget about it so it must either be in the proposal or
 easily derived, too tired to recheck.

Right, so the point is - with this proposal, he can switch to using a
LOCAL sequence number to order events within the session and then
order the sessions using the commit ordering.  Right now, he has to
use a GLOBAL sequence number because there's no way to know the commit
order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 3:16 PM, Heikki Linnakangas wrote:

On 26/05/10 21:43, Jan Wieck wrote:

On 5/26/2010 1:17 PM, Heikki Linnakangas wrote:

It would not get called during recovery, but I believe that would be
sufficient for Slony. You could always batch commits that you don't
know when they committed as if they committed simultaneously.


Here you are mistaken. If the origin crashes but can recover not yet
flushed to xlog-commit-order transactions, then the consumer has no idea
about the order of those commits, which throws us back to the point
where we require a non cacheable global sequence to replay the
individual actions of those now batched transactions in an agreeable
order.

The commit order data needs to be covered by crash recovery.


Perhaps I'm missing something, 


Apparently, more about that at the end.


I'm thinking that the commit-order log would contain two kinds of records:

a) Transaction with XID X committed
b) All transactions with XID  X committed


If that was true then long running transactions would delay all commits 
for transactions that started after them. Do they?




During normal operation we write the 1st kind of record at every commit. 
After crash recovery (perhaps at the first commit after recovery or when 
the slon daemon first polls the server, as there's no hook for 
end-of-recovery), we write the 2nd kind of record.


I think the callback is also called during backend startup, which means 
that it could record the first XID to come which is known from the 
control file and in that case, all  XID's are committed or aborted.


Which leads us to your missing piece above, the need for the global non 
cacheable sequence.


Consider two transactions A and B that due to transaction batching 
between snapshots get applied together. Let the order of actions be


1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those 
individual row actions, then it would not have any idea if within that 
batch the action of B (higher XID) actually came first. Without that 
knowledge there is a 50/50 chance of getting your replica out of sync 
with that simple conflict.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 
 Without this logic, the replication system could not combine
 multiple origin sessions into one replication session without
 risking to never find a state, in which it can commit.
 
My latest idea for handling this in WAL-based replication involves
WAL-logging information about the transaction through which a the
committing transaction makes it safe to view.  There are a few
options here at the detail level that I'm still thinking through.
The idea would be that the xmin from read-only queries on the slaves
might be somewhere behind where you would expect based on
transactions committed.  (The details involve such things as where
non-serializable transactions fall into the plan on both sides, and
whether it's worth the effort to special-case read-only transactions
on the master.)
 
I can't say that I'm 100% sure that some lurking detail won't shoot
this technique down for HS, but it seems good to me at a conceptual
level.
 
I think, however, that this fails to work for systems like Slony and
Londiste because there could be transactions writing to tables which
are not replication targets, so the snapshot adjustments wouldn't be
safe.  True?  (If not true, I think that adding some sort of xmin
value, depending on the answers to the above questions, to Jan's
proposed structure might support better transactional integrity,
even to the level of full serializable support, at the cost of
delaying visibility of committed data.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 4:11 PM, Dimitri Fontaine wrote:

So even ordering the txid and txid_snapshots with respect to WAL commit
time (LSN) won't be the whole story, for any given transaction
containing more than one event we also need to have them in order. I
know Jan didn't forget about it so it must either be in the proposal or
easily derived, too tired to recheck.


No, that detail is actually not explained in the proposal. When applying 
all changes in transaction commit order, there is no need for a global 
sequence. A local counter per backend is sufficient because the total 
order of xact-commit-order, local-xact-seq yields a similarly 
agreeable order of actions.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas

On 26/05/10 23:32, Jan Wieck wrote:

Consider two transactions A and B that due to transaction batching
between snapshots get applied together. Let the order of actions be

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those
individual row actions, then it would not have any idea if within that
batch the action of B (higher XID) actually came first. Without that
knowledge there is a 50/50 chance of getting your replica out of sync
with that simple conflict.


Hmm, I don't see how even a fully reliable WAL-logged commit-order log 
would save you then. It seems that you need to not only know the 
relative order of commits, but the order of commits relative to actions 
within the transactions. I.e. in the above example it's not enough to 
know that B committed before A, you also have to know that A updated the 
row only after B committed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 12:38 PM, Greg Stark wrote:

On Wed, May 26, 2010 at 5:10 PM, Jan Wieck janwi...@yahoo.com wrote:

... but to answer that request, actually I don't even think we should be
discussing API specifics.



How about just API generalities? Like, where do you need this data, on
the master or on the slave? Would PGXC like it on the transaction
coordinator?

What question do you need to answer, do you need to pull out sets of
commits in certain ranges or look up specific transaction ids and find
out when they committed? Or do you only need to answer which of two
transaction ids committed first?


The question I want answered is

  what was the order and xid of the next 0..n transactions, that
  committed after transaction X?

Preferably I would avoid scanning the entire available WAL just to get 
the next n xid's to process.


The proposal assigned a unique serial number (file segment and position 
driven) to each xid and used that for the ordering as well as 
identification of the last known transaction. That is certainly a 
premature implementation detail.


In this implementation it wouldn't even matter if a transaction that was 
recorded actually never made it because it crashed before the WAL flush. 
It would be reported by this commit order feature, but there would be 
no traces of whatever it did to be found inside the DB, so that anomaly 
is harmless.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas

On 26/05/10 23:45, Heikki Linnakangas wrote:

On 26/05/10 23:32, Jan Wieck wrote:

Consider two transactions A and B that due to transaction batching
between snapshots get applied together. Let the order of actions be

1. A starts
2. B starts
3. B selects a row for update, then updates the row
4. A tries to do the same and blocks
5. B commits
6. A gets the lock, the row, does the update
7. A commits

If Slony (or Londiste) would not record the exact order of those
individual row actions, then it would not have any idea if within that
batch the action of B (higher XID) actually came first. Without that
knowledge there is a 50/50 chance of getting your replica out of sync
with that simple conflict.


Hmm, I don't see how even a fully reliable WAL-logged commit-order log
would save you then. It seems that you need to not only know the
relative order of commits, but the order of commits relative to actions
within the transactions. I.e. in the above example it's not enough to
know that B committed before A, you also have to know that A updated the
row only after B committed.


Ok, I think I understand it now. The commit order is enough, because 
replaying the actions in the order all actions of B, then all actions 
of A yields the same result.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Heikki Linnakangas

On 26/05/10 23:49, Jan Wieck wrote:

In this implementation it wouldn't even matter if a transaction that was
recorded actually never made it because it crashed before the WAL flush.
It would be reported by this commit order feature, but there would be
no traces of whatever it did to be found inside the DB, so that anomaly
is harmless.


Hmm, I think it would also not matter if the reported commit order 
doesn't match exactly the order of the commit records, as long as 
there's no dependency between the two transactions.


What I'm after is that I think it would be enough to establish the 
commit order using deferred triggers that are fired during pre-commit 
processing. The trigger could get a number from a global sequence to 
establish the commit order, and write it to a table. So you still need a 
global sequence, but it's only needed once per commit.


(you have to handle deferred triggers that fire after the commit-order 
trigger. perhaps by getting another number from the global sequence and 
replacing the previous number with it)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 4:52 PM, Heikki Linnakangas wrote:

Ok, I think I understand it now. The commit order is enough, because 
replaying the actions in the order all actions of B, then all actions 
of A yields the same result.


Precisely.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 5:12 PM, Heikki Linnakangas wrote:

On 26/05/10 23:49, Jan Wieck wrote:

In this implementation it wouldn't even matter if a transaction that was
recorded actually never made it because it crashed before the WAL flush.
It would be reported by this commit order feature, but there would be
no traces of whatever it did to be found inside the DB, so that anomaly
is harmless.


Hmm, I think it would also not matter if the reported commit order 
doesn't match exactly the order of the commit records, as long as 
there's no dependency between the two transactions.


What I'm after is that I think it would be enough to establish the 
commit order using deferred triggers that are fired during pre-commit 
processing. The trigger could get a number from a global sequence to 
establish the commit order, and write it to a table. So you still need a 
global sequence, but it's only needed once per commit.


You're not trying to derail this thread into yet another of our famous 
commit trigger battles, are you?




(you have to handle deferred triggers that fire after the commit-order 
trigger. perhaps by getting another number from the global sequence and 
replacing the previous number with it)


I could imagine a commit trigger as a special case that is fired AFTER 
the trigger queue was shut down, so any operation that causes any 
further triggers to fire would automatically abort the transaction. A 
draconian, but reasonable restriction.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-26 Thread Jan Wieck

On 5/26/2010 4:34 PM, Kevin Grittner wrote:

Jan Wieck janwi...@yahoo.com wrote:
 

Without this logic, the replication system could not combine
multiple origin sessions into one replication session without
risking to never find a state, in which it can commit.
 
My latest idea for handling this in WAL-based replication involves

WAL-logging information about the transaction through which a the
committing transaction makes it safe to view.  There are a few
options here at the detail level that I'm still thinking through.
The idea would be that the xmin from read-only queries on the slaves
might be somewhere behind where you would expect based on
transactions committed.  (The details involve such things as where
non-serializable transactions fall into the plan on both sides, and
whether it's worth the effort to special-case read-only transactions
on the master.)
 
I can't say that I'm 100% sure that some lurking detail won't shoot

this technique down for HS, but it seems good to me at a conceptual
level.


Without simulating multiple simultaneous transactions during playback, 
how are you going to manage that the tuples, already inserted on behalf 
of the ongoing master transactions, disappear when they abort on the master?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 3:21 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 The subtle point here is whether you consider the view from the outside 
 (in the sense of what a read-only transaction started at an arbitrary time 
 can or cannot observe), or from the inside (what updating transactions can 
 observe and might base their updates on).
 
 The former case is completely determined by the commit ordering of the 
 transactions, while the latter is not - otherwise serializability wouldn't 
 be such a hard problem.
 
 BTW, doesn't all this logic fall in a heap as soon as you consider
 read-committed transactions?


Why would it? There's still a well defined point in time at which the 
transaction's effects become visible, and every other transaction commits 
either before that time or after that time. An observer started between two 
transactions sees the first's changes but not the second's. One replace 
observing read committed transactions by a series of smaller repeatable read 
transactions, since the observers are read-only anyway.

This of course says nothing about what state the updating transactions 
themselves see as the current state. For e.g. replication that is adequate, 
since you'd not replay the original commands but rather the effects they had in 
terms of physical tuple updates. On replay, the effects of a transaction to 
therefor not depend on the state the transaction sees.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:

 Replicating or recreating the whole predicate locking and conflict
 detection on slaves is not feasible for performance reasons. (I
 won't elaborate unless someone feels that's not intuitively
 obvious.) The only sane way I can see to have a slave database allow
 serializable behavior is to WAL-log the acquisition of a snapshot by
 a serializable transaction, and the rollback or commit, on the
 master, and to have the serializable snapshot build on a slave
 exclude any serializable transactions for which there are still
 concurrent serializable transactions. Yes, that does mean WAL-
 logging the snapshot acquisition even if the transaction doesn't yet
 have an xid, and WAL-logging the commit or rollback even if it never
 acquires an xid.

 One important observation is that any anomaly that occurs on the slave
 can be resolved by aborting a local read-only transaction. This is a
 good thing, because the alternatives are too horrible to consider.

 You could possibly cut the costs of predicate locking by having the
 master ship with each transaction the list of predicate locks it
 acquired. But you'd still have to track locks for read-only
 transactions, so maybe that's not a significant cost improvement. On
 the other hand, if you're willing to pay the price of serializability
 on the master, why not the slaves too?

I don't understand the problem. According to me, in the context of
SSI, a read-only slave can just map SERIALIZABLE to the technical
implementation of REPEATABLE READ (i.e., the currently-existing
SERIALIZABLE). The union of the transactions on the master and the
slave(s) will still exhibit SERIALIZABLE behavior because the
transactions on the slave cannot write anything and are therefore
irrelevant.

Is anything wrong with that reasoning?

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

I think it would be quite simple to read WAL. WALSender reads the WAL
file after its been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

This, unfortunately, isn't true in SSI.

Consider read-only transactions on a single node SSI database -- the
situation is the same for read-only transactions that run on a slave. 
These transactions can be part of anomalies, so they need to be checked
for conflicts and potentially aborted.

Consider Kevin's favorite example, where one table contains the current
date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
  
T3, which is a read-only transaction, sees the incremented date and an
empty list of receipts. But T1 later commits a new entry in the
receipts table with the old date. No serializable ordering allows this.
However, if T3 hadn't performed its read, there'd be no problem; we'd
just serialize T1 before T2 and no one would be the wiser.

SSI would detect a potential conflict here, which we could resolve by
aborting T3. (We could also abort T1, but if this is a replicated
system this isn't always an option -- T3 might be running on the
slave, so only the slave will know about the conflict, and it can't
very well abort an update transaction on the master.)


There's another example of a read-only transaction anomaly that could
cause similar problems at
http://portal.acm.org/citation.cfm?doid=1031570.1031573, but I think
this one is easier to follow.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:18 , Dan Ports wrote:
 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:
 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.
 
 This, unfortunately, isn't true in SSI.
 
 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave. 
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.
 
 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty). 
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits
 
 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
as long as nobody tries to observe it?

Btw, I still don't get how this follows from the Cahill paper. For a 
transaction to lie on a dangerous circle, it needs incoming and outgoing edges 
in the conflict graph, right? But I'd have though that conflicts are always 
between a reader and a writer or between two writers. So how can a read-only 
transaction have incoming and outgoing edges?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, so in fact SSI sometimes allows the database to be
 inconsistent, but only as long as nobody tries to observe it?
 
Not exactly.  The eventually-persisted state is always consistent,
but there can be a transitory committed state which would violate
user-defined constraints or business rules *if viewed*.  This is
what I've been on about -- the commit sequence is not necessarily
the same as the apparent order of execution.  A read-only
transaction, if run before the overlapping commits settle, can
view a state which is not consistent with any serial order of
execution, and might therefore break the rules.  SSI detects that
and rolls one of the transactions back if they're all running at
serializable transaction isolation in a single SSI database, but the
question is how to handle this when the read happens in a replica.
 
 Btw, I still don't get how this follows from the Cahill paper. For
 a transaction to lie on a dangerous circle, it needs incoming and
 outgoing edges in the conflict graph, right?
 
At least one of the transactions participating in the cycle does. 
There's no requirement that they all do.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Dan Ports
On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

Yes. Note that even while it's in an inconsistent state, you can still
perform any query that doesn't observe the inconsistency -- hopefully
most queries fall into this category.

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous circle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have though that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?

Right, the read-only transaction can't have incoming edges, but it can
have outgoing edges. So it can't be the pivot itself (the transaction
with both outgoing and incoming edges), but it can cause *another*
transaction to be.

In the example I gave, T3 (the r/o transaction) has an outgoing edge to
T1, because it didn't see T1's concurrent update. T1 already had an
outgoing edge to T2, so adding in this incoming edge from T3 creates
the dangerous structure.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Jan Wieck janwi...@yahoo.com wrote:
 
 Have you ever looked at one of those queries, that Londiste or
 Slony issue against the provider DB in order to get all the log
 data that has been committed between two snapshots? Is that really
 the best you can think of?
 
No, I admit I haven't.  In fact, I was thinking primarily in terms
of log-driven situations, like HS.  What would be the best place for
me to look to come up to speed on your use case?  (I'm relatively
sure that the issue isn't that there's no information to find, but
that a sequential pass over all available information would take a
*long* time.)  I've been working through the issues on WAL-based
replicas, and have some additional ideas and alternatives, but I'd
like to see the big picture, including trigger-based replication,
before posting.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Dan Ports d...@csail.mit.edu:

 On Tue, May 25, 2010 at 02:00:42PM +0200, Nicolas Barbier wrote:

 I don't understand the problem. According to me, in the context of
 SSI, a read-only slave can just map SERIALIZABLE to the technical
 implementation of REPEATABLE READ (i.e., the currently-existing
 SERIALIZABLE). The union of the transactions on the master and the
 slave(s) will still exhibit SERIALIZABLE behavior because the
 transactions on the slave cannot write anything and are therefore
 irrelevant.

 This, unfortunately, isn't true in SSI.

 Consider read-only transactions on a single node SSI database -- the
 situation is the same for read-only transactions that run on a slave.
 These transactions can be part of anomalies, so they need to be checked
 for conflicts and potentially aborted.

 Consider Kevin's favorite example, where one table contains the current
 date and the other is a list of receipts (initially empty).
  T1 inserts (select current_date) into receipts, but doesn't commit
  T2 increments current_date and commits
  T3 reads both current_date and the receipt table
  T1 commits

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.
 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 SSI would detect a potential conflict here, which we could resolve by
 aborting T3. (We could also abort T1, but if this is a replicated
 system this isn't always an option -- T3 might be running on the
 slave, so only the slave will know about the conflict, and it can't
 very well abort an update transaction on the master.)

Ah, indeed. I made the same reasoning mistake as Florian (presumably)
did: I didn't think of the fact that the read-only transaction doesn't
need to be the pivot.

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Florian Pflug
On May 25, 2010, at 20:48 , Dan Ports wrote:
 On Tue, May 25, 2010 at 08:35:44PM +0200, Florian Pflug wrote:
 Hm, so in fact SSI sometimes allows the database to be inconsistent, but 
 only as long as nobody tries to observe it?
 
 Yes. Note that even while it's in an inconsistent state, you can still
 perform any query that doesn't observe the inconsistency -- hopefully
 most queries fall into this category.

Yeah, as long as you just walk by without looking, the database is happy ;-)

 Btw, I still don't get how this follows from the Cahill paper. For a 
 transaction to lie on a dangerous circle, it needs incoming and outgoing 
 edges in the conflict graph, right? But I'd have though that conflicts are 
 always between a reader and a writer or between two writers. So how can a 
 read-only transaction have incoming and outgoing edges?
 
 Right, the read-only transaction can't have incoming edges, but it can
 have outgoing edges. So it can't be the pivot itself (the transaction
 with both outgoing and incoming edges), but it can cause *another*
 transaction to be.
 
 In the example I gave, T3 (the r/o transaction) has an outgoing edge to
 T1, because it didn't see T1's concurrent update. T1 already had an
 outgoing edge to T2, so adding in this incoming edge from T3 creates
 the dangerous structure.

Hm, but for there to be an actual problem (and not a false positive), an actual 
dangerous circle has to exist in the dependency graph. The existence of a 
dangerous structure is just a necessary (but not sufficient) and easily 
checked-for condition for that, right? Now, if a read-only transaction only 
ever has outgoing edges, it cannot be part of a (dangerous or not) circle, and 
hence any dangerous structure it is part of is a false positive.

I guess my line of reasoning is flawed somehow, but I cannot figure out why...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 On May 25, 2010, at 20:18 , Dan Ports wrote:

 T3, which is a read-only transaction, sees the incremented date and an
 empty list of receipts. But T1 later commits a new entry in the
 receipts table with the old date. No serializable ordering allows this.

 However, if T3 hadn't performed its read, there'd be no problem; we'd
 just serialize T1 before T2 and no one would be the wiser.

 Hm, so in fact SSI sometimes allows the database to be inconsistent, but only 
 as long as nobody tries to observe it?

I would not call this an inconsistent state: it would become
inconsistent only after someone (e.g., T3) has observed it _and_ T1
commits.

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Nicolas Barbier
2010/5/25 Florian Pflug f...@phlo.org:

 Hm, but for there to be an actual problem (and not a false positive), an
 actual dangerous circle has to exist in the dependency graph. The
 existence of a dangerous structure is just a necessary (but not
 sufficient) and easily checked-for condition for that, right? Now, if a
 read-only transaction only ever has outgoing edges, it cannot be part
 of a (dangerous or not) circle, and hence any dangerous structure it is
 part of is a false positive.

 I guess my line of reasoning is flawed somehow, but I cannot figure out why...

In the general case, wr dependencies also create must be serialized
before edges. It seems that those edges can be discarded when finding
a pivot, but if you want to go back to basics:

( means must be serialized before.)

* T1  T2, because T1 reads a version of a data element for which T2
later creates a newer version (rw between T1 and T2).
* T3  T1, because T3 reads a version of a data element for which T1
later creates a newer version (rw between T3 and T1).
* T2  T3, because T2 creates a version of a data element, which is
then read by T3 (wr between T2 and T3).

(As you can see, those 3 edges form a cycle.)

Nicolas

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 
 Hm, but for there to be an actual problem (and not a false
 positive), an actual dangerous circle has to exist in the
 dependency graph. The existence of a dangerous structure is just a
 necessary (but not sufficient) and easily checked-for condition
 for that, right? Now, if a read-only transaction only ever has
 outgoing edges, it cannot be part of a (dangerous or not) circle,
 and hence any dangerous structure it is part of is a false
 positive.
 
 I guess my line of reasoning is flawed somehow, but I cannot
 figure out why...
 
Here's why:
 
We're tracking rw-dependencies, where the time-arrow showing
effective order of execution points from the reader to the writer
(since the reader sees a state prior to the write, it effectively
executes before it).  These are important because there have to be
two such dependencies, one in to the pivot and one out from the
pivot, for a problem to exist.  (See various works by Dr. Alan
Fekete, et al, for details.)  But other dependencies can imply an
order of execution.  In particular, a wr-dependency, where a
transaction *can* see data committed by another transaction, implies
that the *writer* came first in the order of execution.  In this
example, the transaction which lists the receipts successfully reads
the control table update, but is not able to read the receipt
insert.  This completes the cycle, making it a real anomaly and not
a false positive.
 
Note that the wr-dependency can actually exist outside the database,
making it pretty much impossible to accurately tell a false positive
from a true anomaly when the pivot exists and the transaction
writing data which the pivot can't read commits first.  For example,
let's say that the update to the control table is committed from an
application which, seeing that its update came back without error,
proceeds to list the receipts for the old date in a subsequent
transaction.  You have a wr-dependency which is, in reality, quite
real and solid with no way to notice it within the database engine. 
That's why the techniques used in SSI are pretty hard to improve
upon beyond more detailed and accurate tracking of rw-conflicts.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/24/2010 9:30 AM, Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?



There is a difference between an empty result set and an exception.


No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 
Checking the age of the last entry in the file should be simple enough 
to determine if the segment matches the max age configuration (if 
set). In the case of a superuser telling what to purge he would just 
call a function with a serial number (telling the obsolete segments).



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Tom Lane
Jan Wieck janwi...@yahoo.com writes:
 No, I meant how will the *function* know, if a superuser and/or some 
 background process can purge records at any time?

 The data contains timestamps which are supposedly taken in commit order. 

You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 maybe we should get serializable working and committed on one
 node first and then worry about how to distribute it.  I think
 there might be other approaches to this problem
 
Well, I've got two or three other ideas on how we can manage this
for HS, but since I now realize that I've totally misunderstood the
main use case for this (which is to support trigger-based
replication), I'd like to be clear on something before letting it
drop.  The big question is, do such replicas need to support
serializable access to the data modified by serializable
transactions in the source database?  That is, is there a need for
such replicas to only see states which are possible in some serial
order of execution of serializable transactions on the source
database?  Or to phrase the same question a third way, should there
be a way to run queries on such replicas with confidence that what
is viewed is consistent with user-defined constraints and business
rules?
 
If not, there's no intersection between this feature and SSI.  If
there is, I think we should think through at least a general
strategy sooner, rather than later.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 12:03 PM, Simon Riggs wrote:

On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:


In some systems (data warehousing, replication), the order of commits is
important, since that is the order in which changes have become visible.
This information could theoretically be extracted from the WAL, but
scanning the entire WAL just to extract this tidbit of information would
be excruciatingly painful.


I think it would be quite simple to read WAL. WALSender reads the WAL
file after its been flushed, so it would be simple for it to read a blob
of WAL and then extract the commit order from it.

Overall though, it would be easier and more efficient to *add* info to
WAL and then do all this processing *after* WAL has been transported
elsewhere. Extracting info with DDL triggers, normal triggers, commit
order and everything else seems like too much work to me. Every other
RDBMS has moved away from trigger-based replication and we should give
that serious consideration also.


Reading the entire WAL just to find all COMMIT records, then go back to 
the origin database to get the actual replication log you're looking for 
is simpler and more efficient? I don't think so.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Simon Riggs
On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:
 On 5/25/2010 12:03 PM, Simon Riggs wrote:
  On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
  
  In some systems (data warehousing, replication), the order of commits is
  important, since that is the order in which changes have become visible.
  This information could theoretically be extracted from the WAL, but
  scanning the entire WAL just to extract this tidbit of information would
  be excruciatingly painful.
  
  I think it would be quite simple to read WAL. WALSender reads the WAL
  file after its been flushed, so it would be simple for it to read a blob
  of WAL and then extract the commit order from it.
  
  Overall though, it would be easier and more efficient to *add* info to
  WAL and then do all this processing *after* WAL has been transported
  elsewhere. Extracting info with DDL triggers, normal triggers, commit
  order and everything else seems like too much work to me. Every other
  RDBMS has moved away from trigger-based replication and we should give
  that serious consideration also.
 
 Reading the entire WAL just to find all COMMIT records, then go back to 
 the origin database to get the actual replication log you're looking for 
 is simpler and more efficient? I don't think so.

Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using exact
same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:50 PM, Simon Riggs wrote:

On Tue, 2010-05-25 at 16:41 -0400, Jan Wieck wrote:

On 5/25/2010 12:03 PM, Simon Riggs wrote:
 On Sun, 2010-05-23 at 16:21 -0400, Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of commits is

 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.
 
 I think it would be quite simple to read WAL. WALSender reads the WAL

 file after its been flushed, so it would be simple for it to read a blob
 of WAL and then extract the commit order from it.
 
 Overall though, it would be easier and more efficient to *add* info to

 WAL and then do all this processing *after* WAL has been transported
 elsewhere. Extracting info with DDL triggers, normal triggers, commit
 order and everything else seems like too much work to me. Every other
 RDBMS has moved away from trigger-based replication and we should give
 that serious consideration also.

Reading the entire WAL just to find all COMMIT records, then go back to 
the origin database to get the actual replication log you're looking for 
is simpler and more efficient? I don't think so.


Agreed, but I think I've not explained myself well enough.

I proposed two completely separate ideas; the first one was this:

If you must get commit order, get it from WAL on *origin*, using exact
same code that current WALSender provides, plus some logic to read
through the WAL records and extract commit/aborts. That seems much
simpler than the proposal you outlined and as SR shows, its low latency
as well since commits write to WAL. No need to generate event ticks
either, just use XLogRecPtrs as WALSender already does.

I see no problem with integrating that into core, technically or
philosophically.



Which means that if I want to allow a consumer of that commit order data 
to go offline for three days or so to replicate the 5 requested, low 
volume tables, the origin needs to hang on to the entire WAL log from 
all 100 other high volume tables?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-25 Thread Jan Wieck

On 5/25/2010 4:16 PM, Tom Lane wrote:

Jan Wieck janwi...@yahoo.com writes:
No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?


The data contains timestamps which are supposedly taken in commit order. 


You can *not* rely on the commit timestamps to be in exact order.
(Perhaps approximate ordering is good enough for what you want here,
but just be careful to not fall into the trap of assuming that they're
exactly ordered.)


I am well aware of the fact that commit timestamps within the WAL can go 
backwards and that the serial numbers of this proposed implementation of 
commit order can even be different from what the timestamps AND the WAL 
are saying.


As long as the serial number (record position inside of segment) is 
determined while the transaction still holds all its locks, this is 
going to be good enough for what async replication users today are used 
to. Again, it will not magically make it possible to determine a 
serializable order of actions, that happened from transactions running 
in read committed isolation level, post mortem. I don't even even think 
that is possible at all.


And I don't think anyone proposed a solution for that problem anyways.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 In light of the proposed purging scheme, how would it be able to distinguish 
 between those two cases (nothing there yet vs. was there but purged)?

 There is a difference between an empty result set and an exception.

No, I meant how will the *function* know, if a superuser and/or some 
background process can purge records at any time?

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005240928
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkv6f0UACgkQvJuQZxSWSsh0xwCgmXLtKngoBBYX0TxDM2TlJRId
AVIAoMHYa3c9Ej2vUJyFufxBR5vDPzQ+
=e1mh
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Sun, May 23, 2010 at 9:44 PM, Jan Wieck janwi...@yahoo.com wrote:
 I'm not sure the retention policies of the shared buffer cache, the WAL
 buffers, CLOG buffers and every other thing we try to cache are that easy to
 fold into one single set of logic. But I'm all ears.

I'm not sure either, although it seems like LRU ought to be good
enough for most things.  I'm more worried about things like whether
the BufferDesc abstraction is going to get in the way.

 CommitTransaction() inside of xact.c will call a function, that inserts
 a new record into this array. The operation will for most of the time be
 nothing than taking a spinlock and adding the record to shared memory.
 All the data for the record is readily available, does not require
 further locking and can be collected locally before taking the spinlock.

 What happens when you need to switch pages?

 Then the code will have to grab another free buffer or evict one.

Hopefully not while holding a spin lock.  :-)

 The function will return the sequence number which CommitTransaction()
 in turn will record in the WAL commit record together with the
 begin_timestamp. While both, the begin as well as the commit timestamp
 are crucial to determine what data a particular transaction should have
 seen, the row count is not and will not be recorded in WAL.

 It would certainly be better if we didn't to bloat the commit xlog
 records to do this.  Is there any way to avoid that?

 If you can tell me how a crash recovering system can figure out what the
 exact sequence number of the WAL commit record at hand should be, let's
 rip it.

Hmm...  could we get away with WAL-logging the next sequence number
just once per checkpoint?  When you replay the checkpoint record, you
update the control file with the sequence number.  Then all the
commits up through the next checkpoint just use consecutive numbers
starting at that value.

 It is an option. Keep it until I tell you is a perfectly valid
 configuration option. One you probably don't want to forget about, but valid
 none the less.

As Tom is fond of saying, if it breaks, you get to keep both pieces.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.
 
This issue intersects with the serializable work I've been doing.
While in database transactions using S2PL the above is true, in
snapshot isolation and the SSI implementation of serializable
transactions, it's not. In particular, the snapshot anomalies which
can cause non-serializable behavior happen precisely because the
apparent order of execution doesn't match anything so linear as
order of commit.
 
I'll raise that receipting example again. You have transactions
which grab the current deposit data and insert it into receipts, as
payments are received. At some point in the afternoon, the deposit
date in a control table is changed to the next day, so that the
receipts up to that point can be deposited during banking hours with
the current date as their deposit date. A report is printed (and
likely a transfer transaction recorded to move cash in drawer to
cash in checking, but I'll ignore that aspect for this example).
Some receipts may not be committed when the update to the date in
the control table is committed.
 
This is eventually consistent -- once all the receipts with the
old date commit or roll back the database is OK, but until then you
might be able to select the new date in the control table and the
set of receipts matching the old date without the database telling
you that you're missing data. The new serializable implementation
fixes this, but there are open RD items (due to the need to discuss
the issues) on the related Wiki page related to hot standby and
other replication. Will we be able to support transactional
integrity on slave machines?
 
What if the update to the control table and the insert of receipts
all happen on the master, but someone decides to move the (now
happily working correctly with serializable transactions) reporting
to a slave machine? (And by the way, don't get too hung up on this
particular example, I could generate dozens more on demand -- the
point is that order of commit doesn't always correspond to apparent
order of execution; in this case the receipts *appear* to have
executed first, because they are using a value later updated to
something else by a different transaction, even though that other
transaction *committed* first.)
 
Replicating or recreating the whole predicate locking and conflict
detection on slaves is not feasible for performance reasons. (I
won't elaborate unless someone feels that's not intuitively
obvious.) The only sane way I can see to have a slave database allow
serializable behavior is to WAL-log the acquisition of a snapshot by
a serializable transaction, and the rollback or commit, on the
master, and to have the serializable snapshot build on a slave
exclude any serializable transactions for which there are still
concurrent serializable transactions. Yes, that does mean WAL-
logging the snapshot acquisition even if the transaction doesn't yet
have an xid, and WAL-logging the commit or rollback even if it never
acquires an xid.
 
I think this solve the issue Jan raises as long as serializable
transactions are used; if they aren't there are no guarantees of
transactional integrity no matter how you track commit sequence,
unless it can be based on S2PL-type blocking locks.  I'll have to
leave that to someone else to sort out.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 11:24 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jan Wieck wrote:

 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.

 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not.

I think you're confusing two subtly different things.  The way to
prove that a set of transactions running under some implementation of
serializability is actually serializable is to construct a serial
order of execution consistent with the view of the database that each
transaction saw.  This may or may not match the commit order, as you
say.  But the commit order is still the order the effects of those
transactions have become visible - if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it would
see all the transactions which committed before it and none of those
that committed afterward.  So I think Jan's statement is correct.

Having said that, I think your concerns about how things will look
from a slave's point of view are possibly valid.  A transaction
running on a slave is essentially a read-only transaction that the
master doesn't know about.  It's not clear to me whether adding such a
transaction to the timeline could result in either (a) that
transaction being rolled back or (b) some impact on which other
transactions got rolled back.  If it did, that would obviously be a
problem for serializability on slaves, though your proposed fix sounds
like it would be prohibitively expensive for many users.  But can this
actually happen?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Robert Haas  wrote:
 
 I think you're confusing two subtly different things.
 
The only thing I'm confused about is what benefit anyone expects to
get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.
 
 But the commit order is still the order the effects of those
 transactions have become visible - if we inserted a new read-only
 transaction into the stream at some arbitrary point in time, it
 would see all the transactions which committed before it and none
 of those that committed afterward.
 
Isn't that what a snapshot does already?
 
 your proposed fix sounds like it would be prohibitively expensive
 for many users. But can this actually happen?
 
How so?  The transaction start/end logging, or looking at that data
when building a snapshot?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Heikki Linnakangas

On 24/05/10 19:51, Kevin Grittner wrote:

The only thing I'm confused about is what benefit anyone expects to
get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.


Right, it doesn't. What it provides is a way to reconstruct a snapshot 
at any point in time, after the fact. For example, after transactions A, 
C, D and B have committed in that order, it allows you to reconstruct a 
snapshot just like you would've gotten immediately after the commit of 
A, C, D and B respectively. That's useful replication tools like Slony 
that needs to commit the changes of those transactions in the slave in 
the same order as they were committed in the master.


I don't know enough of Slony et al. to understand why that'd be better 
than the current heartbeat mechanism they use, taking a snapshot every 
few seconds, batching commits.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 12:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:
 I think you're confusing two subtly different things.

 The only thing I'm confused about is what benefit anyone expects to
 get from looking at data between commits in some way other than our
 current snapshot mechanism.  Can someone explain a use case where
 what Jan is proposing is better than snapshot isolation?  It doesn't
 provide any additional integrity guarantees that I can see.

It's a tool for replication solutions to use.

 But the commit order is still the order the effects of those
 transactions have become visible - if we inserted a new read-only
 transaction into the stream at some arbitrary point in time, it
 would see all the transactions which committed before it and none
 of those that committed afterward.

 Isn't that what a snapshot does already?

Yes, for a particular transaction.  But this is to allow transactions
to be replayed (in order) on another node.

 your proposed fix sounds like it would be prohibitively expensive
 for many users. But can this actually happen?

 How so?  The transaction start/end logging, or looking at that data
 when building a snapshot?

I guess what I'm asking is - if the reconstructed transaction order
inferred by SSI doesn't match the actual commit order, can we get a
serialization anomaly on the standby by replaying transactions there
in commit order?  Can you give an example and explain how your
proposal would solve it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of dom may 23 20:38:14 -0400 2010:
 On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote:
  The system will have postgresql.conf options for enabling/disabling the
  whole shebang, how many shared buffers to allocate for managing access
  to the data and to define the retention period of the data based on data
  volume and/or age of the commit records.
 
 It would be nice if this could just be managed out of shared_buffers
 rather than needing to configure a separate pool just for this
 feature.

FWIW we've talked about this for years -- see old discussions about how
pg_subtrans becomes a bottleneck in certain cases and you want to
enlarge the number of buffers allocated to it (probably easy to find by
searching posts from Jignesh).  I'm guessing the new notify code would
benefit from this as well.

It'd be nice to have as a side effect, but if not, IMHO this proposal
could simply use a fixed buffer pool like all other slru.c callers until
someone gets around to fixing that.  Adding more GUC switches for this
strikes me as overkill.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 3:07 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 It'd be nice to have as a side effect, but if not, IMHO this proposal
 could simply use a fixed buffer pool like all other slru.c callers until
 someone gets around to fixing that.  Adding more GUC switches for this
 strikes me as overkill.

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Dan Ports
On Sun, May 23, 2010 at 04:21:58PM -0400, Jan Wieck wrote:
 In some systems (data warehousing, replication), the order of commits is
 important, since that is the order in which changes have become visible.
 This information could theoretically be extracted from the WAL, but
 scanning the entire WAL just to extract this tidbit of information would
 be excruciatingly painful.

This is very interesting to me as I've been doing some (research --
nowhere near production-level) work on building a transactional
application-level (i.e. memcached-like) cache atop Postgres. One of the
features I needed to support it was basically what you describe.

Without getting too far into the details of what I'm doing, I needed to
make it clear to a higher layer which commits were visible to a given
query. That is, I wanted to know both the order of commits and where
particular snapshots fit into this ordering. (A SnapshotData struct
obviously contains the visibility information, but a representation in
terms of the commit ordering is both more succinct and allows for easy
ordering comparisons).

Something you might want to consider, then, is adding an interface to
find out the timestamp of the current transaction's snapshot, i.e. the
timestamp of the most recent committed transaction visible to it. I
wouldn't expect this to be difficult to implement as transaction
completion/visibility is already synchronized via ProcArrayLock.

 Each record of the Transaction Commit Info consists of
 
   txid  xci_transaction_id
   timestamptz   xci_begin_timestamp
   timestamptz   xci_commit_timestamp
   int64 xci_total_rowcount

Another piece of information that seems useful to provide here would be
the logical timestamp of the transaction, i.e. a counter that's
incremented by one for each transaction. But maybe that's implicit in
the log ordering?

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.


I've actually implemented some semblance of this on Postgres 8.2, but it
sounds like what you're interested in is more sophisticated. In
particular, I wasn't at all concerned with durability or WAL stuff, and
I had some specific requirements about when it was OK to purge the
data. Because of this (and very limited development time), I just
threw something together with a simple shared buffer.

I don't think I have any useful code to offer, but let me know if
there's some way I can help out.

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Kevin Grittner
Robert Haas  wrote:
 
 It's a tool for replication solutions to use.
 
I was thrown by the original post referencing data warehousing.
For replication I definitely see that it would be good to provide
some facility to grab a coherent snapshot out of the transaction
stream, but I'm still not clear on a use case where other solutions
aren't better.  If you want a *particular* past snapshot, something
akin to the transactional caching that Dan Ports mentioned seems
best.  If you just want a coherent snapshot like snapshot isolation,
the current mechanisms seem to work (unless I'm missing something?).
If you want solid data integrity querying the most recent replicated
data, the proposal I posted earlier in the thread is the best I can
see, so far.
 
 if the reconstructed transaction order inferred by SSI doesn't
 match the actual commit order, can we get a serialization anomaly
 on the standby by replaying transactions there in commit order?
 
Yes.  If we don't do *something* to address it, the replicas
(slaves) will operate as read-only snapshot isolation, not true
serializable.
 
 Can you give an example and explain how your proposal would solve
 it?
 
I gave an example (without rigorous proof accompanying it, granted)
earlier in the thread.  In that example, if you allow a selection
against a snapshot which includes the earlier commit (the update of
the control table) and before the later commits (the receipts which
used the old deposit date) you have exactly the kind of
serialization anomaly which the work in progress prevents on the
source (master) database -- the receipts *appear* to run in earlier
transactions because the see the pre-update deposit date, but they
show up out of order.  As far as I'm concerned this is only a
problem if the user *requested* serializable behavior for all
transactions involved.
 
If we send the information I suggested in the WAL stream, then any
slave using the WAL stream could build a snapshot for a serializable
transaction which excluded serializable transactions from the source
which overlap with still-pending serializable transactions on the
source.  In this example, the update of the control table would not
be visible to a serializable transaction on the slave until any
overlapping serializable transactions (which would include any
receipts using the old date) had also committed, so you could never
see the writes out of order.
 
I don't think that passing detailed predicate locking information
would be feasible from a performance perspective, but since the
slaves are read-only, I think it is fine to pass just the minimal
transaction-level information I described.
 
-Kevin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Robert Haas
On Mon, May 24, 2010 at 4:03 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas  wrote:

 It's a tool for replication solutions to use.

 I was thrown by the original post referencing data warehousing.
 For replication I definitely see that it would be good to provide
 some facility to grab a coherent snapshot out of the transaction
 stream, but I'm still not clear on a use case where other solutions
 aren't better.  If you want a *particular* past snapshot, something
 akin to the transactional caching that Dan Ports mentioned seems
 best.  If you just want a coherent snapshot like snapshot isolation,
 the current mechanisms seem to work (unless I'm missing something?).
 If you want solid data integrity querying the most recent replicated
 data, the proposal I posted earlier in the thread is the best I can
 see, so far.

Well, AIUI, what you're really trying to do is derive the delta
between an old snapshot and a newer snapshot.

 Can you give an example and explain how your proposal would solve
 it?

 I gave an example (without rigorous proof accompanying it, granted)
 earlier in the thread.  In that example, if you allow a selection
 against a snapshot which includes the earlier commit (the update of
 the control table) and before the later commits (the receipts which
 used the old deposit date) you have exactly the kind of
 serialization anomaly which the work in progress prevents on the
 source (master) database -- the receipts *appear* to run in earlier
 transactions because the see the pre-update deposit date, but they
 show up out of order.

Yep, I see it now.

  As far as I'm concerned this is only a
 problem if the user *requested* serializable behavior for all
 transactions involved.

Agreed.

 If we send the information I suggested in the WAL stream, then any
 slave using the WAL stream could build a snapshot for a serializable
 transaction which excluded serializable transactions from the source
 which overlap with still-pending serializable transactions on the
 source.  In this example, the update of the control table would not
 be visible to a serializable transaction on the slave until any
 overlapping serializable transactions (which would include any
 receipts using the old date) had also committed, so you could never
 see the writes out of order.

 I don't think that passing detailed predicate locking information
 would be feasible from a performance perspective, but since the
 slaves are read-only, I think it is fine to pass just the minimal
 transaction-level information I described.

I suspect that's still going to be sort of hard on performance, but
maybe we should get serializable working and committed on one node
first and then worry about how to distribute it.  I think there might
be other approaches to this problem (global transaction coordinator?
standby requests snapshot from primary?).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Jan Wieck

On 5/24/2010 12:51 PM, Kevin Grittner wrote:

Robert Haas  wrote:
 

I think you're confusing two subtly different things.
 
The only thing I'm confused about is what benefit anyone expects to

get from looking at data between commits in some way other than our
current snapshot mechanism.  Can someone explain a use case where
what Jan is proposing is better than snapshot isolation?  It doesn't
provide any additional integrity guarantees that I can see.
 

But the commit order is still the order the effects of those
transactions have become visible - if we inserted a new read-only
transaction into the stream at some arbitrary point in time, it
would see all the transactions which committed before it and none
of those that committed afterward.
 
Isn't that what a snapshot does already?


It does and the proposed is a mere alternative serving the same purpose.

Have you ever looked at one of those queries, that Londiste or Slony 
issue against the provider DB in order to get all the log data that has 
been committed between two snapshots? Is that really the best you can 
think of?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Jan Wieck

On 5/24/2010 3:10 PM, Dan Ports wrote:

I'm not clear on why the total rowcount is useful, but perhaps I'm
missing something obvious.


It is a glimpse into the future. Several years of pain doing replication 
work has taught me that knowing approximately who much work the next 
chunk will be before you select it all is a really useful thing.



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Dan Ports
On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:
 Jan Wieck wrote:
  
  In some systems (data warehousing, replication), the order of
  commits is important, since that is the order in which changes
  have become visible.
  
 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not. In particular, the snapshot anomalies which
 can cause non-serializable behavior happen precisely because the
 apparent order of execution doesn't match anything so linear as
 order of commit.

All true, but this doesn't pose a problem in snapshot isolation. Maybe
this is obvious to everyone else, but just to be clear: a transaction's
snapshot is determined entirely by which transactions committed before
it snapshotted (and hence are visible to it). Thus, replaying update
transactions in the sae order on a slave makes the same sequence of
states visible to it.

Of course (as in your example) some of these states could expose
snapshot isolation anomalies. But that's true on a single-replica
system too.


Now, stepping into the SSI world...

 Replicating or recreating the whole predicate locking and conflict
 detection on slaves is not feasible for performance reasons. (I
 won't elaborate unless someone feels that's not intuitively
 obvious.) The only sane way I can see to have a slave database allow
 serializable behavior is to WAL-log the acquisition of a snapshot by
 a serializable transaction, and the rollback or commit, on the
 master, and to have the serializable snapshot build on a slave
 exclude any serializable transactions for which there are still
 concurrent serializable transactions. Yes, that does mean WAL-
 logging the snapshot acquisition even if the transaction doesn't yet
 have an xid, and WAL-logging the commit or rollback even if it never
 acquires an xid.

One important observation is that any anomaly that occurs on the slave
can be resolved by aborting a local read-only transaction. This is a
good thing, because the alternatives are too horrible to consider.

You could possibly cut the costs of predicate locking by having the
master ship with each transaction the list of predicate locks it
acquired. But you'd still have to track locks for read-only
transactions, so maybe that's not a significant cost improvement. On
the other hand, if you're willing to pay the price of serializability
on the master, why not the slaves too?

Dan

-- 
Dan R. K. Ports  MIT CSAILhttp://drkp.net/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Florian Pflug
On May 25, 2010, at 0:42 , Dan Ports wrote:
 On Mon, May 24, 2010 at 10:24:07AM -0500, Kevin Grittner wrote:
 Jan Wieck wrote:
 
 In some systems (data warehousing, replication), the order of
 commits is important, since that is the order in which changes
 have become visible.
 
 This issue intersects with the serializable work I've been doing.
 While in database transactions using S2PL the above is true, in
 snapshot isolation and the SSI implementation of serializable
 transactions, it's not. In particular, the snapshot anomalies which
 can cause non-serializable behavior happen precisely because the
 apparent order of execution doesn't match anything so linear as
 order of commit.
 
 All true, but this doesn't pose a problem in snapshot isolation. Maybe
 this is obvious to everyone else, but just to be clear: a transaction's
 snapshot is determined entirely by which transactions committed before
 it snapshotted (and hence are visible to it). Thus, replaying update
 transactions in the sae order on a slave makes the same sequence of
 states visible to it.

The subtle point here is whether you consider the view from the outside (in 
the sense of what a read-only transaction started at an arbitrary time can or 
cannot observe), or from the inside (what updating transactions can observe 
and might base their updates on).

The former case is completely determined by the commit ordering of the 
transactions, while the latter is not - otherwise serializability wouldn't be 
such a hard problem.

For some problems, like replication, the former (outside) view is what 
matters - if slave synthesizes transactions that insert/update/delete the very 
same tuples as the original transaction did, and commits them in the same 
order, no read-only transaction can observe the difference. But that is *not* a 
serial schedule of the original transactions, since the transactions are *not* 
the same - the merely touch the same tuples. In fact, if you try replaying the 
original SQL, you *will* get different results on the slave, and not only 
because of now() and the like.

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-24 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 The subtle point here is whether you consider the view from the outside (in 
 the sense of what a read-only transaction started at an arbitrary time can or 
 cannot observe), or from the inside (what updating transactions can observe 
 and might base their updates on).

 The former case is completely determined by the commit ordering of the 
 transactions, while the latter is not - otherwise serializability wouldn't be 
 such a hard problem.

BTW, doesn't all this logic fall in a heap as soon as you consider
read-committed transactions?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


 Exposing the data will be done via a set returning function. The SRF
 takes two arguments. The maximum number of rows to return and the last
 serial number processed by the reader. The advantage of such SRF is that
 the result can be used in a query that right away delivers audit or
 replication log information in transaction commit order. The SRF can
 return an empty set if no further transactions have committed since, or
 an error if data segments needed to answer the request have already been
 purged.

In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?

- -- 
Greg Sabino Mullane g...@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201005231646
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkv5lIAACgkQvJuQZxSWSsiR3gCgvyK/NPd6WmKGUqdo/3fdWIR7
LAQAoJqk3gYpEgtjw10gINDKFXTAnWO5
=sSvK
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Robert Haas
On Sun, May 23, 2010 at 4:21 PM, Jan Wieck janwi...@yahoo.com wrote:
 The system will have postgresql.conf options for enabling/disabling the
 whole shebang, how many shared buffers to allocate for managing access
 to the data and to define the retention period of the data based on data
 volume and/or age of the commit records.

It would be nice if this could just be managed out of shared_buffers
rather than needing to configure a separate pool just for this
feature.  But, I'm not sure how much work that is, and if it turns out
to be too ugly then I'd say it's not a hard requirement.  In general,
I think we talked during the meeting about the desirability of folding
specific pools into shared_buffers rather than managing them
separately, but I'm not aware that we have any cases where we do that
today so it might be hard (or not).

 Each record of the Transaction Commit Info consists of

     txid          xci_transaction_id
     timestamptz   xci_begin_timestamp
     timestamptz   xci_commit_timestamp
     int64         xci_total_rowcount

 32 bytes total.

Are we sure it's worth including the row count?  I wonder if we ought
to leave that out and let individual clients of the mechanism track
that if they're so inclined, especially since it won't be reliable
anyway.

 CommitTransaction() inside of xact.c will call a function, that inserts
 a new record into this array. The operation will for most of the time be
 nothing than taking a spinlock and adding the record to shared memory.
 All the data for the record is readily available, does not require
 further locking and can be collected locally before taking the spinlock.

What happens when you need to switch pages?

 The function will return the sequence number which CommitTransaction()
 in turn will record in the WAL commit record together with the
 begin_timestamp. While both, the begin as well as the commit timestamp
 are crucial to determine what data a particular transaction should have
 seen, the row count is not and will not be recorded in WAL.

It would certainly be better if we didn't to bloat the commit xlog
records to do this.  Is there any way to avoid that?

 Checkpoint handling will call a function to flush the shared buffers.
 Together with this, the information from WAL records will be sufficient
 to recover this data (except for row counts) during crash recovery.

Right.

 Exposing the data will be done via a set returning function. The SRF
 takes two arguments. The maximum number of rows to return and the last
 serial number processed by the reader. The advantage of such SRF is that
 the result can be used in a query that right away delivers audit or
 replication log information in transaction commit order. The SRF can
 return an empty set if no further transactions have committed since, or
 an error if data segments needed to answer the request have already been
 purged.

 Purging of the data will be possible in several different ways.
 Autovacuum will call a function that drops segments of the data that are
  outside the postgresql.conf configuration with respect to maximum age
 or data volume. There will also be a function reserved for superusers to
 explicitly purge the data up to a certain serial number.

Dunno if autovacuuming this is the right way to go.  Seems like that
could leave to replication breaks, and it's also more work than not
doing that.  I'd just say that if you turn this on you're responsible
for pruning it, full stop.

 Anyone who trades liberty for security deserves neither
 liberty nor security. -- Benjamin Franklin

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Exposing the Xact commit order to the user

2010-05-23 Thread Jan Wieck

On 5/23/2010 4:48 PM, Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160



Exposing the data will be done via a set returning function. The SRF
takes two arguments. The maximum number of rows to return and the last
serial number processed by the reader. The advantage of such SRF is that
the result can be used in a query that right away delivers audit or
replication log information in transaction commit order. The SRF can
return an empty set if no further transactions have committed since, or
an error if data segments needed to answer the request have already been
purged.


In light of the proposed purging scheme, how would it be able to distinguish 
between those two cases (nothing there yet vs. was there but purged)?


There is a difference between an empty result set and an exception.


Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >