Re: [HACKERS] Snapshot synchronization, again...

2011-03-03 Thread Jim Nasby
On Mar 1, 2011, at 10:54 PM, Tom Lane wrote:
> Jim Nasby  writes:
>> Dumb question: Is this something that could be solved by having the 
>> postmaster track this information in its local memory and make it available 
>> via a variable-sized IPC mechanism, such as a port or socket? That would 
>> eliminate the need to clean things up after a crash; I'm not sure if there 
>> would be other benefits.
> 
> Involving the postmaster in this is entirely *not* reasonable.  The
> postmaster cannot do anything IPC-wise that the stats collector couldn't
> do, and every additional function we load onto the postmaster is another
> potential source of unrecoverable database-wide failures.  The PM is
> reliable only because it doesn't do much.

Makes sense. Doesn't have to be the postmaster; it could be some other process.

Anyway, I just wanted to throw the idea out as food for thought. I don't know 
if it'd be better or worse than temp files...
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot synchronization, again...

2011-03-01 Thread Tom Lane
Jim Nasby  writes:
> Dumb question: Is this something that could be solved by having the 
> postmaster track this information in its local memory and make it available 
> via a variable-sized IPC mechanism, such as a port or socket? That would 
> eliminate the need to clean things up after a crash; I'm not sure if there 
> would be other benefits.

Involving the postmaster in this is entirely *not* reasonable.  The
postmaster cannot do anything IPC-wise that the stats collector couldn't
do, and every additional function we load onto the postmaster is another
potential source of unrecoverable database-wide failures.  The PM is
reliable only because it doesn't do much.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-03-01 Thread Jim Nasby
On Feb 28, 2011, at 11:59 AM, Tom Lane wrote:
> Robert Haas  writes:
>> On Sun, Feb 27, 2011 at 8:33 PM, Joachim Wieland  wrote:
>>> Remember that it's not only about saving shared memory, it's also
>>> about making sure that the snapshot reflects a state of the database
>>> that has actually existed at some point in the past.
> 
>> But you can do all of this with files too, can't you?  Just remove or
>> truncate the file when the snapshot is no longer valid.
> 
> Yeah.  I think adopting a solution similar to 2PC state files is a very
> reasonable way to go here.  This isn't likely to be a high-usage or
> performance-critical feature, so it's not essential to keep the
> information in shared memory for performance reasons.

Dumb question: Is this something that could be solved by having the postmaster 
track this information in its local memory and make it available via a 
variable-sized IPC mechanism, such as a port or socket? That would eliminate 
the need to clean things up after a crash; I'm not sure if there would be other 
benefits.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Snapshot synchronization, again...

2011-02-28 Thread Joachim Wieland
On Mon, Feb 28, 2011 at 6:38 PM, Robert Haas  wrote:
>> Remember that it's not only about saving shared memory, it's also
>> about making sure that the snapshot reflects a state of the database
>> that has actually existed at some point in the past. Furthermore, we
>> can easily invalidate a snapshot that we have published earlier by
>> deleting its checksum in shared memory as soon as the original
>> transaction commits/aborts. And for these two a checksum seems to be a
>> good fit. Saving memory then comes as a benefit and makes all those
>> happy who don't want to argue about how many slots to reserve in
>> shared memory or don't want to have another GUC for what will probably
>> be a low-usage feature.
>
> But you can do all of this with files too, can't you?  Just remove or
> truncate the file when the snapshot is no longer valid.

Sure we can, but it looked like the consensus of the first discussion
was that the through-the-client approach was more flexible. But then again,
nobody is actively arguing for that anymore.



Re: [HACKERS] Snapshot synchronization, again...

2011-02-28 Thread Tom Lane
Robert Haas  writes:
> On Sun, Feb 27, 2011 at 8:33 PM, Joachim Wieland  wrote:
>> Remember that it's not only about saving shared memory, it's also
>> about making sure that the snapshot reflects a state of the database
>> that has actually existed at some point in the past.

> But you can do all of this with files too, can't you?  Just remove or
> truncate the file when the snapshot is no longer valid.

Yeah.  I think adopting a solution similar to 2PC state files is a very
reasonable way to go here.  This isn't likely to be a high-usage or
performance-critical feature, so it's not essential to keep the
information in shared memory for performance reasons.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-28 Thread Robert Haas
On Sun, Feb 27, 2011 at 8:33 PM, Joachim Wieland  wrote:
> On Sun, Feb 27, 2011 at 3:04 PM, Heikki Linnakangas
>  wrote:
>>> Why exactly, Heikki, do you think the hash is more troublesome?
>> It just feels wrong to rely on cryptography just to save some shared memory.
>
> Remember that it's not only about saving shared memory, it's also
> about making sure that the snapshot reflects a state of the database
> that has actually existed at some point in the past. Furthermore, we
> can easily invalidate a snapshot that we have published earlier by
> deleting its checksum in shared memory as soon as the original
> transaction commits/aborts. And for these two a checksum seems to be a
> good fit. Saving memory then comes as a benefit and makes all those
> happy who don't want to argue about how many slots to reserve in
> shared memory or don't want to have another GUC for what will probably
> be a low-usage feature.

But you can do all of this with files too, can't you?  Just remove or
truncate the file when the snapshot is no longer valid.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-27 Thread Joachim Wieland
On Sun, Feb 27, 2011 at 3:04 PM, Heikki Linnakangas
 wrote:
>> Why exactly, Heikki, do you think the hash is more troublesome?
> It just feels wrong to rely on cryptography just to save some shared memory.

Remember that it's not only about saving shared memory, it's also
about making sure that the snapshot reflects a state of the database
that has actually existed at some point in the past. Furthermore, we
can easily invalidate a snapshot that we have published earlier by
deleting its checksum in shared memory as soon as the original
transaction commits/aborts. And for these two a checksum seems to be a
good fit. Saving memory then comes as a benefit and makes all those
happy who don't want to argue about how many slots to reserve in
shared memory or don't want to have another GUC for what will probably
be a low-usage feature.


> I realize that there are conflicting opinions on this, but from user
> point-of-view the hash is just a variant of the idea of passing the snapshot
> through shared memory, just implemented in an indirect way.

The user will never see the hash, so why should he bother? The user's point
of view is that he receives data and can obtain the same snapshot if
he passes that data back. This user experience is no different from
any other way of passing the snapshot through the client. And from the
previous discussions this seemed to be what most people wanted.


>> And how
>> could we validate/invalidate snapshots without the checksum (assuming
>> the through-the-client approach instead of storing the whole snapshot
>> in shared memory)?
>
> Either you accept anything that passes sanity checks, or you store the whole
> snapshot in shared memory (or a temp file). I'm not sure which is better,
> but they both seem better than the hash.

True, both might work, but I don't see a real technical advantage over
the checksum approach for either of them; rather the opposite.

Nobody has come up with a use case for the accept-anything option so
far, so I don't see why we should commit ourselves to this feature at
this point, given that we have a cheap and easy way of
validating/invalidating snapshots. And I might just be paranoid, but I
also fear that someone could raise security issues over the fact that
you would be able to request an arbitrary database state from the past
and inspect changes from other people's transactions. We might want to
allow that later, though, and I realize that we have to allow it for a
standby server that would take over a snapshot from the master anyway,
but I don't want to add this complexity to this first patch. I do,
however, want to be able to allow this in the future without touching
the external API of the feature.

And for the tempfile approach, I don't see that creating and removing
the temp file is any less code complexity than flipping a number in
shared memory. Also, it seemed that people would rather go with the
through-the-client approach because it seems to be more flexible.

Maybe you should just look at it as a conservative accept-anything
approach that uses a checksum to allow only snapshots that we know
have existed and have been published. Does the checksum still look so
weird from this perspective?
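The checksum scheme described above can be sketched outside the backend. A toy
Python model (this is not PostgreSQL code: the `published` dict stands in for
the shared-memory slot area, and the serialization format is invented purely
for illustration):

```python
import hashlib

# Stand-in for the fixed-size shared-memory area: one checksum per backend.
published = {}

def serialize(snapshot):
    # Invented textual format, for illustration only.
    return "xmin:{xmin};xmax:{xmax};xip:{xip}".format(
        xmin=snapshot["xmin"],
        xmax=snapshot["xmax"],
        xip=",".join(str(x) for x in sorted(snapshot["xip"])),
    ).encode()

def publish(backend_id, snapshot):
    """Export: hand the serialized snapshot to the client, keep only its hash."""
    data = serialize(snapshot)
    published[backend_id] = hashlib.sha256(data).hexdigest()
    return data  # this is what travels through the client

def invalidate(backend_id):
    """Called when the exporting transaction commits or aborts."""
    published.pop(backend_id, None)

def import_snapshot(data):
    """Import: accept only data whose hash is still published."""
    digest = hashlib.sha256(data).hexdigest()
    if digest not in published.values():
        raise ValueError("snapshot is unknown or no longer valid")
    return data

snap = {"xmin": 100, "xmax": 110, "xip": [101, 105]}
cookie = publish(1, snap)
assert import_snapshot(cookie) == cookie   # valid while published
invalidate(1)                              # exporting transaction ends
try:
    import_snapshot(cookie)
except ValueError:
    pass                                   # now rejected
```

Invalidation really is just deleting one entry; the client can hold the bytes
as long as it likes, but they only import while the original transaction lives.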


Joachim



Re: [HACKERS] Snapshot synchronization, again...

2011-02-27 Thread Heikki Linnakangas

On 23.02.2011 03:00, Joachim Wieland wrote:

On Tue, Feb 22, 2011 at 3:34 PM, Heikki Linnakangas
  wrote:

Yes, that's the point I was trying to make. I believe the idea of a hash was
that it takes less memory than storing the whole snapshot (and more
importantly, a fixed amount of memory per snapshot). But I'm not convinced
either that dealing with a hash is any less troublesome.


Both Tom and Robert voted quite explicitly against the
store-in-shared-memory idea. Others don't want to allow people to request
arbitrary snapshots, and still others wanted to pass the snapshot
through the client so that in the future we could also request
snapshots from standby servers. The hash idea seemed to at least make
nobody unhappy.

For easier review, here are a few links to the previous discussion:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00361.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00383.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00481.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02454.php

Why exactly, Heikki, do you think the hash is more troublesome?


It just feels wrong to rely on cryptography just to save some shared memory.

I realize that there are conflicting opinions on this, but from user 
point-of-view the hash is just a variant of the idea of passing the 
snapshot through shared memory, just implemented in an indirect way.



And how
could we validate/invalidate snapshots without the checksum (assuming
the through-the-client approach instead of storing the whole snapshot
in shared memory)?


Either you accept anything that passes sanity checks, or you store the 
whole snapshot in shared memory (or a temp file). I'm not sure which is 
better, but they both seem better than the hash.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 8:55 PM, Alvaro Herrera
 wrote:
> The current discussion seems to have reached the same conclusion as the
> last one: the snapshot info shouldn't leave the server, but be
> transmitted internally -- the idea of the temp file seems prevalent.
> Here's an attempt at a detailed spec:
>
> By invoking pg_export_snapshot(), a transaction can export its current
> transaction snapshot.
>
> To export a snapshot, the transaction creates a temp file containing all
> information about the snapshot, including the BackendId and
> TransactionId of the creating transaction.  The file name is determined
> internally by the exporting transaction, and returned by the function
> that creates the snapshot.  The act of exporting a snapshot causes a
> TransactionId to be acquired, if there is none.
>
> Care is taken that no existing snapshot file is ever overwritten, to
> prevent security problems.  A transaction may export any number of
> snapshots.
>
> The "cookie" to be passed around, then, is a temp file identifier.  This
> cookie gets passed as START TRANSACTION (snapshot='foo'), which sets the
> transaction's snapshot to the one determined by the identifier.  The
> importing transaction must have isolation level serializable or
> repeatable read declared on the same START TRANSACTION command; an
> attempt to import a snapshot into a read committed transaction or one
> with unspecified isolation level causes the transaction to get into
> aborted state (so you still need to say ROLLBACK to get out of it.  This
> is necessary to prevent the session from proceeding involuntarily with the
> wrong snapshot.)
>
> Similarly, if the snapshot is not available, an error is raised and the
> transaction block remains in aborted state.  This can happen if the
> originating transaction ended after you opened the file, for example.
> The BackendId and TransactionId of the exporting transaction let the
> importing backend determine the validity of the snapshot beyond the mere
> existence of the file.
>
> The snapshot temp file is to be marked for deletion on transaction end.
> All snapshot temp files are also deleted on crash recovery, to prevent
> dangling files (I need some help here to determine how this plays with
> hot standby.)

I think this can be done within StartupXLOG(), right around where we
ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP).

> Privileges: Any role can export or import a snapshot.  The rationale
> here is that exporting a snapshot doesn't cause any more harm than
> holding a transaction open, so if you let your users do that, then
> there's no extra harm.  Similarly, there's no extra privilege given to a
> role that imports a snapshot that cannot be obtained by sending a START
> TRANSACTION command at the right time.

Agree.

> Note: this proposal doesn't mention restrictions on having
> subtransactions in the exporting transaction.  This is because I haven't
> figured them out, not because I think there shouldn't be any.

I think it's probably sufficient to say that a snapshot can only be
exported from the toplevel transaction.

Overall, this proposal is fine with me.  But then I wasn't the one who
complained about it last time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Alvaro Herrera

The current discussion seems to have reached the same conclusion as the
last one: the snapshot info shouldn't leave the server, but be
transmitted internally -- the idea of the temp file seems prevalent.
Here's an attempt at a detailed spec:

By invoking pg_export_snapshot(), a transaction can export its current
transaction snapshot.

To export a snapshot, the transaction creates a temp file containing all
information about the snapshot, including the BackendId and
TransactionId of the creating transaction.  The file name is determined
internally by the exporting transaction, and returned by the function
that creates the snapshot.  The act of exporting a snapshot causes a
TransactionId to be acquired, if there is none.

Care is taken that no existing snapshot file is ever overwritten, to
prevent security problems.  A transaction may export any number of
snapshots.

The "cookie" to be passed around, then, is a temp file identifier.  This
cookie gets passed as START TRANSACTION (snapshot='foo'), which sets the
transaction's snapshot to the one determined by the identifier.  The
importing transaction must have isolation level serializable or
repeatable read declared on the same START TRANSACTION command; an
attempt to import a snapshot into a read committed transaction or one
with unspecified isolation level causes the transaction to get into
aborted state (so you still need to say ROLLBACK to get out of it.  This
is necessary to prevent the session from proceeding involuntarily with the
wrong snapshot.)

Similarly, if the snapshot is not available, an error is raised and the
transaction block remains in aborted state.  This can happen if the
originating transaction ended after you opened the file, for example.
The BackendId and TransactionId of the exporting transaction let the
importing backend determine the validity of the snapshot beyond the mere
existence of the file.

The snapshot temp file is to be marked for deletion on transaction end.
All snapshot temp files are also deleted on crash recovery, to prevent
dangling files (I need some help here to determine how this plays with
hot standby.)

Privileges: Any role can export or import a snapshot.  The rationale
here is that exporting a snapshot doesn't cause any more harm than
holding a transaction open, so if you let your users do that, then
there's no extra harm.  Similarly, there's no extra privilege given to a
role that imports a snapshot that cannot be obtained by sending a START
TRANSACTION command at the right time.


Note: this proposal doesn't mention restrictions on having
subtransactions in the exporting transaction.  This is because I haven't
figured them out, not because I think there shouldn't be any.
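The export/import flow in the spec above can be sketched in a few lines of toy
Python (not the real patch: the file naming, the JSON payload, and the
`still_running` check are all invented stand-ins for what the backend would do
with BackendId/TransactionId):

```python
import os
import json
import tempfile

# Stands in for a directory under the data dir holding snapshot files.
SNAPSHOT_DIR = tempfile.mkdtemp(prefix="pg_snapshots_")

def export_snapshot(backend_id, xid, snapshot):
    """Write the snapshot plus identifying info; return the cookie (file name)."""
    # mkstemp never reuses an existing name, matching "no file is overwritten".
    fd, path = tempfile.mkstemp(dir=SNAPSHOT_DIR)
    with os.fdopen(fd, "w") as f:
        json.dump({"backend_id": backend_id, "xid": xid, "snapshot": snapshot}, f)
    return os.path.basename(path)

def import_snapshot(cookie, still_running):
    """Load the snapshot named by the cookie; error out if gone or stale."""
    path = os.path.join(SNAPSHOT_DIR, cookie)
    try:
        with open(path) as f:
            info = json.load(f)
    except FileNotFoundError:
        raise RuntimeError("snapshot file not available")
    # Validity beyond mere existence: is the exporting transaction still open?
    if not still_running(info["backend_id"], info["xid"]):
        raise RuntimeError("originating transaction has ended")
    return info["snapshot"]

cookie = export_snapshot(backend_id=7, xid=1234,
                         snapshot={"xmin": 100, "xmax": 110, "xip": [101]})
snap = import_snapshot(cookie, still_running=lambda b, x: True)
```

Deleting the file at transaction end (or during crash recovery) makes every
outstanding cookie fail naturally at the `open()` step.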

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 8:00 PM, Joachim Wieland  wrote:
> Both Tom and Robert voted quite explicitly against the
> store-in-shared-memory idea.

No, I voted *for* that approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Joachim Wieland
On Tue, Feb 22, 2011 at 3:34 PM, Heikki Linnakangas
 wrote:
> On 22.02.2011 16:29, Robert Haas wrote:
>> On Tue, Feb 22, 2011 at 8:58 AM, Heikki Linnakangas
>>   wrote:
>>> No, the hash is stored in shared memory. The hash of the garbage has to
>>> match.
>>
>> Oh.  Well that's really silly.  At that point you might as well just
>> store the snapshot and an integer identifier in shared memory, right?
>
> Yes, that's the point I was trying to make. I believe the idea of a hash was
> that it takes less memory than storing the whole snapshot (and more
> importantly, a fixed amount of memory per snapshot). But I'm not convinced
> either that dealing with a hash is any less troublesome.

Both Tom and Robert voted quite explicitly against the
store-in-shared-memory idea. Others don't want to allow people to request
arbitrary snapshots, and still others wanted to pass the snapshot
through the client so that in the future we could also request
snapshots from standby servers. The hash idea seemed to at least make
nobody unhappy.

For easier review, here are a few links to the previous discussion:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg00361.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00383.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg00481.php
http://archives.postgresql.org/pgsql-hackers/2010-12/msg02454.php

Why exactly, Heikki, do you think the hash is more troublesome? And how
could we validate/invalidate snapshots without the checksum (assuming
the through-the-client approach instead of storing the whole snapshot
in shared memory)?


Joachim



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 9:34 AM, Heikki Linnakangas
 wrote:
>> Oh.  Well that's really silly.  At that point you might as well just
>> store the snapshot and an integer identifier in shared memory, right?
>
> Yes, that's the point I was trying to make. I believe the idea of a hash was
> that it takes less memory than storing the whole snapshot (and more
> importantly, a fixed amount of memory per snapshot). But I'm not convinced
> either that dealing with a hash is any less troublesome.

OK, sorry for taking a while to get the point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Heikki Linnakangas

On 22.02.2011 16:29, Robert Haas wrote:

On Tue, Feb 22, 2011 at 8:58 AM, Heikki Linnakangas
  wrote:

On 22.02.2011 15:52, Robert Haas wrote:


On Tue, Feb 22, 2011 at 8:01 AM, Heikki Linnakangas
wrote:


Yes. It would be good to perform those sanity checks anyway.


I don't think it's good; I think it's absolutely necessary.  Otherwise
someone can generate arbitrary garbage, hash it, and feed it to us.
No?


No, the hash is stored in shared memory. The hash of the garbage has to
match.


Oh.  Well that's really silly.  At that point you might as well just
store the snapshot and an integer identifier in shared memory, right?


Yes, that's the point I was trying to make. I believe the idea of a hash 
was that it takes less memory than storing the whole snapshot (and more 
importantly, a fixed amount of memory per snapshot). But I'm not 
convinced either that dealing with a hash is any less troublesome.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 8:58 AM, Heikki Linnakangas
 wrote:
> On 22.02.2011 15:52, Robert Haas wrote:
>>
>> On Tue, Feb 22, 2011 at 8:01 AM, Heikki Linnakangas
>>   wrote:
>>>
>>> Yes. It would be good to perform those sanity checks anyway.
>>
>> I don't think it's good; I think it's absolutely necessary.  Otherwise
>> someone can generate arbitrary garbage, hash it, and feed it to us.
>> No?
>
> No, the hash is stored in shared memory. The hash of the garbage has to
> match.

Oh.  Well that's really silly.  At that point you might as well just
store the snapshot and an integer identifier in shared memory, right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Heikki Linnakangas

On 22.02.2011 15:52, Robert Haas wrote:

On Tue, Feb 22, 2011 at 8:01 AM, Heikki Linnakangas
  wrote:

Yes. It would be good to perform those sanity checks anyway.


I don't think it's good; I think it's absolutely necessary.  Otherwise
someone can generate arbitrary garbage, hash it, and feed it to us.
No?


No, the hash is stored in shared memory. The hash of the garbage has to 
match.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 8:01 AM, Heikki Linnakangas
 wrote:
> This is hashing, not encryption; there is no key. The point is that even if
> the attacker has the hash value and knows the algorithm, he cannot
> construct *another* snapshot that has the same hash.

What good does that do us?

> Yes. It would be good to perform those sanity checks anyway.

I don't think it's good; I think it's absolutely necessary.  Otherwise
someone can generate arbitrary garbage, hash it, and feed it to us.
No?

> But even if we don't allow it, there's no harm in sending the whole snapshot
> to the client, anyway. Ie. instead of "1" as the identifier, use the
> snapshot itself. That leaves the door open for allowing it in the future,
> should we choose to do so.

The door is open either way, AFAICS: we could eventually allow:

BEGIN TRANSACTION (SNAPSHOT '1');
and also
BEGIN TRANSACTION (SNAPSHOT '{xmin 123 xmax 456 xids 128 149 201}');

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Heikki Linnakangas

On 22.02.2011 14:24, Robert Haas wrote:

On Tue, Feb 22, 2011 at 1:59 AM, Heikki Linnakangas
  wrote:

If you don't use a cryptographically secure hash, it's easy to construct a
snapshot with the same hash as an existing snapshot, with more or less
arbitrary contents.


And if you did use a cryptographically secure hash, it would still be
easy, unless there is some plausible way of keeping the key a secret,
which I doubt.


This is hashing, not encryption; there is no key. The point is that even 
if the attacker has the hash value and knows the algorithm, he cannot 
construct *another* snapshot that has the same hash. A cryptographically 
strong hash function has that property, second preimage resistance, 
whereas for a regular CRC or similar it's easy to create an arbitrary 
input that has any given checksum.
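The difference can be illustrated with a brute-force experiment in Python. The
16-bit truncation below stands in for a weak checksum (a real CRC can be
defeated algebraically rather than by search, so this understates the attack);
the snapshot string is invented for the demo:

```python
import hashlib

# The "published" value an attacker knows.
target = hashlib.sha256(b"xmin:100;xmax:110;xip:101,105").digest()

# Against a 16-bit "checksum" (the truncated hash), a second preimage
# falls out of a tiny brute-force search almost immediately.
forged = None
for i in range(2**20):
    candidate = b"forged-snapshot-%d" % i
    if hashlib.sha256(candidate).digest()[:2] == target[:2]:
        forged = candidate
        break

assert forged is not None                          # weak checksum: forgery found
assert hashlib.sha256(forged).digest() != target   # full SHA-256 still differs
```

With the full 256-bit digest, the same search would need on the order of 2^256
attempts: that gap is exactly the second-preimage resistance being relied on.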



The original reason why we went with this design (send the snapshot to
the client and then have the client send it back to the server) was
because some people thought it could be useful to take a snapshot from
the master and recreate it on a standby, or perhaps the other way
around.  If that's the goal, checking whether the snapshot being
imported is one that's already in use is missing the point.  We have
to rely on our ability to detect, when importing the snapshot, any
anomalies that could lead to system instability; and allow people to
import an arbitrary snapshot that meets those constraints, even one
that the user just created out of whole cloth.


Yes. It would be good to perform those sanity checks anyway.


Now, as I said before, I think that's a bad idea.  I think we should
NOT be designing the system to allow publication of arbitrary
snapshots; instead, the backend that wishes to export its snapshot
should so request, getting back a token (like "1") that other backends
can then pass to START TRANSACTION (SNAPSHOT '1'), or whatever syntax
we end up using.


Well, I'm not sure if we should allow importing arbitrary (as long as 
they're sane) snapshots or not; the arguments for and against that are 
both compelling.


But even if we don't allow it, there's no harm in sending the whole 
snapshot to the client, anyway. Ie. instead of "1" as the identifier, 
use the snapshot itself. That leaves the door open for allowing it in 
the future, should we choose to do so.



 In that design, there's no need to validate
snapshots because they never leave the server's memory space, which
IMHO is as it should be.


I agree that if we don't allow importing arbitrary snapshots, we should 
use some out-of-band communication to get the snapshot to the backend. 
IOW not rely on a hash.



Another reason I don't like this approach is because it's possible
that the representation of snapshots might change in the future.  Tom
mused a while back about using an LSN as a snapshot (snapshots sees
all transactions committed prior to that LSN) and there are other
plausible changes, too.  If we expose the internal details of the
snapshot mechanism to users, then (1) it becomes a user-visible
feature with backward compatibility implications if we choose to break
it down the road


Clients should treat the snapshot as an opaque block.


and (2) all future snapshot representations must be
able to be fully sanity checked on import.  Right now it's fairly easy
to verify that the snapshot's xmin isn't too old and that the
remaining XIDs are in a reasonable range and that's probably all we
really need to be certain of, but it's not clear that some future
representation would be equally easy to validate.


Perhaps, although I can't imagine a representation that wouldn't be 
equally easy to validate.
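The sanity checks discussed above (xmin not too old, remaining XIDs in a
reasonable range) might look roughly like this. This is a Python sketch of the
idea only; the field names and the `global_xmin`/`next_xid` inputs are
simplifications, not the backend's actual logic:

```python
def validate_snapshot(snap, global_xmin, next_xid):
    """Reject imported snapshots that could destabilize the system.

    global_xmin: oldest xmin held by any running transaction (row versions
    older than this may already be vacuumed away); next_xid: next XID to be
    assigned. Both are assumptions of this sketch.
    """
    if snap["xmin"] < global_xmin:
        raise ValueError("snapshot xmin too old: needed rows may be vacuumed")
    if not (snap["xmin"] <= snap["xmax"] <= next_xid):
        raise ValueError("snapshot xmax out of range")
    for xid in snap["xip"]:
        if not (snap["xmin"] <= xid < snap["xmax"]):
            raise ValueError("in-progress XID outside [xmin, xmax)")
    return True

# A plausible snapshot passes; one with a too-old xmin is rejected.
assert validate_snapshot({"xmin": 100, "xmax": 110, "xip": [101, 105]},
                         global_xmin=90, next_xid=200)
```

Whether every future snapshot representation admits checks this simple is the
open question raised in the quoted text.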


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-22 Thread Robert Haas
On Tue, Feb 22, 2011 at 1:59 AM, Heikki Linnakangas
 wrote:
> Really? The idea of the hash is to prevent you from importing arbitrary
> snapshots into the system, allowing you to copy only snapshots that are in
> use by another backend.  The purpose of the hash is to make sure the snapshot
> the client passes in is identical to one used by another backend.

If that's the purpose of the hash, it is utterly useless.  The
computation performed by the backend can easily be replicated by an
adversary.

> If you don't use a cryptographically secure hash, it's easy to construct a
> snapshot with the same hash as an existing snapshot, with more or less
> arbitrary contents.

And if you did use a cryptographically secure hash, it would still be
easy, unless there is some plausible way of keeping the key a secret,
which I doubt.

The original reason why we went with this design (send the snapshot to
the client and then have the client send it back to the server) was
because some people thought it could be useful to take a snapshot from
the master and recreate it on a standby, or perhaps the other way
around.  If that's the goal, checking whether the snapshot being
imported is one that's already in use is missing the point.  We have
to rely on our ability to detect, when importing the snapshot, any
anomalies that could lead to system instability; and allow people to
import an arbitrary snapshot that meets those constraints, even one
that the user just created out of whole cloth.

Now, as I said before, I think that's a bad idea.  I think we should
NOT be designing the system to allow publication of arbitrary
snapshots; instead, the backend that wishes to export its snapshot
should so request, getting back a token (like "1") that other backends
can then pass to START TRANSACTION (SNAPSHOT '1'), or whatever syntax
we end up using.  In that design, there's no need to validate
snapshots because they never leave the server's memory space, which
IMHO is as it should be.  If someone can develop a convincing argument
that transmitting snapshots between the master and standby is a good
idea, we can still accommodate that: the master-standby connection is
now full-duplex.  In fact, we may want to do that anyway for other
reasons (see Kevin Grittner's musings on this point vs. true
serializability).
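The token scheme described above can be sketched like this (illustrative Python; the names and the token format are assumptions, not the eventual API):

```python
# Exported snapshots never leave server memory, so import is a lookup,
# not a validation problem.
class SnapshotRegistry:
    def __init__(self):
        self._next = 1
        self._exported = {}  # token -> snapshot (opaque server-side object)

    def export(self, snapshot):
        token = str(self._next)  # e.g. "1", handed to the client
        self._next += 1
        self._exported[token] = snapshot
        return token

    def import_(self, token):
        # No sanity checking needed: only snapshots the server itself
        # exported can ever be found here.
        try:
            return self._exported[token]
        except KeyError:
            raise ValueError("unknown snapshot token") from None

registry = SnapshotRegistry()
tok = registry.export({"xmin": 1000, "xmax": 1005, "xip": [1001, 1003]})
assert registry.import_(tok)["xmin"] == 1000
```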

Another reason I don't like this approach is because it's possible
that the representation of snapshots might change in the future.  Tom
mused a while back about using an LSN as a snapshot (snapshots sees
all transactions committed prior to that LSN) and there are other
plausible changes, too.  If we expose the internal details of the
snapshot mechanism to users, then (1) it becomes a user-visible
feature with backward compatibility implications if we choose to break
it down the road and (2) all future snapshot representations must be
able to be fully sanity checked on import.  Right now it's fairly easy
to verify that the snapshot's xmin isn't too old and that the
remaining XIDs are in a reasonable range and that's probably all we
really need to be certain of, but it's not clear that some future
representation would be equally easy to validate.
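Those checks amount to something like the following (a simplified Python sketch: plain-integer XIDs with wraparound ignored, and invented field names -- the real code must handle both properly):

```python
def sanity_check(snap, oldest_active_xmin, next_xid):
    """Import-time sanity checks on a simplified snapshot dict."""
    # xmin must not be so old that needed row versions may already be gone.
    if snap["xmin"] < oldest_active_xmin:
        raise ValueError("snapshot xmin is too old")
    # xmax must lie between xmin and the next XID the cluster will assign.
    if not (snap["xmin"] <= snap["xmax"] <= next_xid):
        raise ValueError("xmax out of range")
    # Every in-progress XID must fall inside [xmin, xmax).
    for xid in snap["xip"]:
        if not (snap["xmin"] <= xid < snap["xmax"]):
            raise ValueError("in-progress XID outside [xmin, xmax)")

sanity_check({"xmin": 100, "xmax": 110, "xip": [103, 107]},
             oldest_active_xmin=95, next_xid=120)   # passes
try:
    sanity_check({"xmin": 100, "xmax": 110, "xip": [250]},
                 oldest_active_xmin=95, next_xid=120)
except ValueError:
    pass  # rejected, as it should be
else:
    raise AssertionError("bogus snapshot was accepted")
```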

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Heikki Linnakangas

On 21.02.2011 21:33, Joachim Wieland wrote:

> Hi,
>
> On Mon, Feb 21, 2011 at 4:56 PM, Alvaro Herrera
>  wrote:
>
>> What's the reason for this restriction?
>>        if (databaseId != MyDatabaseId)
>>                ereport(ERROR,
>>                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>                         errmsg("cannot import snapshot from a different database")));
>
> I just couldn't think of a use case for it and so didn't want to
> introduce a feature that we might have to support in the future then.

Hmm, here's one: getting a consistent snapshot of all databases in
pg_dumpall. Not sure how useful that is...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Heikki Linnakangas

On 19.02.2011 02:41, Joachim Wieland wrote:

> On Fri, Feb 18, 2011 at 8:57 PM, Alvaro Herrera
>  wrote:
>
>> 1. why are you using the expansible char array stuff instead of using
>> the StringInfo facility?
>>
>> 2. is md5 the most appropriate digest for this?  If you need a
>> cryptographically secure hash, do we need something stronger?  If not,
>> why not just use hash_any?
>
> We don't need a cryptographically secure hash.


Really? The idea of the hash is to prevent you from importing arbitrary 
snapshots into the system, allowing only copying snapshots that are in 
use by another backend. The purpose of the hash is to make sure the 
snapshot the client passes in is identical to one used by another backend.


If you don't use a cryptographically secure hash, it's easy to construct 
a snapshot with the same hash as an existing snapshot, with more or less 
arbitrary contents.


This also makes me worried:

+
+   /*
+    * Verify that the checksum matches before doing any more work. We will
+    * later verify again to make sure that the exporting transaction has not
+    * yet terminated by then. We don't want to optimize this into just one
+    * verification call at the very end because the instructions that follow
+    * this comment rely on a sane format of the textual snapshot data. In
+    * particular they assume that there are not more XactIds than
+    * advertised...
+    */


It sounds like you risk a buffer overflow if the snapshot is bogus, 
which makes a collision-resistant hash even more important.
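The overflow risk disappears if the import code refuses to trust the advertised count before touching the payload -- roughly (a Python sketch over a hypothetical "count plus comma-separated XIDs" field; the patch's real format may differ):

```python
def parse_xip(advertised_count, payload):
    """Parse an in-progress XID list without trusting the advertised count.

    Code downstream assumes there are no more XactIds than advertised;
    checking the real count up front removes that assumption.
    """
    xids = [int(tok) for tok in payload.split(",") if tok]
    if len(xids) != advertised_count:
        raise ValueError("advertised XID count does not match payload")
    return xids

assert parse_xip(3, "101,104,105") == [101, 104, 105]
try:
    parse_xip(2, "101,104,105,9999")  # more XIDs than advertised
except ValueError:
    pass
else:
    raise AssertionError("oversized payload was accepted")
```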


I know this was discussed already, but I don't much like using a hash 
like this. We should find a way to just store the whole snapshot in 
shared memory, or even a temporary file if we want to skimp on shared 
memory. It's generally better to not rely on cryptography when you don't 
have to.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Tom Lane
Alvaro Herrera  writes:
> That's true too.  So let's discuss the syntax.  Maybe
>   START TRANSACTION SNAPSHOT '\xdeadbeef';

> This kind of extension seems ugly though; maybe we should consider
>   START TRANSACTION (snapshot='\xdeadbeef');
> (like VACUUM, EXPLAIN and COPY) or some such?

+1 for the latter, mainly because (I think) you could avoid making
SNAPSHOT into an actual parser keyword that way.

> I don't think we need another "top-level" command for this.

Mmm ... one possible objection is that if it's hard-wired into the START
TRANSACTION syntax, then there is no way to take any locks before
establishing the snapshot.  Right offhand, though, I can't see a
use-case for doing that, given that the snapshot itself would be older
than any locks that the secondary transaction could grab.

I suppose that a separate top-level command might be nicer just to avoid
touching the syntax of START TRANSACTION.  If we put the option into
parens as above, there seems little chance of colliding with future spec
extensions, but it's a tad ugly looking.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Alvaro Herrera
Excerpts from Tom Lane's message of lun feb 21 21:00:19 -0300 2011:
> Alvaro Herrera  writes:
> > Actually this seems rather difficult to do, because in order to invoke
> > the function that imports the snapshot, you have to call SELECT, which
> > acquires a snapshot beforehand.  So when we actually import the
> > passed-in snapshot, there's already a snapshot set.  This is odious but 
> > I don't see a way around that -- other than adding special grammar
> > support which seems overkill.
> 
> No, I don't think it is.  The alternative is semantics that are
> at least exceedingly ugly, and very possibly flat-out broken.
> To take just one point, you have no way at all of preventing the
> transaction from having done something else using its original
> snapshot.

That's true too.  So let's discuss the syntax.  Maybe
START TRANSACTION SNAPSHOT '\xdeadbeef';

This kind of extension seems ugly though; maybe we should consider
START TRANSACTION (snapshot='\xdeadbeef');
(like VACUUM, EXPLAIN and COPY) or some such?


I don't think we need another "top-level" command for this.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Tom Lane
Alvaro Herrera  writes:
> Actually this seems rather difficult to do, because in order to invoke
> the function that imports the snapshot, you have to call SELECT, which
> acquires a snapshot beforehand.  So when we actually import the
> passed-in snapshot, there's already a snapshot set.  This is odious but 
> I don't see a way around that -- other than adding special grammar
> support which seems overkill.

No, I don't think it is.  The alternative is semantics that are
at least exceedingly ugly, and very possibly flat-out broken.
To take just one point, you have no way at all of preventing the
transaction from having done something else using its original
snapshot.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Alvaro Herrera
Excerpts from Kevin Grittner's message of lun feb 21 18:39:26 -0300 2011:
> Alvaro Herrera  wrote:
>  
> > I think we need a safety net so that the new serializable isolation
> > code doesn't get upset if we change the base snapshot from under
> > it, but I haven't looked at that yet.
>  
> Replacing the snapshot for a serializable transaction after it has
> acquired its initial snapshot would quietly allow non-serializable
> behavior, I would think.  I don't think that setting a snapshot for a
> SERIALIZABLE READ ONLY DEFERRABLE transaction makes any sense, since
> the point of that is that it waits for a snapshot which meets certain
> criteria to be available; setting a snapshot in that mode should
> probably just be disallowed.  Otherwise, if you set the snapshot
> before the transaction acquires one through normal means, I can't
> think of any problems -- just make sure you set FirstSnapshotSet.

Actually this seems rather difficult to do, because in order to invoke
the function that imports the snapshot, you have to call SELECT, which
acquires a snapshot beforehand.  So when we actually import the
passed-in snapshot, there's already a snapshot set.  This is odious but 
I don't see a way around that -- other than adding special grammar
support which seems overkill.

I've been thinking in requiring that snapshot to have refcount==1, but
I'm not sure if that works.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Kevin Grittner
Alvaro Herrera  wrote:
 
> I think we need a safety net so that the new serializable isolation
> code doesn't get upset if we change the base snapshot from under
> it, but I haven't looked at that yet.
 
Replacing the snapshot for a serializable transaction after it has
acquired its initial snapshot would quietly allow non-serializable
behavior, I would think.  I don't think that setting a snapshot for a
SERIALIZABLE READ ONLY DEFERRABLE transaction makes any sense, since
the point of that is that it waits for a snapshot which meets certain
criteria to be available; setting a snapshot in that mode should
probably just be disallowed.  Otherwise, if you set the snapshot
before the transaction acquires one through normal means, I can't
think of any problems -- just make sure you set FirstSnapshotSet.
 
-Kevin



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Tom Lane
Joachim Wieland  writes:
> On Mon, Feb 21, 2011 at 4:56 PM, Alvaro Herrera
>  wrote:
>> Why are we using bytea as the output format instead of text?

> It is bytea because it should be thought of "just some data". It
> should be regarded more as a token than as text and should not be
> inspected/interpreted at all. If anybody decides to do so, fine, but
> then they should not rely on the stability of the message format for
> the future.

I don't think that passing it as bytea conveys those semantics at all.
It just makes it harder to look at the value for debugging purposes.

Plus you gotta worry about variable formatting/escaping conventions
(bytea_output etc) that could easily be avoided if the value were
promised to be text with no funny characters in it.
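The formatting hazard is concrete: the same bytea value has two client-visible spellings depending on bytea_output, whereas plain text has one stable form. A quick sketch (Python, with a made-up snapshot string):

```python
payload = b"xmin:1000"  # hypothetical snapshot text, stored as bytea

# Under bytea_output = 'hex', the client sees a \x-prefixed hex string:
hex_form = "\\x" + payload.hex()

# Under bytea_output = 'escape', printable ASCII passes through unchanged
# (non-printable bytes would become \nnn octal escapes):
escape_form = payload.decode("ascii")

assert hex_form == "\\x786d696e3a31303030"
assert escape_form == "xmin:1000"
assert hex_form != escape_form  # same value, two client-visible spellings
```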

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Joachim Wieland
Hi,

On Mon, Feb 21, 2011 at 4:56 PM, Alvaro Herrera
 wrote:
> What's the reason for this restriction?
>        if (databaseId != MyDatabaseId)
>                ereport(ERROR,
>                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                         errmsg("cannot import snapshot from a different 
> database")));

I just couldn't think of a use case for it and so didn't want to
introduce a feature that we might have to support in the future then.


> Why are we using bytea as the output format instead of text?  The output
> is just plain text anyway, as can be seen by setting bytea_output to
> 'escape'.  Perhaps there would be a value to using bytea if we were to
> pglz_compress the result, but would there be a point in doing so?
> Normally exported info should be short enough that it would cause more
> overhead than it would save anyway.

It is bytea because it should be thought of "just some data". It
should be regarded more as a token than as text and should not be
inspected/interpreted at all. If anybody decides to do so, fine, but
then they should not rely on the stability of the message format for
the future.


Joachim



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of dom ene 30 14:36:12 -0300 2011:

> On Thu, Jan 20, 2011 at 1:37 AM, Noah Misch  wrote:

> >> > Is it valid to scribble directly on snapshots like this?
> >> I figured that previously executed code still has references pointing
> >> to the snapshots and so we cannot just push a new snapshot on top but
> >> really need to change the memory where they are pointing to.
> > Okay.  Had no special reason to believe otherwise, just didn't know.
> 
> This is one part where I'd like someone more experienced than me look into it.

Yeah, I don't like this bit very much.  I'm inclined to have the import
routine fill CurrentSnapshot directly (and perhaps ActiveSnapshot too,
not sure about this part yet) instead; I think we need a safety net so
that the new serializable isolation code doesn't get upset if we change
the base snapshot from under it, but I haven't looked at that yet.

Trying to import a snap into a transaction that has committed children
should also fail; otherwise, objects touched during those subxacts would
be in a spooky zone.  Also, I want to have importing into a
read-committed transaction fail with an error instead of the current
warning -- just like LOCK TABLE fails if you're not inside a
transaction.

I'm also inclined to have PREPARE TRANSACTION fail if it has an exported
snapshot, instead of just invalidating it.  This lets us get rid of the
extra locking added during that operation.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-21 Thread Alvaro Herrera
A couple more questions:

What's the reason for this restriction?
        if (databaseId != MyDatabaseId)
                ereport(ERROR,
                        (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                         errmsg("cannot import snapshot from a different database")));

Why are we using bytea as the output format instead of text?  The output
is just plain text anyway, as can be seen by setting bytea_output to
'escape'.  Perhaps there would be a value to using bytea if we were to
pglz_compress the result, but would there be a point in doing so?
Normally exported info should be short enough that it would cause more
overhead than it would save anyway.
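The overhead point is easy to check: for a payload this short, a general-purpose compressor's framing costs more than it saves. A sketch with zlib standing in for pglz (the framing details differ, but the point -- fixed per-datum overhead dominating short payloads -- is the same):

```python
import zlib

short_export = b"xmin:1000"  # made-up snapshot text; real exports are similarly short
compressed = zlib.compress(short_export)

# "Compression" made it bigger: header and checksum overhead exceed any
# savings on a nine-byte payload.
assert len(compressed) > len(short_export)
assert zlib.decompress(compressed) == short_export
```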

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Alvaro Herrera
Excerpts from Tom Lane's message of sáb feb 19 21:26:42 -0300 2011:

> However ... IIRC, hash_any gives different results on bigendian and
> littleendian machines.  I'm not sure if a predictable cross-platform
> result is important for this use?  If you're hashing data containing
> native integers, this is a problem anyway.

The current feature only lets you synchronize snapshots within a cluster
-- you can't ship them to another machine.  I don't think dependency on
endianness is a problem here.  (But no, the data doesn't contain native
integers -- it's the snapshot's text representation.)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Tom Lane
Peter Eisentraut  writes:
> On fre, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
>> 2. is md5 the most appropriate digest for this?  If you need a
>> cryptographically secure hash, do we need something stronger?  If not,
>> why not just use hash_any?

> MD5 is probably more appropriate than hash_any, because the latter is
> optimized for speed and collision avoidance and doesn't have a
> guaranteed external format.  The only consideration against MD5 might be
> that it would make us look quite lame.

Only to people who don't understand whether crypto strength is actually
important in a given use-case.

However ... IIRC, hash_any gives different results on bigendian and
littleendian machines.  I'm not sure if a predictable cross-platform
result is important for this use?  If you're hashing data containing
native integers, this is a problem anyway.

regards, tom lane



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Joachim Wieland
On Sat, Feb 19, 2011 at 9:17 PM, Peter Eisentraut  wrote:
> The only consideration against MD5 might be
> that it would make us look quite lame.  We should probably provide
> builtin SHA1 and SHA2 functions for this and other reasons.

In this particular case however the checksum is never exposed to the
user but stays within shared memory. So nobody would notice that we
are still using lame old md5sum :-)


Joachim



Re: [HACKERS] Snapshot synchronization, again...

2011-02-19 Thread Peter Eisentraut
On fre, 2011-02-18 at 16:57 -0300, Alvaro Herrera wrote:
> 2. is md5 the most appropriate digest for this?  If you need a
> cryptographically secure hash, do we need something stronger?  If not,
> why not just use hash_any?

MD5 is probably more appropriate than hash_any, because the latter is
optimized for speed and collision avoidance and doesn't have a
guaranteed external format.  The only consideration against MD5 might be
that it would make us look quite lame.  We should probably provide
builtin SHA1 and SHA2 functions for this and other reasons.





Re: [HACKERS] Snapshot synchronization, again...

2011-02-18 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of vie feb 18 21:41:51 -0300 2011:
> On Fri, Feb 18, 2011 at 8:57 PM, Alvaro Herrera
>  wrote:
> > 1. why are you using the expansible char array stuff instead of using
> > the StringInfo facility?
> >
> > 2. is md5 the most appropriate digest for this?  If you need a
> > cryptographically secure hash, do we need something stronger?  If not,
> > why not just use hash_any?
> 
> We don't need a cryptographically secure hash.

Ok.

> Should I send an updated patch? Anything else?

No need, I'm halfway through already.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-18 Thread Joachim Wieland
On Fri, Feb 18, 2011 at 8:57 PM, Alvaro Herrera
 wrote:
> 1. why are you using the expansible char array stuff instead of using
> the StringInfo facility?
>
> 2. is md5 the most appropriate digest for this?  If you need a
> cryptographically secure hash, do we need something stronger?  If not,
> why not just use hash_any?

We don't need a cryptographically secure hash.

There is no special reason for why it is like it is, I just didn't
think of the better alternatives that you are proposing.

Should I send an updated patch? Anything else?


Thanks for the review,
Joachim



Re: [HACKERS] Snapshot synchronization, again...

2011-02-18 Thread Alvaro Herrera


I have two questions: 

1. why are you using the expansible char array stuff instead of using
the StringInfo facility?

2. is md5 the most appropriate digest for this?  If you need a
cryptographically secure hash, do we need something stronger?  If not,
why not just use hash_any?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-02-18 Thread Alvaro Herrera

Looking into this patch.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2011-01-30 Thread Noah Misch
On Sun, Jan 30, 2011 at 12:36:12PM -0500, Joachim Wieland wrote:
> Here is a new version of the patch incorporating most of Noah's
> suggestions. The patch now also adds documentation. Since I couldn't
> really find a suitable section to document the two new functions, I
> added a new one for now. Any better ideas where it should go?

No major opinion here.  The second best fit after a new section would be System
Administration Functions, inasmuch as that's the bucket for miscellaneous
functions.  I have an idea for improving structure here, but that would go
beyond the purview of this patch.

This version looks sound; I've marked it Ready for Committer.

Thanks,
nm



Re: [HACKERS] Snapshot synchronization, again...

2011-01-30 Thread Joachim Wieland
Here is a new version of the patch incorporating most of Noah's
suggestions. The patch now also adds documentation. Since I couldn't
really find a suitable section to document the two new functions, I
added a new one for now. Any better ideas where it should go?

On Thu, Jan 20, 2011 at 1:37 AM, Noah Misch  wrote:
> Just to clarify, I was envisioning something like:
>
> typedef struct { bool valid; char value[16]; } snapshotChksum;
>
> I don't object to the way you've done it, but someone else might not like the
> extra marshalling between text and binary.  Your call.

I didn't like it in the beginning but later on I adopted your
proposal. I have also changed the locking to be more natural. Even
though we don't really need it, I am now grabbing shared ProcArrayLock
for any reads of shared memory and exclusive lock for writes. Of
course no additional lock is taken if the feature is not used.


> You're right.  Then consider "VALUES (pg_import_snapshot('...'), (SELECT
> count(*) from t))" at READ COMMITTED.  It works roughly as I'd guess; the
> subquery uses the imported snapshot.  If I flip the two expressions and do
> "VALUES ((SELECT count(*) from t), pg_import_snapshot('...'))", the subquery
> uses the normal snapshot.  That makes sense, but I can't really see a use case
> for it.  A warning, like your code has today, seems appropriate.

Yeah, that would do what you wanted to illustrate but it truly cannot
be considered the standard use case  :-)


>> > Is it valid to scribble directly on snapshots like this?
>> I figured that previously executed code still has references pointing
>> to the snapshots and so we cannot just push a new snapshot on top but
>> really need to change the memory where they are pointing to.
> Okay.  Had no special reason to believe otherwise, just didn't know.

This is one part where I'd like someone more experienced than me look into it.


> Thanks; that was handy.  One thing I noticed is that the second "SELECT * FROM
> kidseen" yields zero rows instead of yielding "ERROR:  relation "kidseen" does
> not exist".  I changed things around per the attached "test-drop.pl", such that
> the table is already gone in the ordinary snapshot, but still visible to the
> imported snapshot.  Note how the pg_class row is visible, but an actual attempt
> to query the table fails.  Must some kind of syscache invalidation follow the
> snapshot alteration to make this behave normally?

As discussed with Noah off-list this is just not an MVCC safe
operation. You could hit on this in a regular SERIALIZABLE transaction
as well: Somebody else can delete a table and you won't be able to
access it anymore. This is also the reason why pg_dump in the
beginning gets a shared lock on every table that it will dump later
on.


Thanks for the review Noah,

Joachim
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index ed2039c..4169594 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** FOR EACH ROW EXECUTE PROCEDURE suppress_
*** 14761,14764 
--- 14761,14860 
  .
  

+ 
+   
+Snapshot Synchronization Functions
+ 
+
+  pg_export_snapshot
+
+
+  pg_import_snapshot
+
+ 
+
+  PostgreSQL allows different sessions to synchronize their
+  snapshots. A database snapshot determines which data is visible to
+  the client that is using this snapshot. Synchronized snapshots are necessary if
+  two clients need to see the same content in the database. If these two clients
+  just connected to the database and opened their transactions, then they could
+  never be sure that there was no data modification right between both
+  connections. To solve this, PostgreSQL offers the two functions
+  pg_export_snapshot and pg_import_snapshot.
+
+
+  The idea is that one client retrieves the information about the snapshot that it
+  is currently using and then the second client passes this information back to
+  the server. The database server will then modify the second client's snapshot
+  to match the snapshot of the first and from then on both can see the identical
+  data in the database.
+
+
+  Note that a snapshot can only be imported as long as the transaction that
+  exported it originally is held open. Also note that even after the
+  synchronization both clients still run their own independent transactions.
+  As a consequence, even though synchronized with respect to reading pre-existing
+  data, both transactions won't be able to see each other's uncommitted data.
+
+
+ Snapshot Synchronization Functions
+ 
+  
+   Name Return Type Description
+   
+  
+ 
+  
+   
+
+ pg_export_snapshot()
+
+bytea
+Export the snapshot and return its textual representation
+   
+   
+
+ pg_import_snapshot(snapshotData bytea)
+
+bool

Re: [HACKERS] Snapshot synchronization, again...

2011-01-19 Thread Noah Misch
On Wed, Jan 19, 2011 at 12:12:39AM -0500, Joachim Wieland wrote:
> Noah, thank you for this excellent review. I have taken over most
> (allmost all) of your suggestions (except for the documentation which
> is still missing). Please check the updated & attached patch for
> details.

Great.  I do get an assertion failure with this sequence:

BEGIN; SELECT pg_export_snapshot(); ROLLBACK;
SELECT 1;

TRAP: FailedAssertion("!(RegisteredSnapshots > 0)", File: "snapmgr.c", Line: 430)

Perhaps some cleanup is missing from a ROLLBACK path?

> 
> On Fri, Jan 14, 2011 at 10:13 PM, Noah Misch  wrote:
> > "" is a valid md5 message digest.  To avoid always accepting
> > a snapshot with that digest, we would need to separately store a flag indicating
> > whether a given syncSnapshotChksums member is currently valid.  Maybe that hole
> > is acceptable, though.
> 
> In the end I decided to store md5 checksums as printable characters in
> shmem. We now need a few more bytes for each checksum but as soon as a
> byte is NUL we know that it is not a valid checksum.

Just to clarify, I was envisioning something like:

typedef struct { bool valid; char value[16]; } snapshotChksum;

I don't object to the way you've done it, but someone else might not like the
extra marshalling between text and binary.  Your call.

> >> +      * Instead we must check to not go forward (we might have opened a cursor
> >> +      * in this transaction and still have its snapshot registered)
> >> +      */
> >
> > I'm thinking this case should yield an ERROR.  Otherwise, our net result would
> > be to silently adopt a snapshot that is neither our old self nor the target.
> > Maybe there's a use case for that, but none comes to my mind.
> 
> This can happen when you do:
> 
> DECLARE foo CURSOR FOR SELECT * FROM bar;
> import snapshot...
> FETCH 1 FROM foo;

I see now.  You set snapshot->xmin unconditionally, but never move MyProc->xmin
forward, only backward.  Yes; that seems just right.
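The rule being confirmed here -- the process's advertised xmin may move backward on import but never forward -- is this invariant (simplified Python, plain-integer XIDs):

```python
def import_xmin(current_proc_xmin, imported_xmin):
    """MyProc->xmin may only move backward on import, never forward.

    Moving it forward would let VACUUM remove row versions that an
    already-registered snapshot (e.g. an open cursor's) still needs.
    """
    return min(current_proc_xmin, imported_xmin)

# Importing an older snapshot pulls the advertised xmin back...
assert import_xmin(current_proc_xmin=500, imported_xmin=450) == 450
# ...but importing a newer one must not advance it past the cursor's snapshot.
assert import_xmin(current_proc_xmin=500, imported_xmin=620) == 500
```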

> > I guess the use case for pg_import_snapshot from READ COMMITTED would be
> > something like "DO $$BEGIN PERFORM pg_import_snapshot('...'); stuff; END$$;".
> > First off, would that work ("stuff" use the imported snapshot)?  If it does
> > work, we should take the precedent of SET LOCAL and issue no warning.  If it
> > doesn't work, then we should document that the snapshot does take effect until
> > the next command and probably keep this warning or something similar.
> 
> No, this will also give you a new snapshot for every query, no matter
> if it is executed within or outside of a DO-Block.

You're right.  Then consider "VALUES (pg_import_snapshot('...'), (SELECT
count(*) from t))" at READ COMMITTED.  It works roughly as I'd guess; the
subquery uses the imported snapshot.  If I flip the two expressions and do
"VALUES ((SELECT count(*) from t), pg_import_snapshot('...'))", the subquery
uses the normal snapshot.  That makes sense, but I can't really see a use case
for it.  A warning, like your code has today, seems appropriate.

> > Is it valid to scribble directly on snapshots like this?
> 
> I figured that previously executed code still has references pointing
> to the snapshots and so we cannot just push a new snapshot on top but
> really need to change the memory where they are pointing to.

Okay.  Had no special reason to believe otherwise, just didn't know.

> I am also adding a test script that shows the difference of READ
> COMMITTED and SERIALIZABLE in an importing transaction in a DO-Block.
> It's based on the script you sent.

Thanks; that was handy.  One thing I noticed is that the second "SELECT * FROM
kidseen" yields zero rows instead of yielding "ERROR:  relation "kidseen" does
not exist".  I changed things around per the attached "test-drop.pl", such that
the table is already gone in the ordinary snapshot, but still visible to the
imported snapshot.  Note how the pg_class row is visible, but an actual attempt
to query the table fails.  Must some kind of syscache invalidation follow the
snapshot alteration to make this behave normally?

General tests involving only DML seem to do the right thing.  Subtransaction
handling looks sound.  Patch runs cleanly according to Valgrind.

> So thanks again and I'm looking forward to your next review... :-)

Hope this one helps, too.

> diff --git a/src/backend/storage/ipc/procarray.c 
> b/src/backend/storage/ipc/procarray.c
> index 8b36df4..29c9426 100644

> + bytea *
> + ExportSnapshot(Snapshot snapshot)
> + {
> + int bufsize = 1024;
> + int bufsize_filled = 0; /* doesn't include 
> NUL byte */
> + int bufsize_left = bufsize - bufsize_filled;
> + char   *buf = (char *) palloc(bufsize);
> + int i;
> + TransactionId  *children;
> + int nchildren;
> + 
> + /* In a subtransact

Re: [HACKERS] Snapshot synchronization, again...

2011-01-18 Thread Joachim Wieland
Noah, thank you for this excellent review. I have taken over most
(almost all) of your suggestions (except for the documentation which
is still missing). Please check the updated & attached patch for
details.

On Fri, Jan 14, 2011 at 10:13 PM, Noah Misch  wrote:
> "" is a valid md5 message digest.  To avoid always 
> accepting
> a snapshot with that digest, we would need to separately store a flag 
> indicating
> whether a given syncSnapshotChksums member is currently valid.  Maybe that 
> hole
> is acceptable, though.

In the end I decided to store md5 checksums as printable characters in
shmem. We now need a few more bytes for each checksum but as soon as a
byte is NUL we know that it is not a valid checksum.

>> +
>> +     if (!XactReadOnly)
>> +             elog(WARNING, "A snapshot exporting function should be 
>> readonly.");
>
> There are legitimate use cases for copying the snapshot of a read-write
> transaction.  Consider a task like calculating some summaries for insert into 
> a
> table.  To speed this up, you might notionally partition the source data and
> have each of several workers calculate each partition.  I'd vote for removing
> this warning entirely.

Warning removed, adding this fact to the documentation only is fine with me.


> InvalidBackendId is -1.  Is InvalidOid (0) in fact also invalid as a backend 
> ID?

Yes, there is a check in utils/init/postinit.c that makes sure that 0
is never a valid backendId. But anyway, I have now made this more
explicit by adding a dedicated parse routine for ints.


>> +     while ((xid = parseXactFromText(&s, "xip:")) != InvalidTransactionId)
>> +     {
>> +             if (xid == GetTopTransactionIdIfAny())
>> +                     continue;
>> +             snapshot->xip[i++] = xid;
>
> This works on a virgin GetSnapshotData() snapshot, which always has enough 
> space
> (sized according to max_connections).  A snapshot from CopySnapshot(), 
> however,
> has just enough space for the original usage.  I get a SIGSEGV at this line 
> with
> the attached test script.  If I change things around so xip starts non-NULL, I
> get messages like these as the code scribbles past the end of the allocation:

This has been fixed. xip/subxip are now allocated separately if necessary.


> Though unlikely, the other backend may not even exist by now.  Indeed, that
> could have happened and RecentGlobalXmin advanced since we compared the
> checksum.  Not sure what the right locking is here, but something is needed.

Good catch. What I have done now is a second check at the end of
ImportSnapshot(). At that point we have set up the new snapshot and
adjusted MyProc->xmin. If we succeed, then we are fine. If not, we
throw an error and invalidate the whole transaction.


>> +      * Instead we must check to not go forward (we might have opened a 
>> cursor
>> +      * in this transaction and still have its snapshot registered)
>> +      */
>
> I'm thinking this case should yield an ERROR.  Otherwise, our net result would
> be to silently adopt a snapshot that is neither our old self nor the target.
> Maybe there's a use case for that, but none comes to my mind.

This can happen when you do:

DECLARE foo CURSOR FOR SELECT * FROM bar;
import snapshot...
FETCH 1 FROM foo;


> I guess the use case for pg_import_snapshot from READ COMMITTED would be
> something like "DO $$BEGIN PERFORM pg_import_snapshot('...'); stuff; END$$;".
> First off, would that work ("stuff" use the imported snapshot)?  If it does
> work, we should take the precedent of SET LOCAL and issue no warning.  If it
> doesn't work, then we should document that the snapshot does take effect until
> the next command and probably keep this warning or something similar.

No, this will also give you a new snapshot for every query, no matter
if it is executed within or outside of a DO-Block.


>> + Datum
>> + pg_import_snapshot(PG_FUNCTION_ARGS)
>> + {
>> +     bytea  *snapshotData = PG_GETARG_BYTEA_P(0);
>> +     bool    ret = true;
>> +
>> +     if (ActiveSnapshotSet())
>> +             ret = ImportSnapshot(snapshotData, GetActiveSnapshot());
>> +
>> +     ret &= ImportSnapshot(snapshotData, GetTransactionSnapshot());
>
> Is it valid to scribble directly on snapshots like this?

I figured that previously executed code still has references pointing
to the snapshots and so we cannot just push a new snapshot on top but
really need to change the memory where they are pointing to.


I am also adding a test script that shows the difference of READ
COMMITTED and SERIALIZABLE in an importing transaction in a DO-Block.
It's based on the script you sent.


So thanks again and I'm looking forward to your next review... :-)

Joachim
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 2dbac56..c96942a 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*** CreateSharedMemoryAndSemaphores(bool mak
*** 124,129 
--- 124,130 
  		size 

Re: [HACKERS] Snapshot synchronization, again...

2011-01-14 Thread Noah Misch
Hello Joachim,

I'm reviewing this patch for CommitFest 2011-01.

On Fri, Jan 07, 2011 at 06:41:38AM -0500, Joachim Wieland wrote:
> On Thu, Dec 30, 2010 at 7:31 AM, Joachim Wieland  wrote:
> > What I am proposing now is the following:
> >
> > We return snapshot information as a chunk of data to the client. At
> > the same time however, we set a checksum in shared memory to protect
> > against modification of the snapshot. A publishing backend can revoke
> > its snapshot by deleting the checksum and a backend that is asked to
> > install a snapshot can verify that the snapshot is correct and current
> > by calculating the checksum and comparing it with the one in shared
> > memory.

I like this design.  It makes the backend's use of resources to track exported
snapshots very simple.  It also avoids pessimizing the chances that we can, at
some point, synchronize snapshots across various standby clusters without
changing the user-facing API.

> 
> So here's the patch implementing this idea.
> 
> I named the user visible functions pg_export_snapshot() and
> pg_import_snapshot().
> 
> A user starts a transaction and calls pg_export_snapshot() to get a
> chunk of bytea data. In a different connection he opens a transaction in
> isolation level serializable and passes that chunk of data into
> pg_import_snapshot(). Now subsequent queries of the second connection see the
> snapshot that was current when pg_export_snapshot() was executed. If both
> transactions are in isolation level serializable, then both see the same
> data from this moment on (this is the case of pg_dump).
> 
> There are most probably a few loose ends and someone who is more familiar with
> this area than me needs to look into it, but at least everybody should be happy
> now with the overall approach.
> 
> These are the implementation details and restrictions of the patch:
> 
> The exporting transaction
> 
> - should be read-only (to discourage people from expecting that writes of
>   the exporting transaction can be seen by the importing transaction)
> - must not be a subtransaction (we don't add subxips of our own 
> transaction
>   to the snapshot, so importing the snapshot later would result in missing
>   subxips)
> - adds its own xid (if any) to the xip-array
> 
> The importing transaction
> 
> - will not import a snapshot of the same backend (even though it would
>   probably work)
> - will not import a snapshot of a different database in the cluster
> - should be isolation level serializable
> - must not be a subtransaction (can we completely rollback on
> subxact-rollback?)

I don't know, but this restriction does not seem onerous.

> - leaves curcid as is, otherwise effects of previous commands would get 
> lost
>   and magically appear later when curcid increases
> - applies xmin, xmax, xip, subxip values of the exported snapshot to
>   GetActiveSnapshot() and GetTransactionSnapshot()
> - takes itself out of the xip array
> - updates MyProc->xmin but sets it only backwards but not forwards
> 
> The snapshot is invalidated on transaction commit/rollback as well as when 
> doing
> "prepare transaction".
> 
> Comments welcome.

General points:

Basics: patch has the correct format and applies cleanly (offset in pg_proc.h).
It includes no tests, but perhaps it should not.  Really testing this requires
concurrency, and our test framework is not set up for that.  We could add some
cheap tests to ensure the functions fail cleanly in some basic situations, but
that would be about it.  The patch does not, but should, document the new
functions.  The new functions also need comments preceding their definitions,
similar to those found on all other functions in procarray.c.

You add four elog() calls, all of which should be ereport() to facilitate
translation.  Also consider suitable error codes.  The error messages do not
follow style; in particular, they begin with capitals.  See the style guide,
http://www.postgresql.org/docs/current/interactive/error-style-guide.html.

ImportSnapshot() has no paths that justify returning false.  It should always
return true or throw a suitable error.

See below for various specific comments.

> diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
> index 95beba8..c24150f 100644
> *** a/src/backend/storage/ipc/ipci.c
> --- b/src/backend/storage/ipc/ipci.c
> *** CreateSharedMemoryAndSemaphores(bool mak
> *** 124,129 
> --- 124,130 
>   size = add_size(size, BTreeShmemSize());
>   size = add_size(size, SyncScanShmemSize());
>   size = add_size(size, AsyncShmemSize());
> + size = add_size(size, SyncSnapshotShmemSize());
>   #ifdef EXEC_BACKEND
>   size = add_size(size, ShmemBackendArraySize());
>   #endif
> *** CreateSharedMemoryAndSemaphores(bool mak
> *** 228,233 
> --- 229,235 
>   BTreeShmemInit();
>   Sy

Re: [HACKERS] Snapshot synchronization, again...

2011-01-07 Thread Florian Pflug
On Jan7, 2011, at 12:41 , Joachim Wieland wrote:
> On Thu, Dec 30, 2010 at 7:31 AM, Joachim Wieland  wrote:
> These are the implementation details and restrictions of the patch:
> 
> The exporting transaction
> 
>- should be read-only (to discourage people from expecting that writes of
>  the exporting transaction can be seen by the importing transaction)

That seems overly strict. Wouldn't a warning about this in the docs suffice?

>- must not be a subtransaction (we don't add subxips of our own transaction
>  to the snapshot, so importing the snapshot later would result in missing
>  subxips)

If that is necessary, wouldn't a sub-transaction that was sub-committed before
exporting the snapshot also pose a problem? The snapshot wouldn't include
the sub-transaction's xid, so once the exporting transaction has committed, any
importing transaction would suddenly see the sub-transaction's changes.

>- adds its own xid (if any) to the xip-array

Per the previous paragraph, I believe it should also add the xid's of all
non-aborted sub-transactions to the subxip-array, overflowing the array if
necessary.

> The importing transaction
> 
>- will not import a snapshot of the same backend (even though it would
>  probably work)
>- will not import a snapshot of a different database in the cluster
>- should be isolation level serializable
>- must not be a subtransaction (can we completely rollback on
> subxact-rollback?)

Sounds fine so far.

>- leaves curcid as is, otherwise effects of previous commands would get 
> lost
>  and magically appear later when curcid increases
>- applies xmin, xmax, xip, subxip values of the exported snapshot to
>  GetActiveSnapshot() and GetTransactionSnapshot()
>- takes itself out of the xip array
>- updates MyProc->xmin but sets it only backwards but not forwards

Changing a snapshot mid-transaction like that seems risky. I'd suggest simply
decreeing that an importing transaction must be read-only and must not yet have
set its snapshot. So effectively, the only allowed calling sequence would be

begin;
set transaction isolation level serializable read only;
pg_import_snapshot(...);

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Snapshot synchronization, again...

2011-01-07 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 7:31 AM, Joachim Wieland  wrote:
> What I am proposing now is the following:
>
> We return snapshot information as a chunk of data to the client. At
> the same time however, we set a checksum in shared memory to protect
> against modification of the snapshot. A publishing backend can revoke
> its snapshot by deleting the checksum and a backend that is asked to
> install a snapshot can verify that the snapshot is correct and current
> by calculating the checksum and comparing it with the one in shared
> memory.

So here's the patch implementing this idea.

I named the user visible functions pg_export_snapshot() and
pg_import_snapshot().

A user starts a transaction and calls pg_export_snapshot() to get a
chunk of bytea data. In a different connection he opens a transaction in
isolation level serializable and passes that chunk of data into
pg_import_snapshot(). Now subsequent queries of the second connection see the
snapshot that was current when pg_export_snapshot() was executed. If both
transactions are in isolation level serializable, then both see the same
data from this moment on (this is the case of pg_dump).

There are most probably a few loose ends and someone who is more familiar with
this area than me needs to look into it, but at least everybody should be happy
now with the overall approach.

These are the implementation details and restrictions of the patch:

The exporting transaction

- should be read-only (to discourage people from expecting that writes of
  the exporting transaction can be seen by the importing transaction)
- must not be a subtransaction (we don't add subxips of our own transaction
  to the snapshot, so importing the snapshot later would result in missing
  subxips)
- adds its own xid (if any) to the xip-array

The importing transaction

- will not import a snapshot of the same backend (even though it would
  probably work)
- will not import a snapshot of a different database in the cluster
- should be isolation level serializable
- must not be a subtransaction (can we completely rollback on
subxact-rollback?)
- leaves curcid as is, otherwise effects of previous commands would get lost
  and magically appear later when curcid increases
- applies xmin, xmax, xip, subxip values of the exported snapshot to
  GetActiveSnapshot() and GetTransactionSnapshot()
- takes itself out of the xip array
- updates MyProc->xmin but sets it only backwards but not forwards

The snapshot is invalidated on transaction commit/rollback as well as when doing
"prepare transaction".

Comments welcome.


Joachim
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 95beba8..c24150f 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*** CreateSharedMemoryAndSemaphores(bool mak
*** 124,129 
--- 124,130 
  		size = add_size(size, BTreeShmemSize());
  		size = add_size(size, SyncScanShmemSize());
  		size = add_size(size, AsyncShmemSize());
+ 		size = add_size(size, SyncSnapshotShmemSize());
  #ifdef EXEC_BACKEND
  		size = add_size(size, ShmemBackendArraySize());
  #endif
*** CreateSharedMemoryAndSemaphores(bool mak
*** 228,233 
--- 229,235 
  	BTreeShmemInit();
  	SyncScanShmemInit();
  	AsyncShmemInit();
+ 	SyncSnapshotInit();
  
  #ifdef EXEC_BACKEND
  
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 980996e..e851bcd 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 50,60 
--- 50,62 
  #include "access/transam.h"
  #include "access/xact.h"
  #include "access/twophase.h"
+ #include "libpq/md5.h"
  #include "miscadmin.h"
  #include "storage/procarray.h"
  #include "storage/spin.h"
  #include "storage/standby.h"
  #include "utils/builtins.h"
+ #include "utils/bytea.h"
  #include "utils/snapmgr.h"
  
  
*** static int KnownAssignedXidsGetAndSetXmi
*** 159,164 
--- 161,170 
  static TransactionId KnownAssignedXidsGetOldestXmin(void);
  static void KnownAssignedXidsDisplay(int trace_level);
  
+ typedef char snapshotChksum[16];
+ static snapshotChksum *syncSnapshotChksums;
+ static Snapshot exportedSnapshot;
+ 
  /*
   * Report shared-memory space needed by CreateSharedProcArray.
   */
*** KnownAssignedXidsDisplay(int trace_level
*** 3065,3067 
--- 3071,3335 
  
  	pfree(buf.data);
  }
+ 
+ 
+ /*
+  *  Report space needed for our shared memory area, which is basically an
+  *  md5 checksum per connection.
+  */
+ Size
+ SyncSnapshotShmemSize(void)
+ {
+ 	return PROCARRAY_MAXPROCS * sizeof(snapshotChksum);
+ }
+ 
+ void
+ SyncSnapshotInit(void)
+ {
+ 	Size	size;
+ 	bool	found;
+ 	int		i;
+ 
+ 	size = SyncSnapshotShmemSize();
+ 
+ 	syncSnapshotChksums = (snapshotChksum*) ShmemInitStruct("SyncSnapshotChksums",
+ 			size, &found);
+ 	if (!found)
+ 		for (i 

Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Joachim Wieland
On Fri, Dec 31, 2010 at 8:28 AM, Alvaro Herrera
 wrote:
> A backend can have any number of snapshots registered, and those don't
> allow GlobalXmin to advance.  Consider an open cursor, for example.
> Even if the rest of the transaction is read committed, the snapshot
> registered by the open cursor still holds back GlobalXmin.  My
> (handwavy) idea is that whenever the transaction calls
> pg_publish_snapshot(), said snapshot is registered, which makes it safe
> to use even if the transaction continues to operate and obtain newer
> snapshots.

Cool, even better that this is taken care of already :-)

Thanks,
Joachim



Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of Fri Dec 31 00:15:57 -0300 2010:
> On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera
>  wrote:
> >> Disadvantage of b: It doesn't allow a snapshot to be installed on a
> >> different server. It requires a serializable open transaction to hold
> >> the snapshot.
> >
> > Why does it require a serializable transaction?  You could simply
> > register the snapshot in any transaction.  (Of course, the net effect
> > would be pretty similar to a serializable transaction).
> 
> I am not assuming that the publishing transaction blocks until its
> snapshot is being picked up. A read committed transaction would get a
> new snapshot for every other query, so the published snapshot is no
> longer represented by an actual backend until it is being picked up by
> one. Since nobody is holding off xmin/GlobalXmin, eventually vacuum
> would remove tuples that the published-but-not-yet-picked-up snapshot
> should still be able to see, no?

A backend can have any number of snapshots registered, and those don't
allow GlobalXmin to advance.  Consider an open cursor, for example.
Even if the rest of the transaction is read committed, the snapshot
registered by the open cursor still holds back GlobalXmin.  My
(handwavy) idea is that whenever the transaction calls
pg_publish_snapshot(), said snapshot is registered, which makes it safe
to use even if the transaction continues to operate and obtain newer
snapshots.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Snapshot synchronization, again...

2010-12-31 Thread Stefan Kaltenbrunner

On 12/30/2010 10:45 PM, Heikki Linnakangas wrote:

On 30.12.2010 16:49, Florian Pflug wrote:

On Dec30, 2010, at 13:31 , Joachim Wieland wrote:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.


We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?


Hmm, our definition of "read-only" is a bit fuzzy. While a transaction
doesn't modify the database itself, it could still send NOTIFYs or call
a PL function to do all sorts of things outside the database. Imagine
that you're paranoid about data integrity, and have a security definer
function that runs cross checks on the data. If it finds any
anomalies, it wakes up the operator or forces shutdown or similar.


are people actually doing that in reality? I'm also having a hard time 
picturing a realistic example of what that "data integrity check" 
function would actually be able to check with default isolation mode 
and concurrent activity...




Now a malicious user could set a snapshot that passes the basic validity
checks, ie. xmin >= GlobalXmin, but contains a combination of still
in-progress transactions that never existed in reality. If he then calls the
paranoia-function, it would see an inconsistent state of committed
tuples and get upset.


sure but I would expect being able to set a snapshot requiring either 
superuser or some sort of "WITH SNAPSHOT" grant thingy - and in general 
there are much more trivial and not that obscure scenarios a "normal" 
user can cause the admin to get paged :)




Maybe that's a bit far-stretched, but it's not entirely clear that
running with an inconsistent snapshot is harmless.


well, there has been some discussion with regard to the SSI stuff that we 
might want to reconsider our definition of "read-only"; maybe that would be 
the right way to approach the problem?



Stefan



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:49 AM, Florian Pflug  wrote:
> On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
>> We return snapshot information as a chunk of data to the client. At
>> the same time however, we set a checksum in shared memory to protect
>> against modification of the snapshot. A publishing backend can revoke
>> its snapshot by deleting the checksum and a backend that is asked to
>> install a snapshot can verify that the snapshot is correct and current
>> by calculating the checksum and comparing it with the one in shared
>> memory.
>
> We'd still have to stream these checksums to the standbys though,
> or would they be exempt from the checksum checks?

I am not talking about having synchronized snapshots among standby
servers at all.

I am only proposing a client API that will work for this future idea as well.


> I still wonder whether these checks are worth the complexity. I
> believe we'd only allow snapshot modifications for read-only queries
> anyway, so what point is there in preventing clients from setting
> broken snapshots?

What's the use case for it? As long as nobody comes up with a
reasonable use case for it, let's aim for the robust version.


Joachim



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
On Thu, Dec 30, 2010 at 9:40 AM, Alvaro Herrera
 wrote:
>> Disadvantage of b: It doesn't allow a snapshot to be installed on a
>> different server. It requires a serializable open transaction to hold
>> the snapshot.
>
> Why does it require a serializable transaction?  You could simply
> register the snapshot in any transaction.  (Of course, the net effect
> would be pretty similar to a serializable transaction).

I am not assuming that the publishing transaction blocks until its
snapshot is being picked up. A read committed transaction would get a
new snapshot for every other query, so the published snapshot is no
longer represented by an actual backend until it is being picked up by
one. Since nobody is holding off xmin/GlobalXmin, eventually vacuum
would remove tuples that the published-but-not-yet-picked-up snapshot
should still be able to see, no?

Joachim



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Heikki Linnakangas

On 30.12.2010 16:49, Florian Pflug wrote:

On Dec30, 2010, at 13:31 , Joachim Wieland wrote:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.


We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?


Hmm, our definition of "read-only" is a bit fuzzy. While a transaction 
doesn't modify the database itself, it could still send NOTIFYs or call 
a PL function to do all sorts of things outside the database. Imagine 
that you're paranoid about data integrity, and have a security definer 
function that runs cross checks on the data. If it finds any 
anomalies, it wakes up the operator or forces shutdown or similar.


Now a malicious user could set a snapshot that passes the basic validity 
checks, ie. xmin >= GlobalXmin, but contains a combination of still 
in-progress transactions that never existed in reality. If he then calls the 
paranoia-function, it would see an inconsistent state of committed 
tuples and get upset.


Maybe that's a bit far-stretched, but it's not entirely clear that 
running with an inconsistent snapshot is harmless.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Florian Pflug
On Dec30, 2010, at 13:31 , Joachim Wieland wrote:
> We return snapshot information as a chunk of data to the client. At
> the same time however, we set a checksum in shared memory to protect
> against modification of the snapshot. A publishing backend can revoke
> its snapshot by deleting the checksum and a backend that is asked to
> install a snapshot can verify that the snapshot is correct and current
> by calculating the checksum and comparing it with the one in shared
> memory.

We'd still have to stream these checksums to the standbys though,
or would they be exempt from the checksum checks?

I still wonder whether these checks are worth the complexity. I
believe we'd only allow snapshot modifications for read-only queries
anyway, so what point is there in preventing clients from setting
broken snapshots?

best regards,
Florian Pflug





Re: [HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Alvaro Herrera
Excerpts from Joachim Wieland's message of Thu Dec 30 09:31:47 -0300 2010:

> Advantage of b: No validation necessary, as soon as the transaction
> that publishes the snapshot loses that snapshot, it will also revoke
> the snapshot information (either by removing a temp file or deleting
> it from shared memory)
> Disadvantage of b: It doesn't allow a snapshot to be installed on a
> different server. It requires a serializable open transaction to hold
> the snapshot.

Why does it require a serializable transaction?  You could simply
register the snapshot in any transaction.  (Of course, the net effect
would be pretty similar to a serializable transaction).

> We return snapshot information as a chunk of data to the client. At
> the same time however, we set a checksum in shared memory to protect
> against modification of the snapshot. A publishing backend can revoke
> its snapshot by deleting the checksum and a backend that is asked to
> install a snapshot can verify that the snapshot is correct and current
> by calculating the checksum and comparing it with the one in shared
> memory.
> 
> This only costs us a few bytes for the checksum * max_connection in
> shared memory and apart from resetting the checksum it does not have
> cleanup and verification issues.

So one registered snapshot per transaction?  Sounds like a reasonable
limitation (I doubt there's a use case for more than that, anyway).

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] Snapshot synchronization, again...

2010-12-30 Thread Joachim Wieland
The snapshot synchronization discussion from the parallel pg_dump
patch somehow ended without a clear way to go forward.

Let me sum up what has been brought up and propose a short- and
long-term solution.

Summary:

Passing snapshot sync information can be done either:

a) by returning complete snapshot information from the backend to the
client so that the client can pass it along to a different backend
b) by returning only a unique identifier to the client and storing
the actual snapshot data somewhere on the server side

Advantage of a: no memory is used in the backend and no memory needs
to get cleaned up. It would also, at least in theory, let us forward
that data to a hot standby server and run e.g. a dump partially on the
master and partially on one or several hot standby servers.
Disadvantage of a: The snapshot must be validated to make sure that
its information is still current, and it might be difficult to cover
all cases in this validation. A client can access not only exactly the
published snapshot but just about any snapshot that fits and passes
the validation checks (this is more of a disadvantage than an
advantage, because it allows clients to see a database state that
never existed in reality).

Advantage of b: No validation necessary, as soon as the transaction
that publishes the snapshot loses that snapshot, it will also revoke
the snapshot information (either by removing a temp file or deleting
it from shared memory)
Disadvantage of b: It doesn't allow a snapshot to be installed on a
different server. It requires a serializable open transaction to hold
the snapshot.
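To make the contrast concrete, here is a minimal Python sketch of approach (b). The names (SnapshotRegistry, owner_xid) and the dict-based store are invented for illustration; in the real proposal the store would be a temp file or shared memory, as described above:

```python
import secrets

class SnapshotRegistry:
    """Approach (b): snapshot data stays on the server; the client only
    receives an opaque identifier to hand to another backend."""

    def __init__(self):
        # snapshot_id -> (publishing transaction's id, snapshot data)
        self._store = {}

    def publish(self, owner_xid, snapshot):
        snapshot_id = secrets.token_hex(8)
        self._store[snapshot_id] = (owner_xid, snapshot)
        return snapshot_id          # only this token travels to the client

    def install(self, snapshot_id):
        entry = self._store.get(snapshot_id)
        if entry is None:
            raise LookupError("snapshot unknown or revoked")
        return entry[1]             # no validation needed: presence implies validity

    def revoke_all(self, owner_xid):
        # Run when the publishing transaction ends: its snapshots vanish,
        # which is why an open transaction must be held the whole time.
        self._store = {sid: e for sid, e in self._store.items()
                       if e[0] != owner_xid}
```

Once `revoke_all` has run for the publishing transaction, `install` fails for its identifiers, which is exactly the "no validation necessary" property: a snapshot that can be looked up at all is still current.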

What I am proposing now is the following:

We return snapshot information as a chunk of data to the client. At
the same time however, we set a checksum in shared memory to protect
against modification of the snapshot. A publishing backend can revoke
its snapshot by deleting the checksum and a backend that is asked to
install a snapshot can verify that the snapshot is correct and current
by calculating the checksum and comparing it with the one in shared
memory.

This only costs us a few bytes (checksum size * max_connections) in
shared memory and, apart from resetting the checksum, it has no
cleanup or verification issues. Note that we are also free to change
the internal format of the chunk of data we return whenever we like,
so we can enhance this feature in the future transparently to the
client.
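The publish/verify cycle described above could look roughly like the following Python sketch. The hash choice and the ChecksumSlot structure are illustrative only, standing in for one backend's per-connection slot in shared memory:

```python
import hashlib

class ChecksumSlot:
    """Stands in for one backend's slot in shared memory: a few bytes
    per backend, max_connections slots in total."""
    def __init__(self):
        self.checksum = None

def _digest(snapshot_bytes):
    return hashlib.sha256(snapshot_bytes).digest()

def publish(slot, snapshot_bytes):
    # The full snapshot chunk goes back to the client; only its hash
    # stays behind in shared memory.
    slot.checksum = _digest(snapshot_bytes)
    return snapshot_bytes

def revoke(slot):
    # The publishing backend clears the checksum when its snapshot goes away.
    slot.checksum = None

def install(slot, snapshot_bytes):
    # A backend asked to adopt a snapshot recomputes the hash and
    # compares it with the one the publisher left in shared memory.
    if slot.checksum is None or slot.checksum != _digest(snapshot_bytes):
        raise ValueError("snapshot was modified or is no longer current")
    return snapshot_bytes
```

A tampered chunk fails the comparison, and after `revoke` any chunk fails it, so the scheme rejects both modified and stale snapshots without the server ever storing the snapshot itself.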


Thoughts?


Joachim
