Re: [HACKERS] estimating # of distinct values
On 20.01.2011 04:36, Robert Haas wrote: ... Even better, the code changes would be confined to ANALYZE rather than spread out all over the system, which has positive implications for robustness and likelihood of commit. Keep in mind that the administrator can already override the ndistinct estimate with ALTER TABLE. If he needs to manually run a special ANALYZE command to make it scan the whole table, he might as well just use ALTER TABLE to tell the system what the real (or good enough) value is. A DBA should have a pretty good feeling of what the distribution of his data is like. And how good does the estimate need to be? For a single-column, it's usually not that critical, because if the column has only a few distinct values then we'll already estimate that pretty well, and OTOH if ndistinct is large, it doesn't usually affect the plans much if it's 10% of the number of rows or 90%. It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate? I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
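For reference, the per-column override Heikki refers to looks like this (table and column names are placeholders); a positive value sets the estimate directly, a negative value between -1 and 0 is taken as a fraction of the row count, and either form is picked up at the next ANALYZE:

    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 5000);
    ALTER TABLE orders ALTER COLUMN status SET (n_distinct = -0.05);  -- 5% of the rows
    ANALYZE orders;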
Re: [HACKERS] Extending opfamilies for GIN indexes
Tom Lane t...@sss.pgh.pa.us writes: Actually the other way around. An opclass is the subset of an opfamily that is tightly bound to an index. The build methods have to be associatable with an index, so they're part of the index's opclass. The query methods could be loose in the opfamily. I had understood your proposal to change that for GIN. Thinking again now, keeping opfamily and opclass as they are: an opclass is the code we run to build and scan the index, and an opfamily is a way to use the same index data and code in more contexts than an opclass strictly covers. The planner's not the problem here --- what's missing is the rule for the index AM to look up the right support functions to call at runtime. The trick is to associate the proper query support methods with any given query operator (which'd also be loose in the family, probably). The existing schema for pg_amop and pg_amproc is built on the assumption that the amoplefttype/amoprighttype are sufficient for making this association; but that seems to fall down if we would like to allow contrib modules to add new query operators that coincidentally take the same input types as an existing opfamily member. Well, the opfamily machinery already makes it possible to give query support to any index whose opclass is in the family; that is, the same set of operators is covered by more than one opclass. What we want to add is that more than one set of operators can find data support in more than one kind of index. But you still want to run specific search code here. So it seems to me we shouldn't attack the problem at the operator left/right type level, but rather model the extra level of flexibility we need, somewhat separating the building and maintenance of the index data from the code that's used to access it. The example we're working from seems to be covered if we can instruct PostgreSQL that a set of opclasses is binary coercible, if that's the right term here. The idea would then be for PostgreSQL to figure out that a given index can be used with any binary-coercible opclass, rather than only the one used to maintain it. What do you think? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
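To make the sticking point concrete, a rough sketch of what a contrib module would like to do: add its own query operator loosely to an existing GIN opfamily, with the same input types as existing members (the operator name and strategy number here are invented for illustration):

    ALTER OPERATOR FAMILY array_ops USING gin
        ADD OPERATOR 5  @@@ (anyarray, anyarray);
    -- pg_amop can record the new operator, but pg_amproc lookups key on
    -- (amoplefttype, amoprighttype), so there is no way to pair the operator
    -- with its own extractQuery/consistent support rather than the opclass's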
Re: [HACKERS] WIP: RangeTypes
New patch. I added a lot of generic range functions, and a lot of operators. There is still more work to do, this is just an updated patch. The latest can be seen on the git repository, as well: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes Regards, Jeff Davis [Attachment: rangetypes-20110119.gz] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: WIP: plpgsql - foreach in
2011/1/20 Stephen Frost sfr...@snowman.net: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 19, 2011 at 6:04 PM, Stephen Frost sfr...@snowman.net wrote: I'm going to mark this returned to author with feedback. That implies you don't think it should be considered further for this CommitFest. Perhaps you mean Waiting on Author? I did, actually, and that's what I actually marked it as in the CF. Sorry for any confusion. When I went to mark it in CF, I realized my mistake. ok :), I'll look on it tomorrow. regards Pavel Thanks, Stephen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] REPLICATION privilege and shutdown
Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REPLICATION privilege and shutdown
On 20.01.2011 11:41, Fujii Masao wrote: Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
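Spelled out, the interleaving Heikki describes looks roughly like this (assume a streaming base backup, which did the equivalent of pg_start_backup(), is already running on a replication connection):

    -- in another session, while the base backup is streaming:
    SELECT pg_stop_backup();          -- ends the backup the walsender started;
                                      -- the streaming backup fails at the end
    SELECT pg_start_backup('oops');   -- with this as well, the streaming backup
                                      -- appears to succeed, but the copied data
                                      -- may be inconsistent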
Re: [HACKERS] REPLICATION privilege and shutdown
On Thu, Jan 20, 2011 at 10:50, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 20.01.2011 11:41, Fujii Masao wrote: Hi, Both the user with REPLICATION privilege and the superuser can call pg_stop_backup. But only superuser can connect to the server to cancel online backup during shutdown. The non-superuser with REPLICATION privilege cannot. Is this behavior intentional? Or just oversight? I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. Yeah; I've been avoiding to even think about that one pending the multiple base backups, onthe assumption that it goes in before release. If it doesn't, for some reason, that needs to be revisited, obviously. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REPLICATION privilege and shutdown
On Thu, Jan 20, 2011 at 6:50 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we need to consider the situation after the multiple streaming base backups patch goes in. After that we can change pg_stop_backup() so that you need superuser privileges to run it again - replication privileges is enough to do a streaming base backup, but that no longer interferes with the pg_start/stop_backup() admin functions. Fair enough. At the moment, a streaming base backup and manual pg_start/stop_backup() use the same machinery, so it's possible e.g to run pg_stop_backup() while a streaming base backup is running, causing it to fail at the end. Or worse, you can run pg_stop_backup()+pg_start_backup(), and the streaming base backup will seemingly succeed, but the produced backup is potentially corrupt. The multiple base backups patch will fix that too. OK. BTW, I found this behavior when I read your patch ;) Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: First, I'd like to note that the thread for this patch had *four* me-too responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? As I said previously on the thread you quote, having this happen implicitly is not a good thing, and IMHO, definitely not the right thing. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear etc.. I don't see how having a patch helps at all. I didn't think it was the right way before you wrote it and I still disagree now you've written it. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
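To make the similarity concern concrete, a sketch of what can go wrong if the swap happens implicitly with no check that the two tables match (the REPLACE WITH spelling is the one proposed in this thread):

    CREATE TABLE t (id int PRIMARY KEY, payload text NOT NULL);
    CREATE TABLE new_t (id int);          -- no payload column, no primary key
    ALTER TABLE t REPLACE WITH new_t;     -- proposed syntax
    -- queries and views referencing t.payload now fail, and anything that
    -- relied on t's primary key or foreign keys is silently left without them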
Re: [HACKERS] Transaction-scope advisory locks
On 2011-01-17 9:28 AM +0200, Itagaki Takahiro wrote: Here is a short review for Transaction scoped advisory locks: https://commitfest.postgresql.org/action/patch_view?id=518 Thanks for reviewing! == Features == The patch adds pg_[try_]advisory_xact_lock[_shared] functions. The function names follows the past discussion -- it's better than bool isXact argument or changing the existing behavior. == Coding == I expect documentation will come soon. I'm sorry about this, I have been occupied with other stuff. I'm going to work on this tonight. There is no regression test, but we have no regression test for advisory locks even now. Tests for lock conflict might be difficult, but we could have single-threaded test for lock/unlock and pg_locks view. Seems useful. == Questions == I have a question about unlocking transaction-scope advisory locks. We cannot unlock them with pg_advisory_unlock(), but can unlock with pg_advisory_unlock_all(). It's inconsistent behavior. Furthermore, I wonder we can allow unlocking transaction-scope locks -- we have LOCK TABLE but don't have UNLOCK TABLE. I guess we could add new pg_advisory_txn_unlock() functions to unlock transaction-scope locks, but I do share your doubt on whether or not we want to allow this at all. On the other hand, the reasons why we don't allow non-advisory locks to be unreleased is a lot more clear than the issue at hand. I have no strong opinion on this. Another thing I now see is this: BEGIN; SELECT pg_advisory_xact_lock(1); -- do something here -- upgrade to session lock SELECT pg_advisory_lock(1); COMMIT; This seems useful, since the xact lock would be automatically released if an error happens during -- do something here so you wouldn't need to worry about releasing the lock elsewhere. But I'm not sure this is safe. Can anyone see a problem with it? Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
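A single-backend regression test of the sort suggested could be as simple as this (function names per the patch; both locks should disappear from pg_locks at COMMIT):

    BEGIN;
    SELECT pg_advisory_xact_lock(10);
    SELECT pg_try_advisory_xact_lock_shared(10);
    SELECT locktype, classid, objid, mode, granted
      FROM pg_locks WHERE locktype = 'advisory' ORDER BY mode;
    COMMIT;
    SELECT count(*) FROM pg_locks WHERE locktype = 'advisory';  -- expect 0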
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On 19.01.2011 16:01, Joachim Wieland wrote: On Wed, Jan 19, 2011 at 7:47 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here are the latest patches all of them also rebased to current HEAD. Will update the commitfest app as well. What's the idea of storing the file sizes in the toc file? It looks like it's not used for anything. It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. Ok. I'm going to leave out the filesize. I can see some value in that, and the CRC, but I don't want to add stuff that's not used at this point. It would be nice to have this format match the tar format. At the moment, there's a couple of cosmetic differences: * TOC file is called TOC, instead of toc.dat * blobs TOC file is called BLOBS.TOC instead of blobs.toc * each blob is stored as blobs/oid.dat, instead of blob_oid.dat That can be done easily... The only significant difference is that in the directory archive format, each data file has a header in the beginning. What are the benefits of the data file header? Would it be better to leave it out, so that the format would be identical to the tar format? You could then just tar up the directory to get a tar archive, or vice versa. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. Now that we have an API for compression in compress_io.c, it probably wouldn't be very hard to implement the missing compression support to tar format either. If you want to drop the header altogether, fine with me but if it's just for the tar- directory conversion, then I am failing to see what the use case of that would be. A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
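The "dump uncompressed, compress later" workflow that would enable, roughly (the option letters for the new directory format are assumed here, and whether a plain gzip'd data file is accepted depends on the header handling discussed downthread):

    pg_dump -F d -Z 0 -f /backups/mydb.dir mydb     # fast, uncompressed directory dump
    gzip /backups/mydb.dir/[0-9]*.dat               # compress the per-table data files, leave the TOC alone
    pg_restore -d mydb_copy /backups/mydb.dir       # the reader recognizes the .gz suffix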
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Jan 19, 2011 at 9:37 PM, Magnus Hagander mag...@hagander.net wrote: Great. Thanks for the quick update! Here are another comments: Here are comments against the documents. The other code looks good. Thanks! It's helpful to document what to set to allow pg_basebackup connection. That is not only the REPLICATION privilege but also max_wal_senders and pg_hba.conf. Hmm. Yeha, i guess that wouldn't hurt. Will add that. + refsect1 + titleOptions/title Can we list the descriptions of option in the same order as pg_basebackup --help does? It's helpful to document that the target directory must be specified and it must be empty. Yeah, that's on the list - I just wanted to make any other changes first before I did that. I based on (no further) feedback and a few extra questions, I'm going to change it per your suggestion to use -D dir -F format, instead of -D/-T, which will change that stuff anyway. So I'll reorder them at that time. + para + The backup will include all files in the data directory and tablespaces, + including the configuration files and any additional files placed in the + directory by third parties. Only regular files and directories are allowed + in the data directory, no symbolic links or special device files. The latter sentence means that the backup of the database cluster created by initdb -X is not supported? Because the symlink to the actual WAL directory is included in it. No, it's not. pg_xlog is specifically excluded, and sent as an empty directory, so upon restore you will have an empty pg_xlog directory. OTOH, I found the following source code comments: + * Receive a tar format stream from the connection to the server, and unpack + * the contents of it into a directory. Only files, directories and + * symlinks are supported, no other kinds of special files. This says that symlinks are supported. Which is true? Is the symlink supported only in tar format? That's actually a *backend* side restriction. If there is a symlink anywhere other than pg_tblspc in the data directory, we simply won't send it across (with a warning). The frontend code supports creating symlinks, both in directory format and in tar format (actually, in tar format it doesn't do anything, of course, it just lets it through) It wouldn't actually be hard to allow the inclusion of symlinks in the backend side. But it would make verification a lot harder - for example, if someone symlinked out pg_clog (as an example), we'd back up the symlink but not the actual files since they're not actually registered as a tablespace. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
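For the documentation bit, the server-side prerequisites amount to roughly this (values are examples only):

    # postgresql.conf
    wal_level = archive        # anything above 'minimal'
    max_wal_senders = 3        # > 0; a base backup occupies one walsender slot

    # pg_hba.conf -- pg_basebackup connects to the pseudo-database "replication"
    host  replication  backupuser  192.0.2.0/24  md5

    -- and a role carrying the REPLICATION privilege
    CREATE ROLE backupuser REPLICATION LOGIN PASSWORD 'secret';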
Re: [HACKERS] Include WAL in base backup
On Thu, Jan 20, 2011 at 05:03, Stephen Frost sfr...@snowman.net wrote: Greetings, * Magnus Hagander (mag...@hagander.net) wrote: For now, you need to set wal_keep_segments to make it work properly, but if you do the idea is that the tar file/stream generated in the base backup will include all the required WAL files. Is there some reason to not ERROR outright if we're asked to provide WAL and wal_keep_segments isn't set..? I'd rather do that than only ERROR when a particular WAL is missing.. That could lead to transient backup errors that an inexperienced sysadmin or DBA might miss or ignore. They'll notice if it doesn't work the first time they try it and spits out a hint about wal_keep_segments. Well, in a smaller:ish database you can easily do the full backup before you run out of segments in the data directory even when you haven't set wal_keep_segments. If we error out, we force extra work on the user in the trivial case. I'd rather not change that, but instead (as Fujii-san has also mentioned is needed anyway) put some more effort into documenting in which cases you need to set it. I've got some refactoring I want to do around the SendBackupDirectory() function after this, but a review of the functionality first would be good. And obviously, documentation is still necessary. mkay, I'm not going to try to make this ready for committer, but will provide my comments on it overall. Bit difficult to review when someone else is reviewing the base patch too. :/ Heh, yeah. Here goes: - I'm not a huge fan of the whole 'closetar' option, that feels really rather wrong to me. Why not just open it and close it in perform_base_backup(), unconditionally? Yeah, we could move the whole thing up there. Or, as I mentioned in an IM conversation with Heikki, just get rid of SendBackupDirectory() completely and inline it inside the loop in perform_base_backup(). Given that it's basically just 5 lines + a call to sendDir().. - I wonder if you're not getting to a level where you shold be using a struct to pass the relevant information to perform_base_backup() instead of adding more arguments on.. That's going to get unwieldy at some point. Yeah, probably. We *could* pass the BaseBackupCmd struct from the parser all the way in - or is that cheating too much on abstractions? It seems if we don't, we're just going to hav ea copy of that struct without the NodeTag member.. - Why not initialize logid and logseg like so?: int logid = startptr.xlogid; int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. Hmm. Yes. Agreed. - A #define right in the middle of a while loop...? Really? Haha, yeah, that was a typo. I didn't remember the name of the variable so I stuck it there for testing and forgot it. It should be ThisTimeLineID, and no #define at all. - The grammar changes strike me as.. odd. Typically, you would have an 'option' production that you can then have a list of and then let each option be whatever the OR'd set of options is. Wouldn't the current grammar require that you put the options in a specific order? That'd blow. It does require them in a specific order. The advantage is that it makes the code easier. and it's not like end users are expected to run them anyway... Now, I'm no bison export, so it might be an easy fix. But the way I could figure, to make them order indepdent I have to basically collect them up together as a List instead of just as a struct, and then loop through that list to build a struct later. 
If someone who knows Bison better can tell me a neater way to do that, I'll be happy to change :-) @@ -687,7 +690,7 @@ BaseBackup() * once since it can be relocated, and it will be checked before we do * anything anyway. */ - if (basedir != NULL i 0) + if (basedir != NULL !PQgetisnull(res, i, 1)) verify_dir_is_empty_or_create(PQgetvalue(res, i, 1)); } - Should the 'i 0' conditional above still be there..? No. That's a cheat-check that assumes the base directory is always sent first. Which is not true anymore - with this patch we always send it *last* so we can include the WAL in it. So, that's my review from just reading the source code and the thread.. Hope it's useful, sorry it's not more. :/ Thanks - it certainly is! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
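For what it's worth, a rough sketch of the List-based approach (token and option names improvised; makeDefElem/lappend as used elsewhere in the backend grammars), in which the options can appear in any order and a later loop folds the list into the struct:

    base_backup_opt_list:
          base_backup_opt_list base_backup_opt  { $$ = lappend($1, $2); }
        | /* EMPTY */                           { $$ = NIL; }
        ;

    base_backup_opt:
          K_LABEL SCONST  { $$ = makeDefElem("label", (Node *) makeString($2)); }
        | K_PROGRESS      { $$ = makeDefElem("progress", (Node *) makeInteger(TRUE)); }
        | K_WAL           { $$ = makeDefElem("wal", (Node *) makeInteger(TRUE)); }
        ;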
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 12:42, Magnus Hagander mag...@hagander.net wrote: On Thu, Jan 20, 2011 at 05:23, Fujii Masao masao.fu...@gmail.com wrote: It's helpful to document what to set to allow pg_basebackup connection. That is not only the REPLICATION privilege but also max_wal_senders and pg_hba.conf. Hmm. Yeha, i guess that wouldn't hurt. Will add that. Added, see github branch. + refsect1 + titleOptions/title Can we list the descriptions of option in the same order as pg_basebackup --help does? It's helpful to document that the target directory must be specified and it must be empty. Yeah, that's on the list - I just wanted to make any other changes first before I did that. I based on (no further) feedback and a few extra questions, I'm going to change it per your suggestion to use -D dir -F format, instead of -D/-T, which will change that stuff anyway. So I'll reorder them at that time. Updated on github. + para + The backup will include all files in the data directory and tablespaces, + including the configuration files and any additional files placed in the + directory by third parties. Only regular files and directories are allowed + in the data directory, no symbolic links or special device files. The latter sentence means that the backup of the database cluster created by initdb -X is not supported? Because the symlink to the actual WAL directory is included in it. No, it's not. pg_xlog is specifically excluded, and sent as an empty directory, so upon restore you will have an empty pg_xlog directory. Actually, when I verified that statement, I found a bug where we sent the wrong thing if pg_xlog was a symlink, leading to a corrupt tarfile! Patch is in the github branch. OTOH, I found the following source code comments: + * Receive a tar format stream from the connection to the server, and unpack + * the contents of it into a directory. Only files, directories and + * symlinks are supported, no other kinds of special files. This says that symlinks are supported. Which is true? Is the symlink supported only in tar format? That's actually a *backend* side restriction. If there is a symlink anywhere other than pg_tblspc in the data directory, we simply won't send it across (with a warning). The frontend code supports creating symlinks, both in directory format and in tar format (actually, in tar format it doesn't do anything, of course, it just lets it through) It wouldn't actually be hard to allow the inclusion of symlinks in the backend side. But it would make verification a lot harder - for example, if someone symlinked out pg_clog (as an example), we'd back up the symlink but not the actual files since they're not actually registered as a tablespace. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SignalSomeChildren
--On 22. Dezember 2010 15:51:09 +0900 Fujii Masao masao.fu...@gmail.com wrote: How about doing target != ALL test at the head for the most common case (target == ALL)? That's an idea, but the test you propose implements it incorrectly. Thanks! I revised the patch. I had a look at this for the current CF and the patch looks reasonable to me. Some testing shows that the changes are working as intended (at least, the wal sender actually receives now signals from SignalSomeChildren() as far as the DEBUG4 output shows). Maybe we should put in a small comment, why we special case BACKEND_TYPE_ALL (following Tom's comment about expensive shared memory access and IsPostmasterChildWalSender()). I marked it as Ready for Committer. Question for my understanding: While reading the small patch, i realized that there's no BACKEND_TYPE_WALRECV or similar. If i understand correctly there's no need to handle it this way, since there's only one wal receiver process per instance? -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
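For readers following along, the shape of the special case being discussed is roughly this (variable names assumed from postmaster.c's loop over the backend list):

    /* common case: signal everything without touching shared memory */
    if (target != BACKEND_TYPE_ALL)
    {
        /* only here do we need IsPostmasterChildWalSender(), which reads shmem */
        int child_type = BACKEND_TYPE_NORMAL;

        if (IsPostmasterChildWalSender(bp->child_slot))
            child_type = BACKEND_TYPE_WALSND;
        if (!(target & child_type))
            continue;
    }
    signal_child(bp->pid, signal);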
Re: [HACKERS] SQL/MED - file_fdw
On Wed, Jan 19, 2011 at 00:34, Shigeru HANADA han...@metrosystems.co.jp wrote: Attached patch requires FDW API patches and copy_export-20110114.patch. Some minor comments: * Can you pass slot-tts_values and tts_isnull directly to NextCopyFrom()? It won't allocate the arrays; just fill the array buffers. * You can pass NULL for the 4th argument for NextCopyFrom(). | Oid tupleoid; /* just for required parameter */ * file_fdw_validator still has duplicated codes with BeginCopy, but I have no idea to share the validation code in clean way... * Try strVal() instead of DefElem-val.str * FdwEPrivate seems too abbreviated for me. How about FileFdwPrivate? * private is a bad identifier name because it's a C++ keyword. We should rename FdwExecutionState-private. In that message, you also pointed out that FDW must generate explainInfo in every PlanRelScan call even if the planning is not for EXPLAIN. I'll try to defer generating explainInfo until EXPLAIN VERBOSE really uses it. It might need new hook point in expalain.c, though. I complained about the overhead, but it won't be a problem for file_fdw and pgsql_fdw. file_fdw can easily generate the text, and pgsql_fdw needs to generate a SQL query anyway. My concern is the explainInfo interface is not ideal for the purpose and therefore it will be unstable interface. If we support nested plans in FDWs, each FDW should receive a tree writer used internally in explain.c. explainInfo, that is a plan text, is not enough for complex FdwPlans. However, since we don't have any better solution for now, we could have the variable for 9.1. It's much better than nothing. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
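For illustration, the suggested call would make the per-tuple loop in file_fdw look roughly like this (state field names are from the patch under review, and the NextCopyFrom signature is the one added by the COPY API patch, with the optional tuple OID pointer as the fourth argument):

    ExecClearTuple(slot);
    if (NextCopyFrom(festate->cstate, slot->tts_values, slot->tts_isnull, NULL))
        ExecStoreVirtualTuple(slot);
    return slot;    /* an empty slot signals end of scan */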
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's part of the overall idea to make sure files are not inadvertently exchanged between different backups and that a file is not truncated. In the future I'd also like to add a checksum to the TOC so that a backup can be checked for integrity. This will cost performance but with the parallel backup it can be distributed to several processors. Ok. I'm going to leave out the filesize. I can see some value in that, and the CRC, but I don't want to add stuff that's not used at this point. Okay. The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. So what you could do is dump in the tar format, untar and restore in the directory format. I see that this sounds nice but still I am not sure why someone would dump to the tar format in the first place. But you still cannot go back from the directory archive to the tar archive because the standard command line tar will not respect the order of the objects that pg_restore expects in a tar format, right? That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. The command line gzip will probably add its own header to the file that pg_restore would need to strip off... This is a valid use case for people who are concerned with a fast dump, usually they would dump uncompressed and later compress the archive. However once we have parallel pg_dump, this advantage vanishes. Now that we have an API for compression in compress_io.c, it probably wouldn't be very hard to implement the missing compression support to tar format either. True, but the question to the advantage of the tar format remains :-) A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. Yes, but you cannot pg_restore the archive then if it was created with standard tar, right? Joachim -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] streaming replication feature request
Scott Ribe wrote: How about supporting something like: wal_keep_segments = '7d' [ moved to hackers] Sorry for the late reply. That is a very interesting idea. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
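For comparison, what exists today versus the idea above (the duration spelling is purely hypothetical):

    # today: retention expressed as a number of 16MB WAL segments
    wal_keep_segments = 256
    # proposed: retention expressed as an age
    #wal_keep_segments = '7d'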
Re: [HACKERS] Moving test_fsync to /contrib?
Greg Smith wrote: Alvaro Herrera wrote: I don't understand why it would be overkill. Are you saying people would complain because you installed a 25 kB executable that they might not want to use? Just for fun I checked /usr/bin and noticed that I have a pandoc executable, weighing 17 MB, that I have never used and I have no idea what is it for. It's for converting between the various types of text-like markup, i.e. reST, LaTex, Markdown, etc. Anyway, just because the rest of the world has no standards anymore doesn't mean we shouldn't. The changes Bruce has made recently have gotten this tool to where its output is starting to be readable and reliable. The sort of people who want to run this will certainly be fine with installing contrib to do it, because they may want to have things like pgbench too. There's really not enough demand for this to pollute the default server install footprint with any overhead from this tool, either in bytes or increased tool name squatting. And the fact that it's still a little rough around the edges nudges away from the standard server package too. Install in contrib as pg_test_fsync and I think you'll achieve the optimal subset of people who can be made happy here. Not having it packaged at all before wasn't a big deal, because it was so hard to collect good data from only developer-level people were doing it anyway. Now that it is starting to be more useful in that role for less experienced users, we need to make it easier for more people to run it, to collect feedback toward further improving its quality. OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Moving test_fsync to /contrib?
On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Thu, Jan 20, 2011 at 12:57 AM, Noah Misch n...@leadboat.com wrote: There are two distinct questions here: Agreed. (1) Should reindex_relation receive boolean facts from its callers by way of two boolean arguments, or by way of one flags vector? The former seems best when you want every caller to definitely think about which answer is right, and the latter when that's not so necessary. (For a very long list of options, the flags might be chosen on other grounds.) As framed, I'd lean toward keeping distinct arguments, as these are important questions. My main beef with the Boolean flags is that this kind of thing is not too clear: reindex_relation(myrel, false, false, true, true, false, true, false, false, true); Unless you have an excellent memory, you can't tell what the heck that's doing without flipping back and forth between the function definition and the call site. With a bit-field, it's a lot easier to glance at the call site and have a clue what's going on. We're of course not quite to the point of that exaggerated example yet. However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS and REINDEX_ALLOW_OLD_INDEX_USE. Then, flags = 0 can hurt performance but not correctness. That's looking like a win. I prefer the positive sense for those flags because I think it's more clear. There aren't so many call sites or so many people using this that we have to worry about what people are going to do in new calling locations; getting it right in any new code shouldn't be a consideration. (2) Should reindex_relation frame its boolean arguments in terms of what the caller did (heap_rebuilt, tuples_changed), or in terms of what reindex_relation will be doing (check_constraints, suppress_index_use)? Yeah, I know we're doing the former now, but I think it's just getting confusing for exactly the reasons you state: I agree that both heap_rebuilt and tuples_changed are bad abstractions. TRUNCATE is lying about the former, and the latter, as you say, is never really correct. column_values_changed, perhaps. tuples_could_now_violate_constraints would be correct, but that's just a bad spelling for REINDEX_CHECK_CONSTRAINTS. I guess the equivalent long-winded improvement for heap_rebuilt would be indexes_still_valid_for_snapshotnow. Eh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
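A sketch of how the flags-vector spelling might read at the call sites, with positive-sense flag names improvised for this sketch:

    #define REINDEX_CHECK_CONSTRAINTS    0x01
    #define REINDEX_SUPPRESS_INDEX_USE   0x02

    /* ALTER TABLE rewriting column values: recheck constraints, don't trust old indexes */
    reindex_relation(relid, REINDEX_CHECK_CONSTRAINTS | REINDEX_SUPPRESS_INDEX_USE);

    /* TRUNCATE: nothing can be newly violated, so the safe default is simply 0 */
    reindex_relation(relid, 0);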
Re: [HACKERS] ToDo List Item - System Table Index Clustering
On Wed, Jan 19, 2011 at 4:27 PM, Simone Aiken sai...@ulfheim.net wrote: In my experience, size increases related to documentation are almost always worth it, so I'm prejudiced right out of the gate. I was wondering whether every pg_ table gets copied out to every database. If there is already a mechanism for not replicating all of them, we could use views or rewrite rules to merge a single copy of the catalog comments, kept in a separate table, with each deployed database's pg_description. All of them get copied, except for a handful of so-called shared catalogs. Changing that would be difficult. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
Excerpts from Robert Haas's message of jue ene 20 00:10:55 -0300 2011: You have to write it with a line of dashes on the first and last lines, if you don't want it reformatted as a paragraph. It might be worth actually running pgindent over contrib/selinux and then check over the results. Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
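For reference, the dashed-comment convention being referred to looks like this; pgindent leaves the interior layout of such a comment alone instead of reflowing it as a paragraph:

    /*----------
     * A carefully laid out comment:
     *    option A  -> behavior X
     *    option B  -> behavior Y
     *----------
     */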
Re: [HACKERS] pg_basebackup for streaming base backups
Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
On Thu, Jan 20, 2011 at 9:59 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of jue ene 20 00:10:55 -0300 2011: You have to write it with a line of dashes on the first and last lines, if you don't want it reformatted as a paragraph. It might be worth actually running pgindent over contrib/selinux and then check over the results. Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. I went back and looked at commit 239d769e7e05e0a5ef3bd6828e93e22ef3962780 and it touches both src and contrib. But even if we don't always do that, it seems like a good idea to fix whatever we're committing so that a hypothetical future pgindent run won't mangle it. Incidentally, I thought that running pgindent twice in the 9.0 cycle, once after the end of CF4 and again just before the branch worked well. Anyone up for doing it that way again this time? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
Hi Tomas, On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote: No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. In the cases where sampling is enough, you can do that to the updates too: do a sampling on the changes, in that you only process every Nth change to make it to the estimator. If you can also dynamically tune the N to grow it as the statistics stabilize, and lower it if you detect high variance, even better. If the analyze process could be decoupled from the backends, and maybe just get the data passed over to be processed asynchronously, then that could be a feasible way to have always up to date statistics when the bottleneck is IO and CPU power is in excess. If that then leads to better plans, it could really be a win exceeding the overhead. If this analyze process (or more of them) could also just get the data from the modified buffers in a cyclic way, so that backends need nothing extra to do, then I don't see any performance disadvantage other than possible extra locking contention on the buffers and non-determinism of the actual time when a change makes it to the statistics. Then you just need to get more CPU power and higher memory bandwidth to pay for the accurate statistics. Cheers, Csaba. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas wrote: On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. I read the description as: +You can also use the xref linkend=app-pgbasebackup tool to take +the backup, instead of manually copying the files. This tool will take +care of the functionpg_start_backup()/, copy and +functionpg_stop_backup()/ steps automatically, and transfers the +backup over a regular productnamePostgreSQL/productname connection +using the replication protocol, instead of requiring filesystem level +access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sepgsql contrib module
Alvaro Herrera alvhe...@commandprompt.com writes: Hmm, I don't think pgindent is run regularly on contrib as it is on the core code. News to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Is there a way to build PostgreSQL client libraries with MinGW
Hi, We are using R to work with the 64-bit PostgreSQL client libraries, and to avoid compiler compatibility issues the R development community suggests using the same compiler for both the main application and its DLLs. Do you have any experience building libpq.dll with 64-bit MinGW? Thanks. Xiaobo Gu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 10:15 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Thu, Jan 20, 2011 at 10:01 AM, Bruce Momjian br...@momjian.us wrote: Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:27, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2011-01-17 at 16:20 +0100, Magnus Hagander wrote: On Mon, Jan 17, 2011 at 16:18, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 17, 2011 at 8:55 AM, Magnus Hagander mag...@hagander.net wrote: Hmm. I don't like those names at all :( I agree. ?I don't think your original names are bad, as long as they're well-documented. ?I sympathize with Simon's desire to make it clear that these use the replication framework, but I really don't want the command names to be that long. Actually, after some IM chats, I think pg_streamrecv should be renamed, probably to pg_walstream (or pg_logstream, but pg_walstream is a lot more specific than that) pg_stream_log pg_stream_backup Those seem better. Tom, would those solve your concerns about it being clear which side they are on? Or do you think you'd still risk reading them as the sending side? It seems pg_create_backup would be the most natural because we already have pg_start_backup and pg_stop_backup. Uh, wow. That's really mixing apples and oranges. I read the description as: + You can also use the xref linkend=app-pgbasebackup tool to take + the backup, instead of manually copying the files. This tool will take + care of the functionpg_start_backup()/, copy and + functionpg_stop_backup()/ steps automatically, and transfers the + backup over a regular productnamePostgreSQL/productname connection + using the replication protocol, instead of requiring filesystem level + access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On 20.01.2011 15:46, Joachim Wieland wrote: On Thu, Jan 20, 2011 at 6:07 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The header is there to identify a file, it contains the header that every other pgdump file contains, including the internal version number and the unique backup id. The tar format doesn't support compression so going from one to the other would only work for an uncompressed archive and special care must be taken to get the order of the tar file right. Hmm, tar format doesn't support compression, but looks like the file format issue has been thought of already: there's still code there to add .gz suffix for compressed files. How about adopting that convention in the directory format too? That would make an uncompressed directory format compatible with the tar format. So what you could do is dump in the tar format, untar and restore in the directory format. I see that this sounds nice but still I am not sure why someone would dump to the tar format in the first place. I'm not sure either. Maybe you want to pipe the output of pg_dump -F t via an ssh tunnel to another host, where you untar it, producing a directory format dump. You can then edit the directory format dump, and restore it back to the database without having to tar it again. It gives you a lot of flexibility if the formats are compatible, which is generally good. But you still cannot go back from the directory archive to the tar archive because the standard command line tar will not respect the order of the objects that pg_restore expects in a tar format, right? Hmm, I didn't realize pg_restore requires the files to be in certain order in the tar file. There's no mention of that in the docs either, we should add that. It doesn't actually require that if you read from a file, but from stdin it does. You can put files in the archive in a certain order if you list them explicitly in the tar command line, like tar cf backup.tar toc.dat It's hard to know the right order, though. In practice you would need to do tar tf backup.tar files before untarring, and use files to tar them again in the rightorder. That seems pretty attractive anyway, because you can then dump to a directory, and manually gzip the data files later. The command line gzip will probably add its own header to the file that pg_restore would need to strip off... Yeah, we should write the header too. That's not hard, e.g gzopen will do that automatically, or you can pass a flag to deflateInit2. A tar archive has the advantage that you can postprocess the dump data with other tools but for this we could also add an option that gives you only the data part of a dump file (and uncompresses it at the same time if compressed). Once we have that however, the question is what anybody would then still want to use the tar format for... I don't know how popular it'll be in practice, but it seems very nice to me if you can do things like parallel pg_dump in directory format first, and then tar it up to a file for archival. Yes, but you cannot pg_restore the archive then if it was created with standard tar, right? See above, you can unless you try to pipe it to pg_restore. In fact, that's listed as an advantage of the tar format over other formats in the pg_dump documentation. (I'm working on this, no need to submit a new patch) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
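As a pointer for whoever implements it, both routes mentioned produce standard gzip framing (a sketch; buf/len stand for the data being written):

    #include <zlib.h>

    static void
    write_gzipped(const char *path, const void *buf, unsigned len)
    {
        gzFile f = gzopen(path, "wb");   /* gzopen writes the gzip header itself */

        gzwrite(f, buf, len);
        gzclose(f);
    }

    /* or, with the low-level deflate API, request a gzip wrapper rather than
     * raw zlib framing by adding 16 to windowBits:
     *   deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED, 15 + 16, 8, Z_DEFAULT_STRATEGY);
     */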
Re: [HACKERS] Moving test_fsync to /contrib?
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? Yeah, there's no reason to try to confuse people about whether it's the same program or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Include WAL in base backup
Magnus Hagander mag...@hagander.net writes: - Why not initialize logid and logseg like so?: int logid = startptr.xlogid; int logseg = startptr.xrecoff / XLogSegSize; Then use those in your elog? Seems cleaner to me. Hmm. Yes. Agreed. Marginal complaint here: int is the wrong type, I'm pretty sure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
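As a self-contained illustration of Tom's complaint (the struct, values, and segment size below are stand-ins, not the real XLogRecPtr declaration): the two fields are unsigned 32-bit quantities, so the locals should match rather than be plain int:

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the relevant XLogRecPtr fields; sketch only. */
    typedef struct { uint32_t xlogid; uint32_t xrecoff; } XLogRecPtrSketch;
    #define XLogSegSize (16 * 1024 * 1024)      /* 16 MB segments, the default */

    int
    main(void)
    {
        XLogRecPtrSketch startptr = { 3, 0x5000000 };

        /* "int" would be the wrong type: these are unsigned 32-bit values. */
        uint32_t    logid = startptr.xlogid;
        uint32_t    logseg = startptr.xrecoff / XLogSegSize;

        printf("log %u, segment %u\n", logid, logseg);
        return 0;
    }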
Re: [HACKERS] pg_dump directory archive format / parallel pg_dump
On Jan20, 2011, at 16:22 , Heikki Linnakangas wrote: You can put files in the archive in a certain order if you list them explicitly in the tar command line, like tar cf backup.tar toc.dat It's hard to know the right order, though. In practice you would need to do tar tf backup.tar files before untarring, and use files to tar them again in the rightorder. Hm, could we create a file in the backup directory which lists the files in the right order? best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction-scope advisory locks
Marko Tiikkaja marko.tiikk...@cs.helsinki.fi writes: Another thing I now see is this: BEGIN; SELECT pg_advisory_xact_lock(1); -- do something here -- upgrade to session lock SELECT pg_advisory_lock(1); COMMIT; This seems useful, since the xact lock would be automatically released if an error happens during -- do something here so you wouldn't need to worry about releasing the lock elsewhere. But I'm not sure this is safe. Can anyone see a problem with it? I think the POLA dictates that the behavior of that should be that you now have both a transactional and a nontransactional hold on the lock; and only the transactional hold goes away at commit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
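A hedged demonstration of the semantics Tom describes, assuming the patch's proposed pg_advisory_xact_lock() function and an invented connection string: after COMMIT only the session-level hold should remain, which can be checked in pg_locks from the same connection.

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    run(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK &&
            PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=postgres");

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        run(conn, "BEGIN");
        run(conn, "SELECT pg_advisory_xact_lock(1)");   /* transactional hold */
        run(conn, "SELECT pg_advisory_lock(1)");        /* session hold */
        run(conn, "COMMIT");                            /* only the xact hold goes away */

        /* The session-level hold should still be visible here. */
        PGresult   *res = PQexec(conn,
            "SELECT count(*) FROM pg_locks WHERE locktype = 'advisory'");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("advisory locks still held: %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);

        run(conn, "SELECT pg_advisory_unlock(1)");      /* release session hold */
        PQfinish(conn);
        return 0;
    }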
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas wrote: I read the description as: + ? ?You can also use the xref linkend=app-pgbasebackup tool to take + ? ?the backup, instead of manually copying the files. This tool will take + ? ?care of the functionpg_start_backup()/, copy and + ? ?functionpg_stop_backup()/ steps automatically, and transfers the + ? ?backup over a regular productnamePostgreSQL/productname connection + ? ?using the replication protocol, instead of requiring filesystem level + ? ?access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. Well, when we originally implemented PITR, we could have found a way to avoid using SQL commands to start/stop backup, but we envisioned that we would want to hook things on to those commands so we created a stable API that we could improve, and we have. Do we envision pg_basebackup as something we will enahance, and if so, should we consider a generic name? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Fujii Masao masao.fu...@gmail.com writes: On Thu, Jan 20, 2011 at 10:53 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm not sure why that's the right solution. Why do you think that we should not create the tablespace under the $PGDATA directory? I'm not surprised that people mounts the filesystem on $PGDATA/mnt and creates the tablespace on it. No? Usually, having a mount point in a non-root-owned directory is considered a Bad Thing. Hmm.. but ISTM we can have a root-owned mount point in $PGDATA and create a tablespace there. Nonsense. The more general statement is that it's a security hole unless the mount point *and everything above it* is root owned. In the case you sketch, there would be nothing to stop the (non root) postgres user from renaming $PGDATA/mnt to something else and then inserting his own trojan-horse directories. Given that nobody except postgres and root could get to the mount point, maybe there wouldn't be any really serious problems caused that way --- but I still say that it's bad practice that no competent sysadmin would accept. Moreover, I see no positive *good* reason to do it. There isn't anyplace under $PGDATA that users should be randomly creating directories, much less mount points. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote: This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. What's the other one? The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Wed, 2011-01-19 at 22:16 -0500, Robert Haas wrote: That's another way of saying the patch is not anywhere close to being done. My patch is materially incomplete. Certainly we may see that as grounds for rejection, which I would not and could not argue with. It is a popular feature, so I submitted anyway. When I said Noah's patch was trivial, I was referring to the amount of work expended on it so far; no insult intended. I think the amount of code to finish either is fairly low as well. If we wish to continue in this release then we must decide how. What I was trying to indicate in my earlier comments was that my focus is on achieving the required functionality in this release, or put another way, I would accept Noah's patch rather than end with nothing. The main requirement, as I see it, is error checking. We need to do the same checking however we do it; neither patch currently does it. If Noah's patch had error checking, then it would at least be safe to recommend people do that. Then it is a simple matter of whether we think implicit is OK, or whether it needs an explicit command. My patch does it explicitly because that was the consensus from the earlier discussion; I am in favour of the explicit route which is why I wrote the patch that way, not because I wrote it that way. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 16:45, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: I read the description as: + ? ?You can also use the xref linkend=app-pgbasebackup tool to take + ? ?the backup, instead of manually copying the files. This tool will take + ? ?care of the functionpg_start_backup()/, copy and + ? ?functionpg_stop_backup()/ steps automatically, and transfers the + ? ?backup over a regular productnamePostgreSQL/productname connection + ? ?using the replication protocol, instead of requiring filesystem level + ? ?access. so I thought, well it does pg_start_backup and pg_stop_backup, and also creates the data directory. Yeah, but pg_start_backup() and pg_stop_backup() are server functions, and this is an application. Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. That, however, is not going to happen for 9.1... but it would be *really* cool. Well, when we originally implemented PITR, we could have found a way to avoid using SQL commands to start/stop backup, but we envisioned that we would want to hook things on to those commands so we created a stable API that we could improve, and we have. Yeah, we're certianly not taking those *away*. Do we envision pg_basebackup as something we will enahance, and if so, should we consider a generic name? Well, it's certainly going to be enhanced. I think there are two main uses for it - backups, and setting up replication slaves. I can't see it expanding beyond those, really. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jan 19, 2011 at 10:16 PM, Stephen Frost sfr...@snowman.net wrote: This patch looked good, in general, to me. I added a few documentation updates and a comment, but it's a very straight-forward patch as far as I can tell. Passes all regressions and my additional testing. I have not looked at the code for this patch at all as yet, but just as a general user comment - I really, really want this. It's one of about, uh, two pieces of information that the EXPLAIN output doesn't give you that is incredibly important for troubleshooting. What's the other one? In the following sort of plan: rhaas=# explain analyze select * from bob b, sally s where b.a = s.a; QUERY PLAN --- Nested Loop (cost=0.00..117890.00 rows=1000 width=8) (actual time=0.036..533.372 rows=1000 loops=1) - Seq Scan on sally s (cost=0.00..5770.00 rows=40 width=4) (actual time=0.014..46.469 rows=40 loops=1) - Index Scan using bob_pkey on bob b (cost=0.00..0.27 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=40) Index Cond: (a = s.a) Total runtime: 533.935 ms (5 rows) ...you cannot really tell how many rows the index scan was expected to match, or actually did match. The answer to the latter question certainly isn't 0. We previously discussed making the rows= line go out to three decimal places when used in an inner-index-scan context, which would help a lot - you could then multiply rows by loops to get an approximate answer. My preferred fix would be just to remove the unhelpful division-by-nloops code that gets applied in this case, but that's less backward-compatible. The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. It's hard to predict the nature of future feature requests, but this and the above are at the top of my list of ongoing gripes, and there isn't a close third. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Tom Lane (t...@sss.pgh.pa.us) wrote: The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. This is going into a new major version, not something which is being back-patched, and users now have a way in a released version to get away from relying on the string output. Have we worried about adding new plan nodes due to breakage in the explain output..? It strikes me that we've actually changed it with some regularity, in one aspect or another, over a couple of releases. Maybe my memory is bad though. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders>0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. On not needing max_wal_senders, we're already on our way: you just have to use the external walreceiver that Magnus already wrote the code for. WAL level, I don't know that we have that already, but a big part of what this base backup tool is useful for is preparing a standby… so certainly you want to change that setup there. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: The main problem I've got with this patch is that there's no place to shoehorn the information into the textual EXPLAIN format without breaking a lot of expectations (and hence code --- it's insane to imagine that any significant amount of client-side code has been rewritten to make use of xml/json output yet). It would be nice to know what other requests are likely to be coming down the pike before we decide exactly how we're going to break things. While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. This is going into a new major version, not something which is being back-patched, and users now have a way in a released version to get away from relying on the string output. I agree; we make bigger changes than this all the time. At the risk of putting words in Tom's mouth, I think part of the concern here may be that the EXPLAIN output is quite verbose already, and adding a few more details is going to make it harder to read in the cases where you *don't* care about this additional information. That's a valid concern, but I don't know what to do about it - not having this information available isn't better. It's tempting to propose moving the actual numbers down to the next line, so that the lines aren't so ridiculously long. Looking at the patch, I have to say I had hoped this was going to show nfiltered in both the estimated and actual cases, which it doesn't. Now maybe that's more work than we want to put in, but it would be nice to have. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest wrap-up
On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: - Writeable CTEs - I think we need Tom to pick this one up. - Fix snapshot taking inconsistencies - Ready for committer. Can any committer pick this up? Will take a look at these two also. Tom, what is your time frame on this? I think we should wrap up the CF without these and bundle 9.1alpha3 unless you plan to get to this in the next day or two. We probably shouldn't hold up the alpha for these, if there are no other items outstanding. OK. I've moved them to the next CommitFest and marked this one closed. Tom, are you still planning to pick these two up? They've been basically collecting dust for over two months now, or in one case three months, and we're running out of time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On 2011-01-20 7:07 PM +0200, Robert Haas wrote: Looking at the patch, I have to say I had hoped this was going to show nfiltered in both the estimated and actual cases, which it doesn't. Now maybe that's more work than we want to put in, but it would be nice to have. That would be fantastical, if we can make it happen. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] CommitFest wrap-up
Robert Haas robertmh...@gmail.com writes: On Tue, Dec 21, 2010 at 10:49 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 21, 2010 at 11:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Dec 15, 2010 at 11:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: - Writeable CTEs - I think we need Tom to pick this one up. - Fix snapshot taking inconsistencies - Ready for committer. Can any committer pick this up? Tom, are you still planning to pick these two up? They've been basically collecting dust for over two months now, or in one case three months, and we're running out of time. Yes, I will get to them. I haven't yet put my head down into full commit fest mode... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 11:59 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: Also, it won't actually work unless the server has replication configured (wal_level!=minimal, max_wal_senders0, and possibly some setting for wal_keep_segments), which has been the main point of the naming discussion thus far. Now, you know what would be REALLY cool? Making this work without any special advance configuration. Like if we somehow figured out a way to make max_wal_senders unnecessary, and a way to change wal_level without bouncing the server, so that we could temporarily boost the WAL level from minimal to archive if someone's running a backup. Not using max_wal_senders we're on our way, you just have to use the external walreceiver that Magnus the code for already. WAL level, I don't know that we have that already, but a big part of what this base backup tool is useful for is preparing a standby… so certainly you want to change that setup there. Well, yeah, but it would be nice to also use it just to take a regular old backup on a system that doesn't otherwise need replication. I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. I had some vague idea of creating a mechanism for broadcasting critical parameter changes. You'd make a structure in shared memory containing the canonical values of wal_level and all other critical variables, and the structure would also contain a 64-bit counter. Whenever you want to make a parameter change, you lock the structure, make your change, bump the counter, and release the lock. Then, there's a second structure, also in shared memory, where backends report the value that the counter had the last time they updated their local copies of the structure from the shared structure. You can watch that to find out when everyone's guaranteed to have the new value. If someone doesn't respond quickly enough, you could send them a signal to get them moving. What would really be ideal is if you could actually make this safe enough that the interrupt service routine could do all the work, rather than just setting a flag. Or maybe CHECK_FOR_INTERRUPTS(). If you can't make it safe enough to put it in someplace pretty low-level like that, the whole idea might fall apart, because it wouldn't be useful to have a way of doing this that mostly works except sometimes it just sits there and hangs for a really long time. All pie in the sky at this point... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
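A toy model of that idea — ordinary process-local data and invented names rather than real PostgreSQL backends and shared memory — just to show the counter/acknowledgment handshake Robert sketches:

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_BACKENDS 4

    typedef struct
    {
        pthread_mutex_t lock;
        uint64_t        change_counter; /* bumped on every parameter change */
        int             wal_level;      /* canonical value of the parameter */
    } SharedConfig;

    static SharedConfig shared = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };

    /* last change_counter value each "backend" has applied locally */
    static volatile uint64_t acked[MAX_BACKENDS];

    static void
    change_wal_level(int new_level)
    {
        pthread_mutex_lock(&shared.lock);
        shared.wal_level = new_level;
        shared.change_counter++;
        pthread_mutex_unlock(&shared.lock);
    }

    static void
    backend_check_for_change(int id, int *local_wal_level)
    {
        pthread_mutex_lock(&shared.lock);
        *local_wal_level = shared.wal_level;
        acked[id] = shared.change_counter;  /* acknowledge the new value */
        pthread_mutex_unlock(&shared.lock);
    }

    static int
    everyone_caught_up(uint64_t target)
    {
        for (int i = 0; i < MAX_BACKENDS; i++)
            if (acked[i] < target)
                return 0;
        return 1;
    }

    int
    main(void)
    {
        int         local[MAX_BACKENDS] = {0};

        change_wal_level(2);            /* e.g. boost minimal -> archive */

        /* in reality the coordinator would signal laggards and sleep */
        for (int i = 0; i < MAX_BACKENDS; i++)
            backend_check_for_change(i, &local[i]);

        if (everyone_caught_up(shared.change_counter))
            printf("safe to checkpoint at the higher wal_level\n");
        return 0;
    }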
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 12:57 PM, Magnus Hagander mag...@hagander.net wrote: On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. Haven't looked at what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format changes enough to affect it, that might be the driving force to have it use xml format instead, which is the one that is intended for machine parsing after all.. How much has that code been updated from one release to the next? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Jan 20, 2011 6:43 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 11:55 AM, Stephen Frost sfr...@snowman.net wrote: While I agree completely about the general if you're going to break, break it big approach, but I don't particularly care for holding output strings from EXPLAIN to the same level that we do the wireline protocol. I agree; we make bigger changes than this all the time. No, we don't. It's true that a client that wants to truly *understand* the plan has to know a lot of things, but the fundamental format of EXPLAIN ANALYZE output has been real stable for a real long time: node name (cost=xxx.xx..xxx.xx rows=xxx width=xxx) (actual time=xxx.xxx..xxx.xxx rows=xxx loops=xxx) detail line: something or other - subnode name ... more of the same ... This level of understanding seems plenty sufficient for something like explain.depesz.com, to name just one popular tool. The last format change of any kind we made in this skeleton was to increase the number of decimal places in the actual time numbers from 2 to 3 (wow). That was in 7.4. Modulo that detail, this basic contract has been valid since EXPLAIN ANALYZE was invented, in 7.2. As proposed, this patch will break it. It might be interesting for somebody to go look at Hubert's code and see just how much it really knows about the EXPLAIN output format, and how much it's had to change across PG releases. Haven't looked at what changes with this patch, but dont forget PgAdmin that also parses the output. Though if the format changes enough to affect it, that might be the driving force to have it use xml format instead, which is the one that is intended for machine parsing after all.. /Magnus
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 10:07:23AM +, Simon Riggs wrote: On Wed, 2011-01-19 at 17:46 -0500, Noah Misch wrote: First, I'd like to note that the thread for this patch had *four* me-too responses to the use case. That's extremely unusual; the subject is definitely compelling to people. It addresses the bad behavior of natural attempts to atomically swap two tables in the namespace: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock # Do it this way, and the next SELECT gets data from the old table. #psql -c 'ALTER TABLE t RENAME TO old_t; ALTER TABLE new_t RENAME TO t' # Do it this way, and get: ERROR: could not open relation with OID 41380 psql -c 'DROP TABLE t; ALTER TABLE new_t RENAME TO t' psql -c 'SELECT * FROM t' # I get 'old' or an error, never 'new'. psql -c 'DROP TABLE IF EXISTS t, old_t, new_t' by letting you do this instead: psql -c CREATE TABLE t AS VALUES ('old'); CREATE TABLE new_t AS VALUES ('new') psql -c 'SELECT pg_sleep(2) FROM t' # block the ALTER or DROP briefly sleep 1 # give prev time to take AccessShareLock psql -c 'EXCHANGE TABLE new_t TO t psql -c 'SELECT * FROM t' # I get 'new', finally! psql -c 'DROP TABLE IF EXISTS t, new_t' I find Heikki's (4d07c6ec.2030...@enterprisedb.com) suggestion from the thread interesting: can we just make the first example work? Even granting that the second syntax may be a useful addition, the existing behavior of the first example is surely worthless, even actively harmful. I tossed together a proof-of-concept patch, attached, that makes the first example DTRT. Do you see any value in going down that road? As I said previously on the thread you quote, having this happen implicitly is not a good thing, and IMHO, definitely not the right thing. When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right Thing for that query to wake up and proceed based on the complete, final state of that committed DDL. Aside from the waiting itself, the query should behave as though it started after the DDL completed. In my example, the SELECT silently reads data from a table named old_t. What if that were an INSERT? The data falls in the wrong table. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear etc.. Indeed, Heikki's suggestion and my patch would not do such verification. I can't see detecting and blocking some patterns of ALTER TABLE RENAME or DROP ...; CREATE ...; than we allow today. Those need to stay open-ended, with the user responsible for choosing well. So, what's the right concurrent behavior around use of those statements? I answer that above. That said, I see utility in a feature that compares two tables, swaps them if similar, and fixes up foreign keys. Having such a feature does not justify wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent behavior around ALTER TABLE RENAME does not remove the utility of this feature. We should do both. I don't see how having a patch helps at all. I didn't think it was the right way before you wrote it and I still disagree now you've written it. Perhaps it helped me more than anyone else, and I should have kept it to myself. 
Heikki's suggestion seemed straightforward, so much so that I couldn't figure out why nobody had done it. That would usually mean I'm missing something. So, I implemented it in an effort to discover what I had missed, failing to do so. Then, I sent it with the review in case you might spot what I had missed. Failure to add some kind of table similarity check was intentional, per above. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: I agree; we make bigger changes than this all the time. No, we don't. Alright, do we want to go down the road of adding new things to the XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT version, to avoid this concern? Stephen signature.asc Description: Digital signature
Re: [HACKERS] Moving test_fsync to /contrib?
On 1/20/11 6:15 AM, Robert Haas wrote: On Thu, Jan 20, 2011 at 9:13 AM, Bruce Momjian br...@momjian.us wrote: OK, I am ready to move test_fsync to /contrib. Is pg_test_fsync the best name? pg_check_fsync? pg_fsync_performance? pg_verify_fsync? I don't see too much reason to rename it more than necessary, so how about pg_test_fsync? +1. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? I had some vague idea of creating a mechanism for broadcasting critical parameter changes. You'd make a structure in shared memory containing the canonical values of wal_level and all other critical variables, and the structure would also contain a 64-bit counter. Whenever you want to make a parameter change, you lock the structure, make your change, bump the counter, and release the lock. Then, there's a second structure, also in shared memory, where backends report the value that the counter had the last time they updated their local copies of the structure from the shared structure. You can watch that to find out when everyone's guaranteed to have the new value. If someone doesn't respond quickly enough, you could send them a signal to get them moving. What would really be ideal is if you could actually make this safe enough that the interrupt service routine could do all the work, rather than just setting a flag. Or maybe CHECK_FOR_INTERRUPTS(). If you can't make it safe enough to put it in someplace pretty low-level like that, the whole idea might fall apart, because it wouldn't be useful to have a way of doing this that mostly works except sometimes it just sits there and hangs for a really long time. All pie in the sky at this point... Unless we manage to simplify enough the idea to have wal_level SIGHUP. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TYPE 1: recheck index-based constraints
On Thu, Jan 20, 2011 at 09:26:29AM -0500, Robert Haas wrote: My main beef with the Boolean flags is that this kind of thing is not too clear: reindex_relation(myrel, false, false, true, true, false, true, false, false, true); Unless you have an excellent memory, you can't tell what the heck that's doing without flipping back and forth between the function definition and the call site. With a bit-field, it's a lot easier to glance at the call site and have a clue what's going on. We're of course not quite to the point of that exaggerated example yet. Agreed. However, suppose we inverted both flags, say REINDEX_SKIP_CONSTRAINT_CHECKS and REINDEX_ALLOW_OLD_INDEX_USE. ?Then, flags = 0 can hurt performance but not correctness. ?That's looking like a win. I prefer the positive sense for those flags because I think it's more clear. There aren't so many call sites or so many people using this that we have to worry about what people are going to do in new calling locations; getting it right in any new code shouldn't be a consideration. Okay. I've attached a new patch version based on that strategy. diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1c9df98..75e7055 100644 *** a/src/backend/catalog/index.c --- b/src/backend/catalog/index.c *** *** 2533,2558 reindex_index(Oid indexId, bool skip_constraint_checks) * reindex_relation - This routine is used to recreate all indexes * of a relation (and optionally its toast relation too, if any). * ! * If heap_rebuilt is true, then the relation was just completely rebuilt by ! * an operation such as VACUUM FULL or CLUSTER, and therefore its indexes are ! * inconsistent with it. This makes things tricky if the relation is a system ! * catalog that we might consult during the reindexing. To deal with that ! * case, we mark all of the indexes as pending rebuild so that they won't be ! * trusted until rebuilt. The caller is required to call us *without* having ! * made the rebuilt versions visible by doing CommandCounterIncrement; we'll ! * do CCI after having collected the index list. (This way we can still use ! * catalog indexes while collecting the list.) * ! * We also skip rechecking uniqueness/exclusion constraint properties if ! * heap_rebuilt is true. This avoids likely deadlock conditions when doing ! * VACUUM FULL or CLUSTER on system catalogs. REINDEX should be used to ! * rebuild an index if constraint inconsistency is suspected. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. */ bool ! reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt) { Relationrel; Oid toast_relid; --- 2533,2561 * reindex_relation - This routine is used to recreate all indexes * of a relation (and optionally its toast relation too, if any). * ! * flags can include REINDEX_SUPPRESS_INDEX_USE and REINDEX_CHECK_CONSTRAINTS. * ! * If flags has REINDEX_SUPPRESS_INDEX_USE, the relation was just completely ! * rebuilt by an operation such as VACUUM FULL or CLUSTER, and therefore its ! * indexes are inconsistent with it. This makes things tricky if the relation ! * is a system catalog that we might consult during the reindexing. To deal ! * with that case, we mark all of the indexes as pending rebuild so that they ! * won't be trusted until rebuilt. The caller is required to call us *without* ! * having made the rebuilt versions visible by doing CommandCounterIncrement; ! * we'll do CCI after having collected the index list. (This way we can still ! 
* use catalog indexes while collecting the list.) ! * ! * To avoid deadlocks, VACUUM FULL or CLUSTER on a system catalog must omit the ! * REINDEX_CHECK_CONSTRAINTS flag. REINDEX should be used to rebuild an index ! * if constraint inconsistency is suspected. For optimal performance, other ! * callers should include the flag only after transforming the data in a manner ! * that risks a change in constraint validity. * * Returns true if any indexes were rebuilt. Note that a * CommandCounterIncrement will occur after each index rebuild. */ bool ! reindex_relation(Oid relid, bool toast_too, int flags) { Relationrel; Oid toast_relid; *** *** 2608,2614 reindex_relation(Oid relid, bool toast_too, bool heap_rebuilt) List *doneIndexes; ListCell *indexId; ! if (heap_rebuilt) { /* Suppress use of all the indexes until they are rebuilt */ SetReindexPending(indexIds); --- 2611,2617 List *doneIndexes; ListCell *indexId; ! if (flags REINDEX_SUPPRESS_INDEX_USE) { /* Suppress
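To illustrate the readability argument above, here is a stand-alone sketch (not the actual PostgreSQL declarations; only the two flag names are taken from the patch, and the behavior is reduced to print statements) of how a flag-word call site documents itself compared with a string of booleans:

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned int Oid;

    /* Flag bits, named as in the patch under discussion. */
    #define REINDEX_CHECK_CONSTRAINTS   (1 << 0)
    #define REINDEX_SUPPRESS_INDEX_USE  (1 << 1)

    static bool
    reindex_relation(Oid relid, bool toast_too, int flags)
    {
        if (flags & REINDEX_SUPPRESS_INDEX_USE)
            printf("relation %u: indexes marked pending-rebuild\n", relid);
        if (flags & REINDEX_CHECK_CONSTRAINTS)
            printf("relation %u: unique/exclusion constraints rechecked\n", relid);
        return true;
    }

    int
    main(void)
    {
        /* Example call requesting both behaviors; readable at a glance. */
        reindex_relation(16384, true,
                         REINDEX_CHECK_CONSTRAINTS | REINDEX_SUPPRESS_INDEX_USE);

        /* Compare the opaque boolean style the thread is moving away from:
         * reindex_relation(16384, true, false, true); */
        return 0;
    }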
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
* Robert Haas (robertmh...@gmail.com) wrote: How much has that code been updated from one release to the next? Just an FYI, I talked to depesz on IRC (please chime in if you disagree with any of this) and he indicated that he's had to update the code from time to time, mostly because the parser was too strict. He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 1:47 PM, Stephen Frost sfr...@snowman.net wrote: * Tom Lane (t...@sss.pgh.pa.us) wrote: Robert Haas robertmh...@gmail.com writes: I agree; we make bigger changes than this all the time. No, we don't. Alright, do we want to go down the road of adding new things to the XML/JSON/YAML/Whatever-else format that isn't displayed in the TEXT version, to avoid this concern? No, because, for one thing, the text output is what people are going to send me when they want me to fix their crap. If the information isn't there, I lose. And no, I don't want them to send me the XML. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Right now, the backends get the value through the GUC machinery, and so there's no particular bound on how long it could take for them to pick up the new value. I think if we could find some way of making sure that the backends got the new value in a reasonably timely fashion, we'd be pretty close to being able to do this. But it's hard to see how to do that. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: How much has that code been updated from one release to the next? Just an FYI, I talked to depesz on IRC (please chime in if you disagree with any of this) and he indicated that he's had to update the code from time to time, mostly because the parser was too strict. He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. i'll be happy to update the Pg::Explain to handle new elements of textual plans, so if this would be of concern - please don't treat compatibility with explain.depesz.com as your responsibility/problem. I'll fix the parser (have to add json/xml parsing too anyway), and I, too, would love to get more information. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. How much would using a latch here help? Or would it be overkill? -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
On 20.01.2011 22:15, Dimitri Fontaine wrote: Robert Haasrobertmh...@gmail.com writes: Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. How much using a latch here would help? Or be overkill? A latch doesn't give you an acknowledgment from the backends that they've received and acted on the guc change. You could use it as a building block to construct that, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] exceptions not present in plpy with Python 3
On lör, 2010-12-18 at 18:56 +0100, Jan Urbański wrote: there seems to be a problem in the way we add exceptions to the plpy module in PL/Python compiled with Python 3k. Try this: DO $$ plpy.SPIError $$ language plpython3u; I'm not a Python 3 expert, but I nicked some code from the Internet and came up with this patch (passes regression tests on both Python 2 and 3). It looks like the PyModule_AddObject() approach also works in Python 2. Anyone see an issue with using that uniformly? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
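For reference, a generic sketch of the PyModule_AddObject() idiom in question (not the patch itself; the function name here is illustrative, though plpy.SPIError and PLy_exc_spi_error are the names used by PL/Python), which behaves the same under Python 2 and Python 3:

    #include <Python.h>

    static PyObject *PLy_exc_spi_error;

    static int
    add_exceptions(PyObject *plpy_module)
    {
        /* Create the exception type; returns a new reference or NULL. */
        PLy_exc_spi_error = PyErr_NewException("plpy.SPIError", NULL, NULL);
        if (PLy_exc_spi_error == NULL)
            return -1;

        /* PyModule_AddObject steals a reference on success, so keep our own. */
        Py_INCREF(PLy_exc_spi_error);
        if (PyModule_AddObject(plpy_module, "SPIError", PLy_exc_spi_error) < 0)
        {
            /* On failure nothing was stolen: drop both references. */
            Py_DECREF(PLy_exc_spi_error);
            Py_CLEAR(PLy_exc_spi_error);
            return -1;
        }
        return 0;
    }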
Re: [HACKERS] pl/python refactoring
On ons, 2011-01-19 at 10:06 +0900, Hitoshi Harada wrote: - This is not in the patch, but around line 184 vis versa in comment seems like typo. Fixed. - A line break should be added before PLy_add_exception() after static void I'll add that when I get to the patch. - This is also not in the patch, but the comment /* these should only be called once at the first call * of plpython_call_handler. initialize the python interpreter * and global data. */ is bogus. PLy_init_interp() is called in _PG_init(). Fixed. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
hubert depesz lubaczewski dep...@depesz.com writes: On Thu, Jan 20, 2011 at 02:48:59PM -0500, Stephen Frost wrote: He also mentioned that he didn't feel it was terribly complicated or that it'd be difficult to update for this. Looking over the code, it's got a simple regex for matching that line which would have to be updated, but I don't think it'd require much more than that. i'll be happy to update the Pg::Explain to handle new elements of textual plans, so if this would be of concern - please don't treat compatibility with explain.depesz.com as your responsibility/problem. The point isn't whether it'd be terribly difficult to update client-side EXPLAIN-parsing code ... it's whether we should break it in the first place. I don't find the proposed format so remarkably well-designed that it's worth creating compatibility problems for. The main functional problem I see with this format is that it assumes there is one and only one filter step associated with every plan node. That is just plain wrong. Many don't have any, and there are important cases where there are two. I'm thinking in particular that it might be useful to distinguish the effects of the recheck and the filter conditions of a bitmap heap scan. Maybe it'd also be interesting to separate the join and non-join filter clauses of a join node, though I'm less sure about the usefulness of that. So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x > 42) (nfiltered = 123) or Filter Cond: (x > 42) Rows Filtered: 123 The latter is less ambiguous, but takes more vertical space. The former is very unlikely to break any client code, because I doubt there is any that inquires into the details of what a filter condition expression really means. The latter *might* break code depending on how much it assumes about the number of detail lines attached to a plan node ... but as Robert pointed out, we've added new detail lines before. BTW, is it just me, or is the terminology "number filtered" pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps "nremoved" would be clearer. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: The main functional problem I see with this format is that it assumes there is one and only one filter step associated with every plan node. That is just plain wrong. Many don't have any, and there are important cases where there are two. I'm thinking in particular that it might be useful to distinguish the effects of the recheck and the filter conditions of a bitmap heap scan. If it's not too hard to do that, I'm all in favor. Maybe it'd also be interesting to separate the join and non-join filter clauses of a join node, though I'm less sure about the usefulness of that. That would also be extremely useful. So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x 42) (nfiltered = 123) or Filter Cond: (x 42) Rows Filtered: 123 The latter is less ambiguous, but takes more vertical space. The former is very unlikely to break any client code, because I doubt there is any that inquires into the details of what a filter condition expression really means. The latter *might* break code depending on how much it assumes about the number of detail lines attached to a plan node ... but as Robert pointed out, we've added new detail lines before. I like the idea of putting it on the same line as the filter condition, but your proposal for how to do that doesn't wow me - the parentheses look too similar to the ones around the qual itself. BTW, is it just me, or is the terminology number filtered pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps nremoved would be clearer. I think filtered is pretty clear and like it... removed sounds like you deleted something. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com wrote: I think filtered is pretty clear and like it... I find it ambiguous. [Takes sip of filtered water.] How about excluded? -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Orphaned statements issue
Hackers, One of our clients is seeing an unusual issue in which waiting statements go to sleep forever and never complete. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. This problem only happens under load and only when performing load tests that insert large data (250 KiB) into bytea columns. It takes a couple hours but we've been able to reproduce the issue with a 100% success rate. Sometimes the locked query shows up as 'BIND' in pg_stat_activity; sometimes it's a complicated query using a multi-table left outer join, other times it's a simple select. The only thing in common is that there is never a corresponding entry for that statement in the pg_locks table, and if you drop the connection the query goes away. An strace on the process shows it to be in RECV, and otherwise doing nothing. We have not been able to run GDB because it takes a couple hours of running a heavy load test to cause the issue. While memory is heavily used during the test, there is no swapping during the test that would point to Linux memory management as the culprit. We can reproduce the issue on 8.1.11 and 8.1.23. Currently we are working on testing it on 9.0 and seeing if we can reproduce the issue. We compiled postgres from the Red Hat source RPM. The only modification that we make is the config file. The OS is CentOS 5.4, 32-bit. Hardware: IBM 3650 2 x Dual Core Intel Xeon 5160 @ 3.00 GHz 16 GB memory 6 x 146 GB SAS 10K RPM in RAID-5 Please note that while we can reproduce the issue, access to the test system is fairly restrictive and test runs take a while, so I'd like to get requests for additional information-collecting all at once if possible. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup for streaming base backups
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 2:10 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: I think that the basic problem with wal_level is that to increase it you need to somehow ensure that all the backends have the new setting, and then checkpoint. Well, you just said when to force the reload to take effect: at checkpoint time. IIRC we already multiplex SIGUSR1, is that possible to add that behavior here? And signal every backend at checkpoint time when wal_level has changed? Sending them a signal seems like a promising approach, but the trick is guaranteeing that they've actually acted on it before you start the checkpoint. Have the backends show their current wal_level in their PGPROC entries. Sleep till they're all reporting the right thing, then fire checkpoint. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Excerpts from Josh Berkus's message of Thu Jan 20 18:05:15 -0300 2011: One of our clients is seeing an unusual issue with statements which get stuck waiting, sleeping forever and never completing. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. I have seen it -- on 8.1 too. In our case it was caused by an insert that was doing lots of toast insertions, so it needed to grab the extension lock frequently for the toast table; and this was slowed down by a largish shared_buffers setting, somehow (8.1 doesn't have lock partitioning, so this was expensive). I don't recall details on why these were related. If this is your case too, I doubt you'd be able to reproduce it in 9.0 (not even in 8.2, which is when lock partitioning was introduced). -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Noah Misch n...@leadboat.com writes: Heikki's suggestion seemed straightforward, so much so that I couldn't figure why nobody had done it. That would usually mean I'm missing something. If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; There are some implementation issues with this: concurrent accesses are likely to end up failing with "relation with OID nnn doesn't exist", because backends translate the table's name to OID before acquiring lock. But you'd have to solve those issues anyway to make an ALTER REPLACE WITH work as transparently as you seem to hope it would. Unless the idea here is to also have t_new acquire t's OID, and that is an absolute complete won't-happen if you're not enforcing a pretty thorough level of compatibility between the two tables. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 03:06, Nathan Boley wrote: And actually it does not depend on ndistinct for the columns only, it depends on ndistinct estimates for the combination of columns. So improving the ndistinct estimates for columns is just a necessary first step (and only if it works reasonably well, we can do the next step). I think that any approach which depends on precise estimates of ndistinct is not practical. I'm not aware of any other approach to the 'discrete fail case' (where the multi-dimensional histograms are not applicable). If someone finds a better solution, I'll be the first one to throw away this stuff. I am very happy that you've spent so much time on this, and I'm sorry if my previous email came off as combative. My point was only that simple heuristics have served us well in the past and, before we go to the effort of new, complicated schemes, we should see how well similar heuristics work in the multiple column case. I am worried that if the initial plan is too complicated then nothing will happen and, even if something does happen, it will be tough to get it committed (check the archives for cross-column stat threads - there are a lot). If I've learned one thing over the years in IT, it's not to take critique personally. All the problems mentioned in this thread are valid concerns, pointing out weak points of the approach. And I'm quite happy to receive this feedback - that's why I started it. On the other hand - Jara Cimrman (a famous Czech fictional character, depicted as the best scientist/poet/teacher/traveller/... - see [1]) once said that you can't be really sure you don't get gold by blowing cigarette smoke into a basin drain, until you actually try it. So I'm blowing cigarette smoke into the drain ... It may very well happen that this will be a dead end, but I'll do my best to fix all the issues or to prove that the pros outweigh the cons. And even if it is eventually rejected, I hope to get a -1 from TL to be eligible for that t-shirt ... [1] http://en.wikipedia.org/wiki/Jara_Cimrman regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 3:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: BTW, is it just me, or is the terminology "number filtered" pretty confusing/ambiguous in itself? It doesn't seem at all clear to me whether that's the number of rows passed by the filter condition or the number of rows rejected. Perhaps "nremoved" would be clearer. I think "filtered" is pretty clear and like it... "removed" sounds like you deleted something. Well, you did delete something, no? There are rows that aren't in the output that would have been there if not for the filter condition. And, btw, one person thinking it's clear doesn't make it so. There are actually three numbers that could be involved here: the number of rows arriving at the filter, the number passed by it, and the number rejected by it. I think that "nfiltered" could possibly mean any of those three. A non-native speaker of English would be even less likely to be sure of what was meant. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
I have seen it -- on 8.1 too. On our case it was caused by an insert that was doing lots of toast insertions, so it needed to grab the extension lock frequently for the toast table; and this was slowed down by a largish shared_buffers setting, somehow (8.1 doesn't have lock partitioning, so this was expensive). I don't recall details on why these were related. If this is your case too, I doubt you'd be able to reproduce it in 9.0 (not even in 8.2 which is when lock partitioning was introduced). Thanks, I'll bet that's the case. This is happening on machines with more RAM, so they've increased shared_buffers. Now, to get them off 8.1. Been trying for over a year now ... -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, 2011-01-20 at 13:14 -0500, Noah Misch wrote: When DDL has taken AccessExclusiveLock and a query waits for it, it's the Right Thing for that query to wake up and proceed based on the complete, final state of that committed DDL. Aside from the waiting itself, the query should behave as though it started after the DDL completed. In my example, the SELECT silently reads data from a table named old_t. What if that were an INSERT? The data falls in the wrong table. Heikki's suggestion, and your patch, contain no checking to see whether the old and new tables are similar. If they are not similar then we have all the same problems raised by my patch. SQL will suddenly fail because columns have ceased to exist, FKs suddenly disappear, etc. Indeed, Heikki's suggestion and my patch would not do such verification. I can't see us detecting and blocking more patterns of ALTER TABLE RENAME or DROP ...; CREATE ...; than we allow today. Those need to stay open-ended, with the user responsible for choosing well. So, what's the right concurrent behavior around use of those statements? I answer that above. That said, I see utility in a feature that compares two tables, swaps them if similar, and fixes up foreign keys. Having such a feature does not justify wrong concurrent behavior around ALTER TABLE RENAME. Having right concurrent behavior around ALTER TABLE RENAME does not remove the utility of this feature. We should do both. I agree that the DDL behaviour is wrong and should be fixed. Thank you for championing that alternative view. Swapping based upon names alone works and is very flexible, much more so than EXCHANGE could be. A separate utility might be worth it, but the feature set of that should be defined in terms of correctly-working DDL behaviour. It's possible that no further requirement exists. I remove my own patch from consideration for this release. I'll review your patch and commit it, problems or objections excepted. I haven't looked at it in any detail. Having said that, writing the patch did nothing to convince me this was the correct approach. Reviews should be reviews; they are not an opportunity to provide your own alternate version of a patch. That just confuses things and creates a competitive, not a cooperative, environment. Authors do need to listen to reviewers, so I hope I'm demonstrating that here. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Josh Berkus j...@agliodbs.com writes: One of our clients is seeing an unusual issue with statements which get stuck waiting, sleeping forever and never completing. This first e-mail is for a "has anyone else seen this problem before?" while we try to collect additional information for a diagnosis. An strace on the process shows it to be in RECV, and otherwise doing nothing. I would take that to mean that it's waiting on the client. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo List Item - System Table Index Clustering
After playing with this in benchmarks and researching the weird results I got, I'm going to advise dropping the todo for now unless something happens to change how Postgres handles clustering. You guys probably already grokked this, so I am just recording it for the list archives. The primary factor here is that Postgres doesn't maintain clustered indexes. Clustering is a one-time operation that clusters the table at the current point in time. Basically, there really isn't any such thing in Postgres as a clustered index. There is an operation - Cluster - which takes an index and a table as input and re-orders the table according to the index. But it is borderline fiction to call the index used "clustered", because the next row inserted will pop in at the end of the table instead of slipping into the middle of the table per the desired ordering. All the pg_* table cluster candidates are candidates because they have a row per table column and we expect that a query will want to get several of these rows at once. These rows are naturally clustered because the scripts that create them insert their information into the catalog contiguously. When you create a catalog table, the pg_attribute rows for its columns are inserted together. When you then create all its triggers, they too are put into pg_triggers one after the other. So calling the Cluster operation after dbinit doesn't help anything. Over time, table alterations can fragment this information. If a user loads a bunch of tables, then alters them over time, the columns added later on will have their metadata stored separately from the columns created originally. Which gets us to the down and dirty of how the Cluster function works. It puts an access exclusive lock on the entire table - blocking all attempts to read and write to the table - creates a copy of the table in the desired order, drops the original, and renames the copy. Doing this to a catalog table that is relevant to queries pretty much brings everything else in the database to a halt while the system table is locked up. And the brute-force logic makes this time-consuming even if the table is perfectly ordered already. Additionally, snapshots taken of the table during the Cluster operation make the table appear to be empty, which introduces the possibility of system table corruption if transactions are run concurrently with a Cluster operation. So basically, the Cluster operation in its current form is not something you want running automatically on a bunch of system tables as it is currently implemented. It gives your system the hiccups. You would only want to run it manually during downtime. And you can do that just as easily with or without any preparation during dbinit. Thanks everyone, -Simone Aiken -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
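A quick way to see the one-shot nature of the Cluster operation described above, using a throwaway table with made-up names (illustrative only):

    CREATE TABLE demo (k int, v text);
    CREATE INDEX demo_k_idx ON demo (k);
    INSERT INTO demo SELECT g, 'x' FROM generate_series(1, 1000) g ORDER BY random();

    -- one-time rewrite in index order; takes AccessExclusiveLock on demo
    CLUSTER demo USING demo_k_idx;

    -- new rows are simply appended to the end of the heap;
    -- nothing keeps the table in k order afterwards
    INSERT INTO demo VALUES (0, 'appended last');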
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 03:36, Robert Haas wrote: On Wed, Jan 19, 2011 at 5:13 PM, Tomas Vondra t...@fuzzy.cz wrote: Regarding the crash scenario - if the commit fails, just throw away the local estimator copy, it's not needed. I'm not sure how to take care of the case when commit succeeds and the write of the merged estimator fails, but I think it might be possible to write the estimator to xlog or something like that. So it would be replayed during recovery etc. Or is it a stupid idea? It's not stupid, in the sense that that is what you'd need to do if you want to avoid ever having to rescan the table. But it is another thing that I think is going to be way too expensive. Way too expensive? All you need to put into the logfile is a copy of the estimator, which is a few kB. How is that 'way too expensive'? At this point, this is all a matter of religion, right? Neither of us has a working implementation we've benchmarked. But yes, I believe you're going to find that implementing some kind of streaming estimator is going to impose a... [pulls number out of rear end] 6% performance penalty, even after you've optimized the living daylights out of it. Now you might say... big deal, it improves my problem queries by 100x. OK, but if you could get the same benefit by doing an occasional full table scan during off hours, you could have the same performance with a *0%* performance penalty. Even better, the code changes would be confined to ANALYZE rather than spread out all over the system, which has positive implications for robustness and likelihood of commit. Good point. What I was trying to do was to continuously update the estimator with new data - that was the whole idea behind the collecting of new values (which might lead to problems with memory etc. as you've pointed out) and updating a local copy of the estimator (which is a good idea, I think). But this might be another option - let the user decide if he wants to continuously update the estimates (and pay the price) or do that during off hours (and pay almost nothing). That sounds like a very good solution to me. I'm not trying to argue you out of working on this. It's obviously your time to spend, and if it works better than I think it will, great! I'm merely offering you an opinion on what will probably happen if you go this route - namely, it'll carry an unpalatable run-time penalty. That opinion may be worth no more than what you paid for it, but there you have it. Yes, and I appreciate all feedback. But I still believe this can be done so that users who don't need the feature don't pay for it. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? Shouldn't Postgres time out on that after a while? In one case, the orphaned statement was 16 hours old before we killed it. If it's relevant, the client connection is from a C application via libpq on localhost. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 09:10, Heikki Linnakangas wrote: It seems that the suggested multi-column selectivity estimator would be more sensitive to ndistinct of the individual columns. Is that correct? How is it biased? If we routinely under-estimate ndistinct of individual columns, for example, does the bias accumulate or cancel itself in the multi-column estimate? I'd like to see some testing of the suggested selectivity estimator with the ndistinct estimates we have. Who knows, maybe it works fine in practice. The estimator for two columns and query 'A=a AND B=b' is about 0.5 * (dist(A)/dist(A,B) * Prob(A=a) + dist(B)/dist(A,B) * Prob(B=b)) so it's quite simple. It's not that sensitive to errors in the ndistinct estimates for individual columns, but the problem is in the multi-column ndistinct estimates. It's very likely that with dependent columns (e.g. with the ZIP codes / cities) the distribution is so pathological that the sampling-based estimate will be very off. I guess this was way too short an analysis, but if you can provide more details of the expected tests etc. I'll be happy to provide that. regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
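To make the formula concrete, a purely illustrative calculation with made-up numbers for a fully dependent two-column case (one ZIP code per city, which is a simplification of the ZIP code / city example):

    dist(A)   = 100      -- e.g. 100 cities
    dist(B)   = 100      -- e.g. 100 ZIP codes
    dist(A,B) = 100      -- fully dependent: each city pairs with exactly one ZIP code
    Prob(A=a) = 0.01
    Prob(B=b) = 0.01

    estimate = 0.5 * (100/100 * 0.01 + 100/100 * 0.01) = 0.01

    -- the independence assumption would instead give 0.01 * 0.01 = 0.0001,
    -- a 100x underestimate for this dependent case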
Re: [HACKERS] SSI and Hot Standby
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 20.01.2011 03:05, Kevin Grittner wrote: If we don't do something like this, do we just provide REPEATABLE READ on the standby as the strictest level of transaction isolation? If so, do we generate an error on a request for SERIALIZABLE, warn and provide degraded behavior, or just quietly give them REPEATABLE READ behavior? +1 for generating an error. Before I go do that, I want to be sure everyone is clear about the state of things. If SSI is used to provide data integrity on the master, it will prevent any serialization anomalies from being persisted on any hot standby *long term*. For example, at any point where the standby is at a point in the transaction stream where there were no read/write transactions active, no anomalies can be observed. (That isn't the *only* time; it's just the simplest one to describe as an example.) Queries on the standby can, however, see *transient* anomalies when they run queries which would cause a serialization failure if run on the master at the same point in the transaction stream. This can only occur when, of two concurrent transactions, the one which *appears* to run second because the other can't read what it wrote, *commits* first. The most common and alarming situation where this occurs, in my opinion, is batch processing. This is extremely common in financial applications, and tends to show up in a lot of other places, too. (The receipting query set is an instance of this type of problem, but I'm going to keep it more general in hopes that people can see where it impacts them.) Imagine an application which has some small control record in a table, and inserts to some other table are assigned to a batch based on the control record. The batches are normally identified by ascending dates or serial numbers. Periodically a new batch is opened and the old batch is closed by updating a current batch id column in the control table. If the batch ID is updated and the transaction in which that update was executed commits while a transaction which read the old batch ID is still in flight, a read of the database will show that the batch is closed, but if you look at the detail of the batch, it will not yet be complete. Under SSI, one of these transactions will be canceled to prevent this. Our implementation will always allow the update which closes the batch to complete, and either the insert or the select of the detail will be rolled back with a serialization failure, depending on the timing of the actions inside those transactions. If the insert fails, it can be retried, and will land in the new batch -- making the listing of the batch which omits it OK. If the listing of the batch details is canceled, it will be because the insert into the old batch committed before it recognized the problem, so an immediate retry of the select will see the complete batch contents. A hot standby can't really take part in the predicate locking and transaction cancellation on the master. Dan and I have both come to the conclusion that the only reasonable way to allow hot standby to work with SSI is for the WAL (when wal_level = hot_standby) to contain information about when snapshots develop which won't see such a state. 
In the above example, barring some throttling mechanism skipping these particular snapshots, or other problematic conflicts around the same time, the master would tell the standby that the snapshot before either of the two problem transactions was OK, and then it would tell them that the snapshot after both had committed was OK. It would not suggest using the snapshot available between the commit of the control record update and the commit of the insert into the batch. This seems to me to be not completely unrelated to the snapshot synchronization patch. It is clearly closely related to the READ ONLY DEFERRABLE mode, which also looks for a snapshot which is immune to serialization anomalies without predicate locking, conflict detection, transaction cancellation, etc. Melding these two things with hot standby seems to be beyond what can reasonably happen for 9.1 without delaying the release. If someone is using one feature and not the other, they really don't have a problem. Like anyone else, if a hot standby user has been using SERIALIZABLE mode under 9.0 or earlier, they will need to switch to REPEATABLE READ. A SERIALIZABLE user who doesn't set up hot standby has no issue. Opinions so far seem to be in favor of reporting an error on the standby if SERIALIZABLE is requested, so that people don't silently get less protection than they expect. The most annoying thing about that is that if the user would *like* to use truly serializable transactions on the standby, and will do so when they get it in 9.2, they must switch to REPEATABLE READ now, and switch back to SERIALIZABLE with the next release. So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? Does anyone think this justifies the compatibility GUC as suggested by Jeff? -Kevin
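A minimal SQL sketch of the batch scenario described above, with hypothetical table names; on the master, SSI would cancel one of the two read/write transactions rather than let the report see a closed-but-incomplete batch:

    CREATE TABLE control (current_batch int NOT NULL);
    CREATE TABLE receipt (batch int NOT NULL, amount numeric NOT NULL);

    -- Session 1: insert a receipt into the current batch
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    INSERT INTO receipt SELECT current_batch, 100.00 FROM control;
    -- ... commits only after session 2 below has committed

    -- Session 2: close the batch, concurrently
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE control SET current_batch = current_batch + 1;
    COMMIT;

    -- Session 3: report on the batch that now looks closed
    SELECT * FROM receipt
     WHERE batch = (SELECT current_batch - 1 FROM control);

The open question above is what a standby running the session-3 report should be allowed to see while session 1 is still in flight.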
Re: [HACKERS] estimating # of distinct values
On 20.1.2011 11:05, Csaba Nagy wrote: Hi Tomas, On Wed, 2011-01-19 at 23:13 +0100, Tomas Vondra wrote: No, the multi-column statistics do not require constant updating. There are cases where a sampling is perfectly fine, although you may need a bit larger sample. Generally if you can use a multi-dimensional histogram, you don't need to scan the whole table. In the cases where sampling is enough, you can do that for the updates too: do a sampling on the changes, in that you only let every Nth change make it to the estimator. If you can also dynamically tune the N to grow it as the statistics stabilize, and lower it if you detect high variance, even better. If the analyze process could be decoupled from the backends, and maybe just get the data passed over to be processed asynchronously, then that could be a feasible way to have always up-to-date statistics when the bottleneck is IO and CPU power is in excess. If that then leads to better plans, it could really be a win exceeding the overhead. OK, this sounds interesting. I'm not sure how to do that but it might be a good solution. What about transactions? If the client inserts data (and it will be sent asynchronously to update the estimator) and then rolls back, is the estimator 'rolled back' or what happens? This was exactly the reason why I initially wanted to collect all the data at the backend (and send them to the estimator at commit time). Which was then replaced by the idea to keep a local estimator copy and merge it back to the original estimator at commit time. If this analyze process (or more of them) could also just get the data from the modified buffers in a cyclic way, so that backends need nothing extra to do, then I don't see any performance disadvantage other than possible extra locking contention on the buffers and non-determinism of the actual time when a change makes it to the statistics. Then you just need to get more CPU power and higher memory bandwidth to pay for the accurate statistics. Well, the possible locking contention sounds like quite a significant problem to me :-( The lag between an update and a change to the stats is not that big a deal I guess - we have the same behaviour with the rest of the stats (updated by autovacuum every once in a while). Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
Kevin, So, based on a more complete description of the issues, any more opinions on whether to generate the error, as suggested by Heikki? If it's a choice between generating an error and letting users see inconsistent data, I'll take the former. Does anyone think this justifies the compatibility GUC as suggested by Jeff? I think it might, yes. Since someone could simply turn on the backwards compatibility flag for 9.1 and turn it off for 9.2, rather than trying to mess with transaction states which might be set in application code. Unfortunately, people have not responded to our survey :-( http://www.postgresql.org/community/survey.77 -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
Josh Berkus j...@agliodbs.com writes: I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? No, if the backend is in RECV state, it's waiting for the client to *send* it something. (Although if this is an SSL connection, it's a bit harder to be sure about what the logical state of the connection is.) Shouldn't Postgres time out on that after a while? Not if the problem is the client is confused. As long as the remote-end kernel doesn't indicate the connection is dead, we'll wait for the client to wake up and send us a command. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] One Role, Two Passwords
Hello list, I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. I have recently encountered a case where this would be highly useful when performing rolling password upgrades across many client applications and/or application instances. It is possible (as far as I know) to get around some of the stickier parts of this with some teeth gnashing, using some CREATE ROLE ... IN ROLE dancing, but I wanted to see if there was any interest in supporting this for real. This design is not uncommon; one example is Amazon Web Services (e.g. EC2, S3), whereby one identification key can have many, independently revokable secret keys. I haven't given much thought to the mechanism yet; rather, I am just trying to assess gut reactions on the principle. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
On 01/20/2011 05:28 PM, Daniel Farina wrote: Hello list, I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. I have recently encountered a case where this would be highly useful when performing rolling password upgrades across many client applications and/or application instances. It is possible (as far as I know) to get around some of the sticker parts of this with some teeth gnashing, using some CREATE ROLE ... IN ROLE dancing, but I wanted to see if there was any interest in supporting this for real. This design is not uncommon, one example is Amazon Web Services (e.g. EC2, S3), whereby one identification key can have many, independently revokable secret keys. I haven't given much thought to the mechanism yet, rather, I am just trying to assess gut reactions on the principle. Have you thought of trying to use an external auth source like LDAP for such a scheme? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: Heikki's suggestion seemed straightforward, so much so that I couldn't figure why nobody had done it. That would usually mean I'm missing something. If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; Because the whole source of this problem is dependency hell. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
On Thu, Jan 20, 2011 at 2:45 PM, Andrew Dunstan and...@dunslane.net wrote: Have you thought of trying to use an external auth source like LDAP for such a scheme? I have thought about that, although LDAP is the only one that came to mind (I don't know a whole lot of systems in detail, only by name... so suggestions welcome for low-administrative-overhead variants). I also briefly considered investigating what hooks I could exploit for auth; I do not know these very well right now. It would be ideal not to have to run another full-bore set of services to support phased password rotation, though -- in this case it would still appear to be better, if frustrating, to use the CREATE ROLE ... IN ROLE dance. -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. -- Simon Riggs http://www.2ndQuadrant.com/books/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Large object corruption during 'piped' pg_restore
Bosco Rama postg...@boscorama.com writes: If 'standard_conforming_strings = on' is set in our DB (which is required for our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql) results in the large objects being corrupted. All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS with all current updates applied. I've been able to replicate this in 8.4; it doesn't happen in 9.0 (but probably does in all 8.x versions). The problem is that pg_dump (or in this case really pg_restore) is relying on libpq's PQescapeBytea() to format the bytea literal that will be given as argument to lowrite() during the restore. When pg_dump is producing SQL directly, or when pg_restore is connected to a database, PQescapeBytea() mooches the standard_conforming_strings value from the active libpq connection and gets the right answer. In the single case where pg_restore is producing SQL without ever opening a database connection, PQescapeBytea doesn't know what to do and defaults to the old non-standard-strings behavior. Unfortunately pg_restore set standard_conforming_strings=on earlier in the script (if it was set in the original source database) so you get the wrong thing. The bottom line is that pg_dump can't depend on libpq's PQescapeBytea, but needs its own copy. We have in fact done that as of 9.0, which is what I was vaguely remembering: Author: Tom Lane t...@sss.pgh.pa.us Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 + Fix pg_dump to do the right thing when escaping the contents of large objects. The previous implementation got it right in most cases but failed in one: if you pg_dump into an archive with standard_conforming_strings enabled, then pg_restore to a script file (not directly to a database), the script will set standard_conforming_strings = on but then emit large object data as nonstandardly-escaped strings. At the moment the code is made to emit hex-format bytea strings when dumping to a script file. We might want to change to old-style escaping for backwards compatibility, but that would be slower and bulkier. If we do, it's just a matter of reimplementing appendByteaLiteral(). This has been broken for a long time, but given the lack of field complaints I'm not going to worry about back-patching. I'm not sure whether this new complaint is enough reason to reconsider back-patching. We cannot just backport the 9.0 patch, since it assumes it can do bytea hex output --- we'd need to emit old style escaped output instead. So it's a bit of work, and more to the point would involve pushing poorly-tested code into stable branches. I doubt it would go wrong, but in the worst-case scenario we might create failures for blob-restore cases that work now. So I'm not sure whether to fix it, or leave it as a known failure case in old branches. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
On Thu, Jan 20, 2011 at 09:36:11PM +, Simon Riggs wrote: I agree that the DDL behaviour is wrong and should be fixed. Thank you for championing that alternative view. Swapping based upon names only works and is very flexible, much more so than EXCHANGE could be. A separate utility might be worth it, but the feature set of that should be defined in terms of correctly-working DDL behaviour. It's possible that no further requirement exists. I remove my own patch from consideration for this release. I'll review your patch and commit it, problems or objections excepted. I haven't looked at it in any detail. Thanks. I wouldn't be very surprised if that patch is even the wrong way to achieve these semantics, but it's great that we're on the same page as to which semantics they are. Having said that, writing the patch did nothing to convince me this was the correct approach. Reviews should be reviews, they are not an opportunity to provide your own alternate version of a patch. That just confuses things and creates a competitive, not a cooperative environment. Authors do need to listen to reviewers, so I hope I'm demonstrating that here. Understood. I can see now that posting a second code patch, however framed, in the same thread creates a presumption of aggression that is difficult to dispel. I will have a lot to think about before doing that again. Thanks for giving this discussion, which started poorly due to my actions, a second chance. nm -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE ... REPLACE WITH
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 20, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: If you're willing to substitute an incompatible table, it's not clear why you don't just do begin; drop table t; alter table t_new rename to t; commit; Because the whole source of this problem is dependency hell. Well, if you want to preserve dependencies, you can *not* just blindly substitute an incompatible table. You must ensure that views and foreign keys referencing the table are still valid. So I'm not sure where anybody got the idea that an implementation that fails to check all that is even worth presenting. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
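A minimal sketch of what goes wrong with the plain DROP/RENAME recipe once anything references the table, using hypothetical names:

    CREATE TABLE t (id int PRIMARY KEY);
    CREATE TABLE t_new (id int PRIMARY KEY);
    CREATE VIEW v AS SELECT id FROM t;    -- v now depends on t

    BEGIN;
    DROP TABLE t;                         -- fails: v depends on t, aborting the transaction
    ALTER TABLE t_new RENAME TO t;
    COMMIT;

    -- DROP TABLE t CASCADE would get past the error, but it silently drops v,
    -- and the renamed t_new does not pick up v or any foreign keys that
    -- pointed at the old t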
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. Because the standby would need to bombard the server with a stream of predicate lock information, we would need to allow transactions on the master to be canceled due in part to activity on the standby, and I don't even know how you would begin to track read/write conflicts between transactions on two different clusters. If any of that didn't make sense, it would probably be more efficient for everyone involved if those interested browsed the Overview section of the Wiki page than to have me duplicate its contents in a post. http://wiki.postgresql.org/wiki/Serializable -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REVIEW: EXPLAIN and nfiltered
On 1/20/2011 12:47 PM, Tom Lane wrote: So the line I'm thinking we should pursue is to visually associate the new counter with the filter condition, either like Filter Cond: (x > 42) (nfiltered = 123) or Filter Cond: (x > 42) Rows Filtered: 123 I'd prefer the latter. Sometimes the Filter Cond is very complex and finding the nfiltered information would be easier if it always had its own row. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
On Jan21, 2011, at 00:11 , Simon Riggs wrote: It's not clear to me what the reason is that this doesn't just work on HS already. If you started there it might help. The problem is that snapshots taken on the master sometimes represent a state of the database which cannot occur under any (valid) serial schedule. Hence, if you use that snapshot to read the *whole* database, you've surely violated serializability. If you read only parts of the database, things may or may not be fine, depending on the parts you read. To have the same stringent guarantees that SERIALIZABLE provides on the master also for queries run against the slave, you somehow need to prevent this. The easiest way is to only use snapshots on the slave which *cannot* produce such anomalies. We already know how to generate such snapshots - SERIALIZABLE READ ONLY DEFERRABLE does exactly that. So the open question is mainly how to transfer such snapshots to the slave, and how often we transmit a new one. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSI and Hot Standby
I wrote: Why not? We already generate appropriate snapshots for this in SSI, so is the problem in getting the appropriate information into the WAL stream or in having a request for a snapshot within a serializable transaction while running in hot standby the problem? I dropped a few words. That was supposed to ask whether the problem was in getting hot standby to *use such a snapshot*. I'm open to other suggestions on how else we might do this. I don't see any alternatives, but maybe you're seeing some possibility that eludes me. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Orphaned statements issue
On 1/20/11 2:26 PM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I would take that to mean that it's waiting on the client. You mean that the client timed out and isn't accepting data from the query anymore? No, if the backend is in RECV state, it's waiting for the client to *send* it something. I don't think that's consistent with what we're seeing except maybe in the BIND case. In the other cases, there's a query supposedly executing. But, will look for that possibility. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] One Role, Two Passwords
Daniel Farina drfar...@acm.org writes: I wanted to test the waters on how receptive people might be to an extension that would allow Postgres to support two passwords for a given role. Not very. Why don't you just put two roles in the same group? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
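A rough sketch of that workaround, with hypothetical role names; privileges are granted only to the group role, and each secret is a separate login role that can be retired independently:

    CREATE ROLE app NOLOGIN;                                       -- holds the privileges
    CREATE ROLE app_key1 LOGIN PASSWORD 'old-secret' IN ROLE app;
    CREATE ROLE app_key2 LOGIN PASSWORD 'new-secret' IN ROLE app;

    -- roll clients over from app_key1 to app_key2, then retire the old credential
    ALTER ROLE app_key1 NOLOGIN;   -- or DROP ROLE app_key1;

    -- clients can SET ROLE app after connecting if objects should be
    -- accessed under the shared role

Object grants then refer only to the group role, so rotating the member roles does not disturb permissions.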
Re: [HACKERS] SSI and Hot Standby
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2011-01-19 at 19:05 -0600, Kevin Grittner wrote: The idea is that whenever we see a valid snapshot which would yield a truly serializable view of the data for a READ ONLY transaction, we add a WAL record with that snapshot information. You haven't explained why this approach is the way forwards. What other options have been ruled out, and why. The above approach doesn't sound particularly viable to me. I'm pretty concerned about the performance implications, too. In particular that sounds like you could get an unbounded amount of WAL emitted from a *purely read only* transaction flow. Which is not going to fly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers