Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-16 Thread Amit kapila

   I guess my first question is: why do we need this?  There are lots of
   things in the TODO list that someone wanted once upon a time, but
   they're not all actually important.  Do you have reason to believe
   that this one is?  It's been six years since that email, so it's worth
   asking if this is actually relevant.
 
 As far as I know, pg_control is not WAL-protected, which means that if it
 gets corrupted for any reason (e.g., a disk crash during flush leaving it
 partially written), it might lead to failure in recovery of the database.

 AFAIR pg_controldata fits on a disk sector so it cannot be half written.
   It can become corrupt for other reasons as well, such as a torn disk sector.
  pg_resetxlog already has a mechanism to recover a corrupt pg_control file,
so the possibility of corruption in some cases has already been considered.
 The suggested patch improves the logic for recovering a corrupt control file,
which is why I felt it would be relevant to work on this patch.

From: Cédric Villemain [ced...@2ndquadrant.com]
Sent: Saturday, June 16, 2012 2:19 AM
To: pgsql-hackers@postgresql.org
Cc: Amit kapila; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

On Friday, June 15, 2012 at 03:27:11, Amit Kapila wrote:
  I guess my first question is: why do we need this?  There are lots of
  things in the TODO list that someone wanted once upon a time, but
  they're not all actually important.  Do you have reason to believe
  that this one is?  It's been six years since that email, so it's worth
  asking if this is actually relevant.

 As far as I know, pg_control is not WAL-protected, which means that if it
 gets corrupted for any reason (e.g., a disk crash during flush leaving it
 partially written), it might lead to failure in recovery of the database.

AFAIR pg_controldata fits on a disk sector so it cannot be half written.

 So the user can use pg_resetxlog to recover the database. Currently
 pg_resetxlog works with guessed values for pg_control.
 However, this implementation can improve that logic: instead of guessing,
 it can try to regenerate the values from WAL.
 This can allow better recovery in certain circumstances.

  The deadline for patches for this CommitFest is today, so I think you
  should target any work you're starting now for the NEXT CommitFest.

 Oh, I am sorry; as this was my first time, I was not fully aware of the
 deadline.

 However I still seek your opinion whether it makes sense to work on this
 feature.


 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Friday, June 15, 2012 12:40 AM
 To: Amit Kapila
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Allow WAL information to recover corrupted
 pg_controldata

 On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com wrote:
  I am planning to work on the below Todo list item for this CommitFest
  Allow WAL information to recover corrupted pg_controldata
  http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

 The deadline for patches for this CommitFest is today, so I think you
 should target any work you're starting now for the NEXT CommitFest.

  I wanted to confirm my understanding about the work involved for this patch:
  The existing patch has following set of problems:
 1. There is a memory leak, and the linked-list code path is not proper.
 2. The lock check for whether the server is already running is removed in the
  patch, which needs to be reverted.
 3. The code needs refactoring.
 
  Apart from above what I understood from the patch is that its intention
  is to generate values for ControlFile using WAL logs when -r option is
  used.
 
  The change from the current algorithm will be: if the control file is corrupt,
  which essentially means ReadControlFile() will return false, then it
  should generate the values (checkPointCopy, checkPoint, prevCheckPoint,
  state) using WAL if the -r option is enabled.
 
  Also, for the -r option, it doesn't need to call FindEndOfXLOG(), as
  that work will be achieved by the above point.
 
  It will just rewrite the control file and not do the other resets.
 
 
  The algorithm for restoring the pg_control values from old xlog files
  (steps 1 and 2 are sketched in code below):
 1. Retrieve all of the active xlog files from the xlog directory into a
    list, in increasing order according to their timeline, log id, and
    segment id.
 2. Search the list to find the oldest xlog file of the latest timeline.
 3. Search the records from the oldest xlog file of the latest timeline to
    the latest xlog file of the latest timeline; if a checkpoint record is
    found, update the latest checkpoint and previous checkpoint.
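
Roughly, steps 1 and 2 amount to something like the following standalone
sketch (illustrative only, not code from the patch; it relies on WAL segment
file names encoding timeline, log id and segment id as 24 hex digits):

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    unsigned int tli;           /* timeline id */
    unsigned int log;           /* log file number */
    unsigned int seg;           /* segment number within the log file */
} WalSegName;

static int
seg_cmp(const void *a, const void *b)
{
    const WalSegName *x = a;
    const WalSegName *y = b;

    if (x->tli != y->tli)
        return x->tli < y->tli ? -1 : 1;
    if (x->log != y->log)
        return x->log < y->log ? -1 : 1;
    if (x->seg != y->seg)
        return x->seg < y->seg ? -1 : 1;
    return 0;
}

int
main(void)
{
    DIR        *dir = opendir("pg_xlog");
    struct dirent *de;
    WalSegName  segs[1024];
    int         nsegs = 0;
    int         i;

    if (dir == NULL)
        return 1;
    while ((de = readdir(dir)) != NULL && nsegs < 1024)
    {
        WalSegName  s;

        /* WAL segment names are exactly 24 hex digits: TLI, log, seg */
        if (strlen(de->d_name) == 24 &&
            sscanf(de->d_name, "%08X%08X%08X", &s.tli, &s.log, &s.seg) == 3)
            segs[nsegs++] = s;
    }
    closedir(dir);

    qsort(segs, nsegs, sizeof(WalSegName), seg_cmp);

    /*
     * segs[] is now ordered by (timeline, log, seg); the scan for checkpoint
     * records (step 3) would start at the oldest segment of the last
     * timeline in this list.
     */
    for (i = 0; i < nsegs; i++)
        printf("%08X%08X%08X\n", segs[i].tli, segs[i].log, segs[i].seg);
    return 0;
}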

  Apart from the above, some changes in the code will be required after the
  XLOG patch by Heikki.
 
  Please suggest whether my understanding is correct.

 I guess my first question is: why do we need this?  There are lots of
 things in the TODO list that someone wanted once upon a time, but
 they're not all actually 

Re: [HACKERS] Resource Owner reassign Locks

2012-06-16 Thread Amit kapila
 I don't think so.  C doesn't ref count its pointers.
You are right, I had misunderstood.

 I don't think that lock tags have good human readable formats, and just
 a pointer dump probably wouldn't be much use when something that can
 never happen has happened.  But I'll at least add a reference to the
 resource owner if this stays in.

I have checked in the lock.c file for a message where lock tags have been used:

elog(ERROR, "lock %s on object %u/%u/%u is already held",
     lockMethodTable->lockModeNames[lockmode],
     lock->tag.locktag_field1, lock->tag.locktag_field2,
     lock->tag.locktag_field3);

This can give more information about the erroneous lock.


From: Jeff Janes [jeff.ja...@gmail.com]
Sent: Saturday, June 16, 2012 3:21 AM
To: Amit kapila
Cc: pgsql-hackers
Subject: Re: [HACKERS] Resource Owner reassign Locks

On Mon, Jun 11, 2012 at 9:30 PM, Amit Kapila amit.kap...@huawei.com wrote:
 Yes, that means the list has over-flowed.  Once it is over-flowed, it
 is now invalid for the reminder of the life of the resource owner.

 Don't we need any logic to clear the reference of the locallock in the
 owner->locks array?

I don't think so.  C doesn't ref count its pointers.

 MAX_RESOWNER_LOCKS - How did you arrive at the number 10 for it? Is there any
 specific reason for 10?

I instrumented the code to record the maximum number of locks held by
a resource owner, and report the max when it was destroyed.  (That
code is not in this patch).  During a large pg_dump, the vast majority
of the resource owners had maximum locks of 2, with some more at 4
and 6.  Then there was one resource owner, for the top-level
transaction, at tens or hundreds of thousands (basically one for every
lockable object).  There was little between 6 and this top-level
number, so I thought 10 was a good compromise, safely above 6 but not
so large that searching through the list itself was likely to bog
down.

Also, Tom independently suggested the same number.
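
To illustrate the idea, roughly (a simplified sketch, not the actual patch
code): each resource owner keeps a small fixed-size array of LOCALLOCK
pointers and simply marks itself as overflowed once more than
MAX_RESOWNER_LOCKS locks have been remembered, after which the cache is
ignored for the rest of the owner's life.

#define MAX_RESOWNER_LOCKS 10   /* value arrived at empirically, see above */

typedef struct LOCALLOCK LOCALLOCK;     /* opaque for this sketch */

typedef struct ResourceOwnerData
{
    int         nlocks;         /* number of owned locks tracked */
    LOCALLOCK  *locks[MAX_RESOWNER_LOCKS];  /* cached LOCALLOCK pointers */
} ResourceOwnerData;

/* Remember a lock; once the array would overflow, stop tracking entirely. */
static void
ResourceOwnerRememberLock(ResourceOwnerData *owner, LOCALLOCK *lock)
{
    if (owner->nlocks > MAX_RESOWNER_LOCKS)
        return;                 /* already overflowed; list is invalid */
    if (owner->nlocks < MAX_RESOWNER_LOCKS)
        owner->locks[owner->nlocks] = lock;
    owner->nlocks++;            /* nlocks > MAX_RESOWNER_LOCKS marks overflow */
}

/* Forget a lock; if the cache has overflowed there is nothing to clean up. */
static void
ResourceOwnerForgetLock(ResourceOwnerData *owner, LOCALLOCK *lock)
{
    int         i;

    if (owner->nlocks > MAX_RESOWNER_LOCKS)
        return;                 /* overflowed; cache no longer authoritative */
    for (i = 0; i < owner->nlocks; i++)
    {
        if (owner->locks[i] == lock)
        {
            owner->locks[i] = owner->locks[owner->nlocks - 1];
            owner->nlocks--;
            return;
        }
    }
    /* not found: the "can never happen" case discussed just below */
}

A non-overflowed owner can then release or reassign just these few locks at
transaction end instead of scanning the entire local lock table.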


 Should it emit a FATAL rather than an ERROR?  I thought ERROR was
 sufficient to make the backend quit, as it is not clear how it could
 meaningfully recover.

 I am not able to visualize any valid scenario in which it can happen unless
 some corruption happens.
 If this happens, the user can close all statements and abort their transactions.
 According to me ERROR is okay. However, in the message "Can't find lock to
 remove", it could be better
 if there were information about the resource owner and lock.

I think we might end up changing that entirely once someone more
familiar with the error handling mechanisms takes a look at it.  I
don't think that lock tags have good human readable formats, and just
a pointer dump probably wouldn't be much use when something that can
never happen has happened.  But I'll at least add a reference to the
resource owner if this stays in.

Thanks,

Jeff


Re: [HACKERS] libpq compression

2012-06-16 Thread Magnus Hagander
On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Yes, but there's also a lot of such awkward logic we need to add if we
 *do* go with the SSL library doing the compression:

 For example, we can no longer trust the SSL library to always do
 encryption, since we specifically want to support null encryption.

 True, but are you sure we don't need to do that anyway?  What happens
 today, if a non-libpq client connects with SSL and specifies null
 encryption?

openssl rejects the connection unless you have explicitly allowed NULL
encryption in ssl_ciphers.

Which is the only sensible default.


 And we currently have no way to specify different
 encryption options on a per-host basis, which is something we'd have
 to do (e.g. I want to be able to say that subnet x requires
 encryption with these encryption methods and subnet y doesn't
 require encryption but should do compression).

 [ shrug... ]  Having that sort of control over a homebrew compression
 solution will *also* require a lot of control logic that does not exist
 today.

The important part isn't really being able to control the compression
in this. It's that we're overloading a convenience feature
(compression) in the settings of a security feature (encryption).
Which leads to both complex processing, and also a fairly high risk of
accidentally configuring something you didn't want, unless we change the
interface to make them look like separate things even if they aren't.


 So there's quite a bit of complexity that needs to be put in there
 just to deal with the fact that we're using SSL to do compression, if
 we want to support it in a way that's not hackish.

 It's not obvious to me that we actually *need* anything except the
 ability to recognize that a null-encrypted SSL connection probably
 shouldn't be treated as matching a hostssl line; which is not something
 that requires any fundamental rearrangements, since it only requires an
 after-the-fact check of what was selected.  Things like "subnet x
 requires encryption with these encryption methods" are features that are
 sensible with our existing feature set.  But we don't have that now and
 nobody has asked for it, so I think you are moving the goalposts rather
 unfairly by claiming that a compression-related patch needs to add it.

Maybe I spelled it out wrong. It does require it insofar that if we
want to use this for compression, we must *always* enable openssl on
the connection. So the "with these encryption methods" boils down to
"NULL encryption only" or "whatever other standards I have for
encryption". We don't need the ability to change the "whatever other
standards" per subnet, but we need to control
accept-NULL-encryption on a per-subnet basis.

It also risks some level of information leak - assuming someone
connects with NULL encryption and we don't support it, unless we do
something particular about it, the error message will go out in
cleartext. Today, you will get a client generated error message and no
actual message crosses the wire in cleartext.

It's not that we can't deal with those things. It's just that it's
going to take some work, and some careful thought about exactly which
parts can be exposed over NULL encrypted connections.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] libpq compression

2012-06-16 Thread Magnus Hagander
On Sat, Jun 16, 2012 at 12:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
 On Sat, Jun 16, 2012 at 6:39 AM, Magnus Hagander mag...@hagander.net wrote:
 Fair enough if we decide that - but we should make that decision
 knowing that we're leaving the JDBC and .Net people in a bad position
 where they are not likely to be able to implement this.

 The JDBC people have a theoretical chance if the JDK is open. The .Net
 people are stuck with schannel that doesn't support it at this point.
 It might well do in the future (since it's in the standard); but
 they're at the mercy of Microsoft.

 Both Java and C# are open-source enough that anybody can
 take existing SSL implementation and add compression to it,
 then distribute it as improved SSL library.

 Possibly more to the point: that is work they might have to do, if
 nobody else steps up to the plate --- and if they do end up doing it,
 it could benefit other projects too.  On the other hand, if we
 roll-our-own transport compression solution, that is work they *will*
 have to do, with no chance of sharing the effort with other projects.

True - provided said upstream (Oracle in the Java case) are interested
in accepting the patches...

If they end up having to port one of the compression algorithms, let's
take LZ4 as an example, then they can certainly release that as open
source under a compatible license, thus making it available to others.

Though that's not necessarily that relevant - LZ4 already has a C#
implementation for .NET and a JNI wrapper for Java.
Snappy even has a native Java implementation.

So if we went down that road, there wouldn't *be* a need to implement
it. Just the protocol parts itself, which are - compared to
implementing the actual compression in either scheme - trivial.


 BTW, as far as the .Net case goes, it took only a moment's googling
 to find this:
 http://openssl-net.sourceforge.net/
 which is a .Net wrapper around real OpenSSL.  It doesn't appear to
 provide wrappers for the compression selection functions, but surely
 that's just a lack of round tuits, not that it would take more than
 five minutes to add them.

That would then lose all the advantages that Npgsql gets from
schannel, such as integrated certificate management. So it can be done
- but it would AFAICT require a fairly large rearchitecture of how
security is handled, it would add a license-incompatible requirement,
and it would lose other features. But it can be done.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Combine non-recursive and recursive CTEs?

2012-06-16 Thread Magnus Hagander
I'm not sure if this is something I don't know how to do, or if it's
something we simply can't do, or if it's something we could do but the
syntax can't handle :-)

Basically, I'd like to combine a recursive and a non-recursive CTE in
the same query. If I do it non-recursive, I can do something like:

WITH t1(z) AS (
   SELECT a FROM x
),
t2 AS (
   SELECT z FROM t1
)
SELECT * FROM t2;


But what if I want t2 to be recursive?

Trying something like:
WITH t1 (z,b) AS (
   SELECT a,b FROM x
),
RECURSIVE t2(z,b) AS (
   SELECT z,b FROM t1 WHERE b IS NULL
 UNION ALL
   SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
)

I get a syntax error on the RECURSIVE.

Is there any other position in this query that I can put the RECURSIVE
in order for it to get through?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-16 Thread Tom Lane
Amit kapila amit.kap...@huawei.com writes:
 AFAIR pg_controldata fits on a disk sector so it cannot be half written.

It can become corrupt for other reasons as well, such as a torn disk sector.

Torn disk sector?  Please, this is nonsense.  Disks cannot write half
a sector and then stop.  A sufficiently badly designed drive might
attempt to start a write when it didn't have enough power left to finish
... but the result of that would be a corrupt sector with a non-matching
CRC, not one that read back okay but contained erroneous data.

  The suggested patch improves the logic for recovering a corrupt control file,
 which is why I felt it would be relevant to work on this patch.

Well, we invented pg_resetxlog with the thought that it might be useful
for such situations, but I'm not sure offhand that we've ever seen a
field report of corrupted pg_control files.  For instance, a quick
search in the archives for incorrect checksum in control file turns up
only cases of pilot error, such as supposing that a 32-bit database
could be used with a 64-bit server or vice versa.  Actual hardware
failures on the pg_control file could be expected to result in something
like could not read from control file: I/O error, which I find no
evidence for at all in the archives.

Before adding new code to improve the situation, it would be good to
have (a) evidence that there's a problem worth solving, and (b) a theory
as to what likely-to-occur cases the new code is going to make better,
while not making things worse in other likely-to-occur cases.  Case in
point here is that it's not immediately obvious that we should trust
the contents of WAL more than pg_control --- the former gets a whole
lot more write traffic and hence has many more opportunities for
failure.

At the moment I don't see that we have either (a) or (b), so I think
it's pretty dubious to be making any changes of this sort.

regards, tom lane



Re: [HACKERS] Combine non-recursive and recursive CTEs?

2012-06-16 Thread PostgreSQL - Hans-Jürgen Schönig
On Jun 16, 2012, at 8:27 AM, Magnus Hagander wrote:

 I'm not sure if this is something I don't know how to do, or if it's
 something we simply can't do, or if it's something we could do but the
 syntax can't handle :-)
 
 Basically, I'd like to combine a recursive and a non-recursive CTE in
 the same query. If I do it non-recursive, I can do something like:
 
 WITH t1(z) AS (
   SELECT a FROM x
 ),
 t2 AS (
   SELECT z FROM t1
 )
 SELECT * FROM t2;
 
 
 But what if I want t2 to be recursive?
 
 Trying something like:
 WITH t1 (z,b) AS (
   SELECT a,b FROM x
 ),
 RECURSIVE t2(z,b) AS (
   SELECT z,b FROM t1 WHERE b IS NULL
 UNION ALL
   SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
 )
 
 I get a syntax error on the RECURSIVE.
 
 Is there any other position in this query that I can put the RECURSIVE
 in order for it to get through?
 
 -- 
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/
 
 


hm, this is interesting ...

cat /tmp/a.sql
WITH y AS ( SELECT 1 AS n),
g AS (WITH RECURSIVE x(n) AS
(
SELECT (SELECT n FROM y) AS n
UNION ALL
SELECT n + 1 AS n
FROM x
WHERE n < 10))
SELECT * FROM g;

Hans-Jurgen-Schonigs-MacBook-Pro:sql hs$ psql test < /tmp/a.sql
ERROR:  syntax error at or near ")"
LINE 8:  WHERE n < 10))

this gives a syntax error as well ... 
if my early morning brain is correct this should be a proper statement ...


regards,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] Combine non-recursive and recursive CTEs?

2012-06-16 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 Basically, I'd like to combine a recursive and a non-recursive CTE in
 the same query.

Just mark them all as recursive.  There's no harm in marking a CTE as
recursive when it isn't really.

 Trying something like:
 WITH t1 (z,b) AS (
SELECT a,b FROM x
 ),
 RECURSIVE t2(z,b) AS (
SELECT z,b FROM t1 WHERE b IS NULL
  UNION ALL
SELECT z,b FROM t2 INNER JOIN t1 ON t2.b=t1.z
 )

 I get a syntax error on the RECURSIVE.

The SQL spec says RECURSIVE can only appear immediately after WITH,
so it necessarily applies to all the CTEs in the WITH list.

The reason why it's like that is that RECURSIVE affects the visibility
rules for which CTEs can refer to which other ones.  I think the SQL
committee would have done better to keep the two concepts separate,
but they didn't ...

regards, tom lane



[HACKERS] Pg default's verbosity?

2012-06-16 Thread Fabien COELHO


Hello pgdev,

(Second attempt)

I've conducted a statistical study about PostgreSQL use in OSS. One of the
results is that quite a few projects have errors in their SQL setup scripts
which lead to some statements being ignored, typically some ADD
CONSTRAINTS which do not change the database schema from a functional
point of view, or syntax errors (typically MySQL syntax...) that
result in missing tables, but which are not found if the application is
not fully tested.


I think that there are two reasons why these errors are not caught by 
application developers:


(1) the default verbosity is set to "notice", which is much too high. The
users just get used to seeing a lot of messages on loading an SQL script,
and to ignoring them, so that errors are just hidden in the flow of notices.
I think that a better default setting would be "warning", that is,
messages that require some attention from the developer.


(2) the default behavior of psql on errors is to keep going. Developers of
SQL scripts that are expected to work should be advised to:

 - encourage application devs to set ON_ERROR_STOP and/or use a global
   transaction in their script.
 - provide a simple/short option to do that from the command line,
   basically that could be an enhanced -1, NOT restricted
   to -f but that would work on standard input as well.

   sh psql -1 -f setup.sql  # -1 does work here
   sh psql -1 < setup.sql   # -1 does not apply to stdin stuff...


So I would suggest the following todos:

1 - change the default verbosity to warning.

2 - change -1 to work on stdin as well instead of being ignored,
or provide another option that would do that.

--
Fabien Coelho - coe...@cri.ensmp.fr



[HACKERS] compare lower case and upper case when encoding is utf-8

2012-06-16 Thread Quan Zongliang

Hi hackers,

I found that lower case is less than upper case when the db is created 
with utf8.

I tried below
locale   en_US.utf8      'A' < 'a'   false
locale   ja_JP.utf8      'A' < 'a'   true
locale   zh_CN.utf8      'A' < 'a'   false
Under Windows
locale   Chinese_China   'A' < 'a'   false

I am not sure whether this is normal or not.
But in Chinese, lower case should be greater than upper case, the same as in
locale C.
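
For what it's worth, the comparison can be reproduced outside the server with
strcoll(), which is essentially what the backend uses for non-C collations
(a small sketch; it assumes the named locales are installed on the system):

#include <locale.h>
#include <stdio.h>
#include <string.h>

static void
compare_in(const char *loc)
{
    if (setlocale(LC_COLLATE, loc) == NULL)
    {
        printf("%-15s (locale not available)\n", loc);
        return;
    }
    /* strcoll() compares according to the current LC_COLLATE setting */
    printf("%-15s 'A' < 'a' : %s\n", loc,
           strcoll("A", "a") < 0 ? "true" : "false");
}

int
main(void)
{
    compare_in("C");
    compare_in("en_US.utf8");
    compare_in("ja_JP.utf8");
    compare_in("zh_CN.utf8");
    return 0;
}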


I made some code to try to fix it.
It seems to work fine.

Quan Zongliang




Re: [HACKERS] Combine non-recursive and recursive CTEs?

2012-06-16 Thread Magnus Hagander
On Sat, Jun 16, 2012 at 2:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 Basically, I'd like to combine a recursive and a non-recursive CTE in
 the same query.

 Just mark them all as recursive.  There's no harm in marking a CTE as
 recursive when it isn't really.

Hah. I could've sworn I tried that and got the typical error of "you
need to use the UNION construct for recursive queries". But clearly I
must've typoed something in that one, because when I did that over
again, it now worked perfectly...

Thanks!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-16 Thread Andres Freund
Hi Robert,

On Friday, June 15, 2012 10:03:38 PM Robert Haas wrote:
 On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  I don't plan to throw in loads of conflict resolution smarts. The aim is
  to get to the place where all the infrastructure is there so that a MM
  solution can be built by basically plugging in a conflict resolution
  mechanism. Maybe providing a very simple one.
  I think without in-core support its really, really hard to build a
  sensible MM implementation. Which doesn't mean it has to live entirely
  in core.
 
 Of course, several people have already done it, perhaps most notably
 Bucardo.
Bucardo certainly is nice, but it's not usable for many things just from an
overhead perspective.

 Anyway, it would be good to get opinions from more people here.  I am
 sure I am not the only person with an opinion on the appropriateness
 of trying to build a multi-master replication solution in core or,
 indeed, the only person with an opinion on any of these other issues.
 It is not good for those other opinions to be saved for a later date.
Agreed.

  Hm. Yes, you could do that. But I have to say I don't really see a point.
  Maybe the fact that I do envision multimaster systems at some point is
  clouding my judgement though as its far less easy in that case.
 Why?  I don't think that particularly changes anything.
Because it makes conflict detection very hard. I also don't think it's a
feature worth supporting. What's the use-case of updating records you cannot
properly identify?

  It also complicates the wal format as you now need to specify whether you
  transport a full or a primary-key only tuple...
 Why?  If the schemas are in sync, the target knows what the PK is
 perfectly well.  If not, you're probably in trouble anyway.
True. There already was the wish (from Kevin) to have the option of
transporting full before/after images anyway, so the WAL format might want to
be able to represent that.

  I think though that we do not want to enforce that mode of operation for
  tightly coupled instances. For those I was thinking of using command
  triggers to synchronize the catalogs.
  One of the big screwups of the current replication solutions is exactly
  that you cannot sensibly do DDL which is not a big problem if you have a
  huge system with loads of different databases and very knowledgeable
  people et al. but at the beginning it really sucks. I have no problem
  with making one of the nodes the schema master in that case.
  Also I would like to avoid the overhead of the proxy instance for
  use-cases where you really want one node replicated as fully as possible
  with the slight exception of being able to have summing tables,
  different indexes et al.
 In my view, a logical replication solution is precisely one in which
 the catalogs don't need to be in sync.  If the catalogs have to be in
 sync, it's not logical replication.  ISTM that what you're talking
 about is sort of a hybrid between physical replication (pages) and
 logical replication (tuples) - you want to ship around raw binary
 tuple data, but not entire pages.
Ok, that's a valid point. Simon argued at the cluster summit that everything
that's not physical is logical. Which has some appeal because it seems hard to
agree on what exactly logical rep is. So definition by exclusion makes kind of
sense ;)

I think what you categorized as hybrid logical/physical rep solves an
important use-case that's very hard to solve at the moment. Before my
2ndquadrant days I had several clients which had huge problems using the
trigger-based solutions because their overhead simply was too big a burden on
the master. They couldn't use SR either because every consuming database kept
loads of local data.
I think such scenarios are getting more and more common.

 The problem with that is it's going to be tough to make robust.  Users could
 easily end up with answers that are total nonsense, or probably even crash
 the server.
Why?

 To step back and talk about DDL more generally, you've mentioned a few
 times the idea of using an SR instance that has been filtered down to
 just the system catalogs as a means of generating logical change
 records.  However, as things stand today, there's no reason to suppose
 that replicating anything less than the entire cluster is sufficient.
 For example, you can't translate enum labels to strings without access
 to the pg_enum catalog, which would be there, because enums are
 built-in types.  But someone could supply a similar user-defined type
 that uses a user-defined table to do those lookups, and now you've got
 a problem.  I think this is a contractual problem, not a technical
 one.  From the point of view of logical replication, it would be nice
 if type output functions were basically guaranteed to look at nothing
 but the datum they get passed as an argument, or at the very least
 nothing other than the system catalogs, but there is no such
 guarantee.  And, without 

[HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed
On 12 February 2012 02:06, Vik Reykja vikrey...@gmail.com wrote:
 I decided to take a crack at the todo item created from the following post:
 http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php

 The attached patch makes the desired changes in both code and function
 naming.

 It seemed quite easy to do but wasn't marked as easy on the todo, so I'm
 wondering if I've missed something.  All regression tests pass.


Here's my review of this patch.


Basic stuff:


* Patch applies OK (some offsets).

BTW, I had no problems applying both the original patch and Chetan
Suttraway's version. The only difference between the patches seems to
be that the original is in context format, and Chetan Suttraway's is
in unified format.

Which format do hackers actually prefer? The wiki page
http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
suggests context format, but then the linked example
http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
format. Do people care, or are both formats OK?

* Compiles cleanly with no warnings.

* Regression tests pass.

The regression tests have not been updated. I think that's fair enough
- I don't see a way to test this in a regression test - but I did some
testing (see below) to confirm the expected behaviour.

* No doc changes needed.


What it does:
-------------

The primary benefit this patch offers is to prevent unnecessary
queuing of RI triggers in the case of updates to a FK table where the
old and new FK values are both NULL. It does this by effectively
replacing the existing key equality checks with IS [NOT] DISTINCT FROM
checks.
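
In other words, the comparison changes roughly as in this sketch (purely
illustrative, not the patch's actual code): with the old test a NULL on
either side counts as "changed", so the triggers get queued, whereas the
IS NOT DISTINCT FROM style test treats two NULLs as unchanged.

#include <stdbool.h>

typedef struct
{
    bool        isnull;
    int         value;          /* stand-in for a Datum */
} KeyCol;

/* Old-style key equality: any NULL means the key is treated as changed. */
static bool
key_equal_old(KeyCol oldcol, KeyCol newcol)
{
    if (oldcol.isnull || newcol.isnull)
        return false;
    return oldcol.value == newcol.value;
}

/* IS NOT DISTINCT FROM style: NULL -> NULL counts as unchanged, so no RI
 * trigger needs to be queued for rows whose FK stays NULL. */
static bool
key_unchanged_new(KeyCol oldcol, KeyCol newcol)
{
    if (oldcol.isnull || newcol.isnull)
        return oldcol.isnull && newcol.isnull;
    return oldcol.value == newcol.value;
}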

This seems like a worthwhile optimisation, because I think that it is
fairly common to have NULLs in FKs columns, and then update some other
column.

The patch also prevents unnecessary queuing of RI triggers when the PK
table is updated, and the old and new values are both NULL, but that
seems like a much less common case.

I've looked over the code changes fairly closely, and I believe that
this is a safe change.

Technically, I think the changes to ri_OneKeyEqual() and
ri_AllKeysUnequal() are unnecessary, since they can only be called
from the trigger functions in the case where all the old values are
non-NULL, hence the new versions end up behaving the same. However, I
think it makes sense for all these functions to be consistent.

Talking of consistency, I wonder if RI_FKey_keyequal_upd_pk() and
RI_FKey_keyequal_upd_fk() ought to be renamed to
RI_FKey_keyunchanged_upd_pk() and RI_FKey_keyunchanged_upd_fk()?


Testing:


I tested using the following tables:

CREATE TABLE pk_table
(
  a int PRIMARY KEY
);

INSERT INTO pk_table
  SELECT * FROM generate_series(1,10);

CREATE TABLE fk_table
(
  a int PRIMARY KEY,
  b int,
  c int,
  d int,
  e int REFERENCES pk_table(a)
);

INSERT INTO fk_table
  SELECT i, i*2, i*3, i*4, CASE WHEN i%10 = 0 THEN i END
FROM generate_series(1,10) g(i);

(i.e., FK populated in 10% of rows)


Then in HEAD:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1390.037..1390.037 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..60.841 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms
(4 rows)

So the RI trigger is fired 9 times, for the unchanged NULL FK rows.

With this patch, the RI trigger is not fired at all:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1489.640..1489.640 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..66.328 rows=10 loops=1)
 Total runtime: 1489.679 ms
(3 rows)


Similarly, if I update the FK column in HEAD the RI trigger is fired
for every row:
EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual
time=1565.148..1565.148 rows=0 loops=1)
   ->  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26)
(actual time=0.010..42.725 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms
(4 rows)

whereas with this patch it is only fired for the non-NULL FK rows that
are changing:
EXPLAIN ANALYSE 

[HACKERS] Re: [COMMITTERS] pgsql: Run pgindent on 9.2 source tree in preparation for first 9.3

2012-06-16 Thread Bruce Momjian
On Sat, Jun 16, 2012 at 01:10:31AM -0400, Noah Misch wrote:
 On Fri, Jun 15, 2012 at 10:45:16PM -0400, Bruce Momjian wrote:
  I have updated the pgindent README to use
  these Perl indent instructions:
  
  find . -name \*.pl -o -name \*.pm | xargs perltidy \
  --backup-and-modify-in-place --opening-brace-on-new-line \
  --vertical-tightness=2 --vertical-tightness-closing=2 \
  --nospace-after-keyword=for --nospace-for-semicolon \
  --add-whitespace --delete-old-whitespace --paren-tightness=2 \
  --keep-old-blank-lines=2 --maximum-line-length=78 \
  --entab-leading-whitespace=4 --output-line-ending=unix
 
 I would lean against using --nospace-after-keyword=for.  Not using it means we
 get wrong formatting when the for-loop conditions span multiple lines.  Using
 it means we get wrong formatting (albeit less severe) on every for-loop.  In
 any event, if we do use it for for-loops, we should probably use it for all
 control structure keywords.

Agreed, good point.

 Otherwise, I like this.
 
 As a last idle idea, how about putting the options in a configuration file and
 passing --profile= as the only option?  Besides keeping you from copying a
 7-line shell command, this has the benefit of ignoring any ~/.perltidyrc.

Also agreed, and change made.  Perltidyrc now has:

--add-whitespace
--backup-and-modify-in-place
--delete-old-whitespace
--entab-leading-whitespace=4
--keep-old-blank-lines=2
--maximum-line-length=78
--nospace-for-semicolon
--opening-brace-on-new-line
--output-line-ending=unix
--paren-tightness=2
--vertical-tightness=2
--vertical-tightness-closing=2

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Marko Kreen
Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

Few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen mark...@gmail.com wrote:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.

 Such API is implemented in attached patch:

 * PQsetSingleRowMode(conn): sets single-row mode.

The function can be called only after PQsend* and before any
rows have arrived.  This guarantees there will be no surprises
for PQexec* users who expect the full resultset at once.  It also
guarantees that the user will process the resultset with a PQgetResult()
loop, either sync or async.  The next PQexec/PQsend call will
reset the flag, so it is active only for the duration of processing
the results from one command.

Currently it returns FALSE if called in the wrong place and does
nothing.  The only question I see here is whether it should set an
error state on the connection or not.  That does not seem to be an
improvement.

 * PQgetRowData(): can be called instead PQgetResult() to get raw row data
  in buffer, for more efficient processing.  This is optional feature
  that provides the original row-callback promise of avoiding unnecessary
  row data copy.

 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

PQgetResult() is compatible with callbacks; PQgetRowData()
bypasses them.
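
For reference, a minimal synchronous consumer of this API would look roughly
like the following (a sketch distilled from the demos linked above, not code
from the patch itself; error handling is abbreviated):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* params taken from environment */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Send the query, then switch to single-row mode before any rows arrive */
    if (!PQsendQuery(conn, "SELECT generate_series(1, 5)") ||
        !PQsetSingleRowMode(conn))
    {
        fprintf(stderr, "could not start query: %s", PQerrorMessage(conn));
        return 1;
    }

    /*
     * Each PQgetResult() now returns one PGRES_SINGLE_TUPLE result per row,
     * followed by a final zero-row PGRES_TUPLES_OK result, and then NULL.
     */
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            printf("row: %s\n", PQgetvalue(res, 0, 0));
        else if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}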

-- 
marko



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 Now, looking at the problem with some perspective, the solution
 is obvious: when in single-row mode, the PQgetResult() must return
 proper PGresult for that single row.  And everything else follows that.
 
 * PQgetRowData(): can be called instead PQgetResult() to get raw row data
  in buffer, for more efficient processing.  This is optional feature
  that provides the original row-callback promise of avoiding unnecessary
  row data copy.
 
 * Although PQgetRowData() makes callback API unnecessary, it is still
  fully compatible with it - the callback should not see any difference
  whether the resultset is processed in single-row mode or
  old single-PGresult mode.  Unless it wants to - it can check
  PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach.  IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse
API.  The only arguable drawback is that there's still at least one
malloc/free cycle per tuple, imposed by the creation of a PGresult
for each one, whereas the callback approach avoids that.  But worrying
about that could be considered to be vast overoptimization; the backend
has certainly spent a lot more overhead than that generating the tuple.

regards, tom lane



Re: [HACKERS] libpq compression

2012-06-16 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's not obvious to me that we actually *need* anything except the
 ability to recognize that a null-encrypted SSL connection probably
 shouldn't be treated as matching a hostssl line; which is not something
 that requires any fundamental rearrangements, since it only requires an
 after-the-fact check of what was selected.

 Maybe I spelled it out wrong. It does require it insofar that if we
 want to use this for compression, we must *always* enable openssl on
 the connection. So the with these encryption method boils down to
 NULL encryption only or whatever other standards I have for
 encryption. We don't need the ability to change the whatever other
 standards per subnet, but we need to control the
 accept-NULL-encryption on a per subnet basis.

After sleeping on it, I wonder if we couldn't redefine the existing
"list of acceptable ciphers" option as "the list of ciphers that are
considered to provide encrypted transport".  So you'd be allowed to
connect with SSL using any unapproved cipher (including NULL), the
backend just considers it as equivalent to a non-SSL connection for
pg_hba purposes.  Then no change is needed in any configuration stuff.
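
To make that concrete, the post-handshake check could look roughly like this
(purely illustrative, not from any patch; it assumes a configured list of
"counts as encrypted" cipher names and does only a naive substring match
against it, whereas a real implementation would expand OpenSSL cipher-list
syntax):

#include <stdbool.h>
#include <string.h>
#include <openssl/ssl.h>

static bool
cipher_counts_as_encrypted(SSL *ssl, const char *approved_ciphers)
{
    const char *name = SSL_CIPHER_get_name(SSL_get_current_cipher(ssl));

    /* Treat NULL ciphers as unencrypted outright */
    if (name == NULL || strstr(name, "NULL") != NULL)
        return false;

    /* e.g. approved_ciphers = "AES256-SHA:AES128-SHA" (hypothetical list) */
    return strstr(approved_ciphers, name) != NULL;
}

A connection whose cipher fails this test would then simply not match hostssl
lines in pg_hba.conf, as described above.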

regards, tom lane



Re: [HACKERS] Pg default's verbosity?

2012-06-16 Thread Tom Lane
Fabien COELHO coe...@cri.ensmp.fr writes:
 [ errors in SQL scripts fed to psql are easily missed ]

 So I would suggest the following todos:
 1 - change the default verbosity to warning.

The argument for defaulting to NOTICE is the same as it's always been:
that those messages are really intended for novices, and a pretty good
definition of a novice is somebody who doesn't know how to (or that he
should) change the verbosity setting.  So if we don't show notices by
default, they will be unavailable to exactly the people who need them.
Your proposal does not overcome this argument.

Besides, I'm not convinced that changing client_min_messages in
isolation would do much for the problem, because psql is still pretty
chatty by itself; you really need -q to have any hope that important
messages didn't scroll off your screen.  Perhaps it would be sensible to
have the -q switch also execute "set client_min_messages = warning", and
recommend that people use that when running allegedly-debugged scripts?

 2 - change -1 to work on stdin as well instead of being ignored,
  or provide another option that would do that.

Yeah, if that doesn't work already, it would be sane to make it do so,
at least for non-tty stdin.  It seems like a fairly bad idea for
interactive stdin, though.

regards, tom lane



Re: [HACKERS] Backup docs

2012-06-16 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 This is copied from the old documentation. It used to say "It is not
 necessary to be concerned about the amount of time elapsed between
 pg_start_backup and the start of the actual backup, nor between the
 end of the backup and pg_stop_backup."

 And the whole idea was to simplify the text at the beginning ;)

Oh I see, not your patch to fix then. I just quickly read the diff, as you
can see.

 This is copied exactly from what it is today. I'm sure it can be
 improved, but it's not the goal of this patch. Let's not let
 perfection get in the way of improvement...

Same.

 It does, it's under "standalone hot backups". The second to last part
 of the patch.

Perfect then.

Sorry for the noise, regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] libpq compression

2012-06-16 Thread k...@rice.edu
On Sat, Jun 16, 2012 at 11:15:30AM -0400, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Sat, Jun 16, 2012 at 12:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  It's not obvious to me that we actually *need* anything except the
  ability to recognize that a null-encrypted SSL connection probably
  shouldn't be treated as matching a hostssl line; which is not something
  that requires any fundamental rearrangements, since it only requires an
  after-the-fact check of what was selected.
 
  Maybe I spelled it out wrong. It does require it insofar that if we
  want to use this for compression, we must *always* enable openssl on
  the connection. So the with these encryption method boils down to
  NULL encryption only or whatever other standards I have for
  encryption. We don't need the ability to change the whatever other
  standards per subnet, but we need to control the
  accept-NULL-encryption on a per subnet basis.
 
 After sleeping on it, I wonder if we couldn't redefine the existing
 list of acceptable ciphers option as the list of ciphers that are
 considered to provide encrypted transport.  So you'd be allowed to
 connect with SSL using any unapproved cipher (including NULL), the
 backend just considers it as equivalent to a non-SSL connection for
 pg_hba purposes.  Then no change is needed in any configuration stuff.
 
   regards, tom lane
 

+1 That is nice and clean.

Regards,
Ken



Re: [HACKERS] [patch] libpq one-row-at-a-time API

2012-06-16 Thread Marko Kreen
On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I guess this raises the question of whether we ought to revert the
 row-callback patch entirely and support only this approach.  IMO
 it is (barely) not too late to do that for 9.2, if we want to.
 If we don't want to, then this is just another new feature and
 should be considered for 9.3.

I think row-callback is dangerous API that does not solve any
important problems.

But I do like the 2-phase processing the rowproc patch introduced
and having a way to bypass unnecessary malloc()+copy.

So my preference would be to simply remove the callback API
but keep the processing and provide PQgetRowData() instead.

Although the win that it brings is significantly smaller thanks
to single-row PQgetResult().  So if it does not sound interesting
to others, it can be dropped.  Because the single-row processing
is the important feature we need, rest is extra.

-- 
marko



Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 BTW, I had no problems applying both the original patch and Chetan
 Suttraway's version. The only difference between the patches seems to
 be that the original is in context format, and Chetan Suttraway's is
 in unified format.

 Which format do hackers actually prefer? The wiki page
 http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
 suggests context format, but then the linked example
 http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
 format. Do people care, or are both formats OK?

Some people find one or the other more readable.  (I'm in the camp that
says unified format is great for isolated single-line changes and
utterly unreadable for anything more complex, but apparently there are
people who prefer it.)

For detailed review/commit purposes, it doesn't matter that much as long
as the patch applies cleanly, since it's easy to apply it and then get
a diff in the other format if you prefer reading the other.  However,
if you're just hoping people will eyeball the patch in email and comment
on it, readability matters.  If the patch requires manual fixup in order
to get it to apply anymore, readability is also a concern, since you're
dependent on the committer not misinterpreting the hunks he has to patch
in by hand.

regards, tom lane



Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed
On 16 June 2012 18:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 BTW, I had no problems applying both the original patch and Chetan
 Suttraway's version. The only difference between the patches seems to
 be that the original is in context format, and Chetan Suttraway's is
 in unified format.

 Which format do hackers actually prefer? The wiki page
 http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
 suggests context format, but then the linked example
 http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
 format. Do people care, or are both formats OK?

 Some people find one or the other more readable.  (I'm in the camp that
 says unified format is great for isolated single-line changes and
 utterly unreadable for anything more complex, but apparently there are
 people who prefer it.)

 For detailed review/commit purposes, it doesn't matter that much as long
 as the patch applies cleanly, since it's easy to apply it and then get
 a diff in the other format if you prefer reading the other.  However,
 if you're just hoping people will eyeball the patch in email and comment
 on it, readability matters.  If the patch requires manual fixup in order
 to get it to apply anymore, readability is also a concern, since you're
 dependent on the committer not misinterpreting the hunks he has to patch
 in by hand.


OK thanks, that's good to know.
I tend to find context format easier to read for large patches, but
that's a highly subjective thing.

Regards,
Dean



[HACKERS] Start of 2012-06 CommitFest

2012-06-16 Thread Kevin Grittner
There hasn't been any discussion of who will manage this CF that I've
heard.  Since nature abhors a vacuum, I took some preliminary steps
to get it started, and will take the role if nobody objects.
 
If you want to review a particular patch, go ahead and claim it in
the CF application:
 
https://commitfest.postgresql.org/action/commitfest_view?id=14
 
If you're not sure which patch you want to review, please send me an
email off-list with your areas of interest and a summary of your
skill-set, so I can pick one for you.
 
There was a decision at the developers' meeting to ask each patch
submitter to review patches of the same number and approximate
complexity as they submit.  If you haven't yet done so, please do.
(Remember, it's fine to have multiple reviewers for a single patch.)
 
If you want to contribute to the development of PostgreSQL and you
haven't yet reviewed any patches, please read this page:
 
http://wiki.postgresql.org/wiki/CommitFest
 
... and follow the appropriate links for more detail.  You don't need
to be a C coder to help.  We need people to test, benchmark, and
check documentation, too.
 
This CF is scheduled to run from the 15th of June to the 15th of
July, so any new patches should be submitted to the next CF:
 
http://commitfest.postgresql.org/action/commitfest_view/open
 
-Kevin



[HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?

2012-06-16 Thread Tom Lane
While looking at Vik Reykja's pending patch to improve the FK triggers
by skipping processing when a NULL column didn't change, I started to
wonder whether that really had no user-visible semantic effect.
In particular, in ON UPDATE SET NULL/SET DEFAULT cases, it seemed like
this could change the set of referencing columns that get set to NULL
or to their defaults.  So the next question was which behavior actually
conforms to the SQL standard, and the answer to that is ... disturbing.

The code in ri_triggers.c was written against SQL92's definition of ON
UPDATE SET NULL/SET DEFAULT, which is (ignoring the MATCH PARTIAL case,
which we don't implement):

 6) If an update rule is specified and a non-null value of a ref-
erenced column in the referenced table is updated to a value
that is distinct from the current value of that column, then

a) If match type is not specified or if FULL is specified,
  then

 ii) If the update rule specifies SET NULL, then

 Case:

 1) If match type is not specified, then in all matching
   rows the referencing column that corresponds with the
   referenced column is set to the null value.

 2) If match type specifies FULL, then in all matching
   rows each referencing column is set to the null value.

iii) If the update rule specifies SET DEFAULT, then in all
 matching rows the referencing column that corresponds with
 the referenced column is set to the default value specified
 in the General Rules of Subclause 11.5, default clause.

Note that only in the MATCH FULL + SET NULL case does it say to set
*all* the referencing columns in each matching row.  Otherwise, you are
only supposed to change columns that correspond to referenced columns
that were changed.  It's notable that SET NULL and SET DEFAULT have
different behaviors here.

On the other hand, in SQL:2008 I find (some boilerplate text omitted):

10) If a non-null value of a referenced column RC in the
referenced table is updated to a value that is distinct from the
current value of RC, then, for every member F of the subtable
family of the referencing table:

Case:

  a) If M specifies SIMPLE or FULL, then

Case:

ii) If UR specifies SET NULL, then

  Case:

1) If M specifies SIMPLE, then each matching row
MR in F is paired with the candidate replacement
row NMR, formed by copying MR and setting each
referencing column in the copy to the null
value. MR is identified for replacement by NMR
in F.

2) If M specifies FULL, then each matching row
MR in F is paired with the candidate replacement
row NMR, formed by copying MR and setting each
referencing column in the copy to the null
value. MR is identified for replacement by NMR
in F.

iii) If UR specifies SET DEFAULT, then each matching row
MR in F is paired with the candidate replacement row
NMR, formed by copying MR and setting each referencing
column in the copy to the default value specified in the
General Rules of Subclause 11.5, default clause. MR
is identified for replacement by NMR in F.

So far as I can see, this says to set *all* referencing columns to
nulls or their defaults, in all four cases, whether the corresponding
referenced column was one that changed or not.  This is very clearly
different from what SQL92 says.  It's also rather curious that they
distinguish two cases for SET NULL when the texts are exactly alike.

It looks to me like this change occurred in SQL:2003, although
SQL:1999's version of the text is such badly written pseudo-mathematical
gobbledygook that it's a bit hard to tell which behavior they meant.
However, neither of those specs list any change in referential
constraint behavior as being an acknowledged incompatibility with the
prior standard.  Have the SQL committee simply failed to notice that in
whacking this text around they changed the meaning?  Which behavior is
actually implemented by other RDBMSes?

regards, tom lane



Re: [HACKERS] Pg default's verbosity?

2012-06-16 Thread Fabien COELHO


Hello Tom,

thanks for your answer.


So I would suggest the following todos:
1 - change the default verbosity to warning.


The argument for defaulting to NOTICE is the same as it's always been:
that those messages are really intended for novices, and a pretty good
definition of a novice is somebody who doesn't know how to (or that he
should) change the verbosity setting.  So if we don't show notices by
default, they will be unavailable to exactly the people who need them.
Your proposal does not overcome this argument.


I'm sceptical about what a real novice is expected to do about the
incredible flow of useless information displayed when loading a
significant script. For a start, s/he would have to be an incredibly fast
reader :-)


For a non-novice it just hides what is important and should be seen.

However, maybe it makes sense in interactive mode, as you suggest, so
possibly this should be the real trigger for changing the level of messages.



Besides, I'm not convinced that changing client_min_messages in
isolation would do much for the problem,


I agree with you, but it is a necessary step somewhere...

because psql is still pretty chatty by itself; you really need -q to 
have any hope that important messages didn't scroll off your screen.


Hmmm, yes and no, in my opinion. CREATE XXX is very short output, quite 
distinct from the output of a warning/error, which can be noticed even when 
messages scroll by too fast to be read on the fly. I would at least know 
that something is not right.


Perhaps it would be sensible to have the -q switch also execute "set 
client_min_messages = warning", and recommend that people use that when 
running allegedly-debugged scripts?


That could be useful. However, I'm not sure that I would select -q when 
loading a big script: I'm happy to see that things are going on, and I 
would like to know if the script is stuck somewhere.
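
In the meantime, a script author can already ask for the quieter behaviour 
explicitly at the top of the script; a minimal sketch, which only affects 
the current session:

-- silence NOTICE-level chatter for the rest of this setup script,
-- while keeping warnings and errors visible
SET client_min_messages = warning;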



2 - change -1 to work on stdin as well instead of being ignored,
 or provide another option that would do that.


Yeah, if that doesn't work already,


I did check that it does not work with 9.1.3.

it would be sane to make it do so, at least for non-tty stdin.  It seems 
like a fairly bad idea for interactive stdin, though.


I agree that distinguishing interactive and non-interactive stdin is 
reasonable.


So maybe the suggestion would be to distinguish 2 default settings
 - client_min_tty_messages = notice; # or some other name...
 - client_min_messages = warning; # or some other name...

Moreover:

 - -1 should work on stdin *when not interactive*
 - -1 should be clearly advised when loading scripts...
   not sure where it should be in the documentation...
 - I'm not sure about -q for this purpose, mostly because
   I would not use it by default

--
Fabien.



Re: [HACKERS] Streaming-only Remastering

2012-06-16 Thread Daniel Farina
On Fri, Jun 15, 2012 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 10 June 2012 19:47, Joshua Berkus j...@agliodbs.com wrote:

 So currently we have a major limitation in binary replication, where it is 
 not possible to remaster your system (that is, designate the most 
 caught-up standby as the new master) based on streaming replication only.  
 This is a major limitation because the requirement to copy physical logs 
 over scp (or similar methods), manage and expire them more than doubles the 
 administrative overhead of managing replication.  This becomes even more of 
 a problem if you're doing cascading replication.

 The major limitation was solved by repmgr close to 2 years ago now.
 So while you're correct that the patch to fix that assumed that
 archiving worked as well, it has been possible to operate happily
 without it.

Remastering has been one of the biggest thorns in my side over the last
year.  I don't think it's a trivially mechanized issue yet, but I
do need to get there, and probably a few alterations in Postgres would
help, although I have not itemized what they are (rather, I was
intending to work around problems with what I have today).  But since
it is apropos to this discussion, here's what I've been thinking along
these lines:

Instead of using re-synchronization (e.g. repmgr in its relation to
rsync), I intend to proxy and also inspect the streaming replication
traffic and then quiesce all standbys and figure out what node is
farthest ahead.  Once I figure out the node that is farthest ahead, if
it is not a node that is eligible for promotion to the master, I need
to exchange its changes to nodes that are eligible for promotion[0],
and then promote one of those, repointing all other standbys to that
node. This must all take place nominally within a second or thirty.
Conceptually it is simple, but mechanically it's somewhat intense,
especially in relation to the inconvenience of doing this incorrectly.

I surmise someone could come up with supporting mechanisms to make it
less burdensome to write.
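
For the compare-progress step, the information is already exposed at the
SQL level.  Something along these lines, run on each standby, is the kind
of probe I have in mind; the actual comparison of the returned locations
would be done by the controlling software (a sketch only):

select pg_is_in_recovery()             as in_recovery,
       pg_last_xlog_receive_location() as receive_location,
       pg_last_xlog_replay_location()  as replay_location;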

One snarl is the interaction with the archive and restore commands:
Postgres might, for example, be in the middle of downloading and
replaying a WAL segment at the moment I wish it to be quiesced, and there's
not a great way to stop it[1].

Ideally, I could replace those archive/dearchive commands with
software that speaks the streaming replication protocol and just have
less code involved overall.  I think that is technically possible
today, but maybe could be made easier, in particular being able to
more easily chunk and align the WAL stream into units of some kind
from the streaming protocol.  Maybe it's already possible, but it will
take a little thinking.  I had already written off getting this level
of cohesion in the next year (intending a detailed mix of
archive_command and streaming protocol software), but it's not
something that leaves me close to satisfied by any measure.

Furthermore, some use cases demand that, no matter what the user's
syncrep setting is, Postgres not make progress unless it has
synchronously replicated to a special piece of proxy software.  This is
useful if one wants to offload the exact location and storage strategy
for crash recovery to another piece of software.
That's the obvious next step after a cohesive delegation of
(de-)archiving.

So, all in all, Postgres has no great way to cohesively delegate all
WAL-persistence and WAL-restoration and I don't know if the streaming
protocol + sync rep facilities can completely conveniently subsume all
those use cases (but I think it probably can without enormous
modification).  I think it should learn what it needs to learn to make
that happen.  It might even allow the existing shell-command based
(de-)archiver to live as a contrib.


[0]: Use case: when a small standby used for some reporting happens to
be the farthest ahead.

[1]: Details: a simple touched file to no-op the restore_command is
unsatisfying, because the restore_command may have already been
started by postgres, so now you have to make your restore_command
coordinate with your streaming replication proxy software to be safe,
or wait long enough for a single segment to replay so that one can be
assured that the system is quiesced.  I see this as an anti-feature of
the current file-based archiving strategy.

-- 
fdr



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-16 Thread Steve Singer

On 12-06-15 04:03 PM, Robert Haas wrote:

On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund and...@2ndquadrant.com wrote:

I don't plan to throw in loads of conflict resolution smarts. The aim is to get
to the place where all the infrastructure is there so that a MM solution can
be built by basically plugging in a conflict resolution mechanism. Maybe
providing a very simple one.
I think without in-core support it's really, really hard to build a sensible MM
implementation. Which doesn't mean it has to live entirely in core.

Of course, several people have already done it, perhaps most notably Bucardo.

Anyway, it would be good to get opinions from more people here.  I am
sure I am not the only person with an opinion on the appropriateness
of trying to build a multi-master replication solution in core or,
indeed, the only person with an opinion on any of these other issues.


This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change 
records that can be replayed on other databases, without relying on 
triggers and log tables, would be good to have.


I think we want something flexible enough that people can write consumers 
of the LCRs to do conflict resolution for multi-master, but I am not sure 
that the conflict resolution support actually belongs in core.


Most of the complexity of slony (both in terms of lines of code and of 
issues people encounter using it) comes not from the log triggers or 
replay of the logged data but from the configuration of the cluster.

Controlling things like

* Which tables replicate from a node to which other nodes
* How do you change the cluster configuration on a running system 
(adding nodes, removing nodes, moving the origin of a table, adding 
tables to replication etc...)


This is the harder part of the problem.  I think we need to first get the 
infrastructure committed (which is what the current patch set deals with) 
for capturing, transporting and translating the LCRs before we get too 
caught up in the configuration aspects.  I think we will have a hard time 
agreeing on behaviours for some of that other stuff that are both flexible 
enough for most use cases and simple enough for administrators.  I'd like 
to see in-core support for a lot of that stuff but I'm not holding my 
breath.



It is not good for those other opinions to be saved for a later date.


Hm. Yes, you could do that. But I have to say I don't really see a point.
Maybe the fact that I do envision multimaster systems at some point is
clouding my judgement, though, as it's far less easy in that case.

Why?  I don't think that particularly changes anything.


It also complicates the wal format as you now need to specify whether you
transport a full or a primary-key only tuple...

Why?  If the schemas are in sync, the target knows what the PK is
perfectly well.  If not, you're probably in trouble anyway.





I think though that we do not want to enforce that mode of operation for
tightly coupled instances. For those I was thinking of using command triggers
to synchronize the catalogs.
One of the big screwups of the current replication solutions is exactly that
you cannot sensibly do DDL which is not a big problem if you have a huge
system with loads of different databases and very knowledgeable people et al.
but at the beginning it really sucks. I have no problem with making one of the
nodes the schema master in that case.
Also I would like to avoid the overhead of the proxy instance for use-cases
where you really want one node replicated as fully as possible with the slight
exception of being able to have summing tables, different indexes et al.

In my view, a logical replication solution is precisely one in which
the catalogs don't need to be in sync.  If the catalogs have to be in
sync, it's not logical replication.  ISTM that what you're talking
about is sort of a hybrid between physical replication (pages) and
logical replication (tuples) - you want to ship around raw binary
tuple data, but not entire pages.  The problem with that is it's going
to be tough to make robust.  Users could easily end up with answers
that are total nonsense, or probably even crash the server.



I see three catalogs in play here.
1. The catalog on the origin
2. The catalog on the proxy system (this is the catalog used to 
translate the WAL records to LCRs).  The proxy system will need 
essentially the same pgsql binaries (same architecture, important 
compile flags, etc.) as the origin.

3. The catalog on the destination system(s).

Catalog 2 must be in sync with catalog 1; catalog 3 shouldn't need to be 
in sync with catalog 1.  I think catalogs 2 and 3 are combined in the 
current patch set (though I haven't yet looked at the code closely).  I 
think the performance optimizations Andres has implemented to update 
tuples through low-level functions should be left for later, and that we 
should be generating SQL in the apply cache so we don't 
start 

Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?

2012-06-16 Thread Tom Lane
I wrote:
 Have the SQL committee simply failed to notice that in
 whacking this text around they changed the meaning?  Which behavior is
 actually implemented by other RDBMSes?

If anyone is up for actually trying this, here is a script to test the
behavior in question:

create table pp (f1 int, f2 int, primary key (f1,f2));
create table cmssn (f1 int, f2 int,
foreign key (f1,f2) references pp(f1,f2) on update set null);
create table cmfsn (f1 int, f2 int,
foreign key (f1,f2) references pp(f1,f2) match full on update set null);
create table cmssd (f1 int default 0, f2 int default 0,
foreign key (f1,f2) references pp(f1,f2) on update set default);
create table cmfsd (f1 int default 0, f2 int default 0,
foreign key (f1,f2) references pp(f1,f2) match full on update set default);

insert into pp values (11, 22);
insert into pp values (11, 0);
insert into pp values (0, 0);

insert into cmssn values (11, 22);
insert into cmfsn values (11, 22);
insert into cmssd values (11, 22);
insert into cmfsd values (11, 22);

update pp set f2 = f2 + 1 where f2 > 0;

select * from cmssn;
select * from cmfsn;
select * from cmssd;
select * from cmfsd;

In Postgres this produces

 f1 | f2 
----+----
 11 |   
(1 row)

 f1 | f2 
----+----
    |   
(1 row)

 f1 | f2 
----+----
 11 |  0
(1 row)

 f1 | f2 
----+----
  0 |  0
(1 row)

which shows that we are self-consistent but not actually compliant with
either old or new wordings of the spec :-(

The only other SQL DB I have handy is mysql 5.5.24, which shows up
pretty unimpressively: it gives a syntax error on the cmssd definition,
which would be all right because the manual says the innodb storage
engine doesn't support SET DEFAULT, except it *doesn't* give a syntax
error for creating cmfsd.  Then, the update fails claiming that cmfsn's
FK constraint is violated, so they evidently don't implement that case
correctly.  After removing cmfsn, the update fails again claiming that
cmfsd's FK constraint is violated, so yeah they are telling the truth
when they say SET DEFAULT doesn't work.  The upshot is that only the
MATCH SIMPLE SET NULL case works at all in current mysql, and that
produces the result

mysql select * from cmssn;
+--+--+
| f1   | f2   |
+--+--+
| NULL | NULL |
+--+--+
1 row in set (0.00 sec)

so they are nulling all the referencing columns in this case, which
matches the more recent specs but is clearly contrary to SQL92.

Anybody have DB2, or something else that might be thought to be pretty
close to spec-compliant?

regards, tom lane



Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?

2012-06-16 Thread Dean Rasheed
On 16 June 2012 21:18, Tom Lane t...@sss.pgh.pa.us wrote:
 Anybody have DB2, or something else that might be thought to be pretty
 close to spec-compliant?


I have an Oracle DB, but they're not exactly known for spec
compliance. In fact they dodge this entire issue by not supporting ON
UPDATE actions at all :-)

Regards,
Dean



[HACKERS] s/UNSPECIFIED/SIMPLE/ in foreign key code?

2012-06-16 Thread Tom Lane
Our foreign-key-related code uses MATCH_UNSPECIFIED to denote the
default foreign key match behavior.  This corresponds to the wording
used in the SQL92 spec, for instance "If <match type> is not specified
or if FULL is specified, ...".  But I always found it rather confusing;
it sounds like we don't know what match behavior we're supposed to
implement.

I notice that in SQL99 and later, the SQL committee introduced MATCH
SIMPLE as a way to name the behavior that formerly had no name.
So now they can write things like "If M specifies SIMPLE or FULL, ...",
which seems much nicer to me.

I think it would be a useful advance in readability if we replaced
UNSPECIFIED by SIMPLE throughout the FK code, and barring objections
I will go do that.

A small flaw in this plan is that in pg_constraint.confmatchtype,
MATCH_UNSPECIFIED is stored as 'u'.  In a green field I'd just rename
that to 's' for SIMPLE, but it seems possible that this would confuse
client-side code such as pg_dump or psql.  A quick look shows that
neither of those programs actually looks directly at
pg_constraint.confmatchtype, instead relying on backend functions when
they want to deconstruct a foreign key constraint.  But there could well
be other client code that would notice the change.  So I'm a bit torn
as to whether to change it and create a release-note-worthy
compatibility issue, or to leave it as-is (with a documentation note that
'u' for MATCH_SIMPLE is a historical accident).
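
For example, client code of roughly this shape (hypothetical, not taken
from any real tool) would silently start reporting NULL match types if
'u' simply became 's':

select conname,
       case confmatchtype
         when 'f' then 'FULL'
         when 'p' then 'PARTIAL'
         when 'u' then 'SIMPLE'  -- historically stored as 'u'
       end as match_type
from pg_constraint
where contype = 'f';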

Thoughts?

regards, tom lane




Re: [HACKERS] sortsupport for text

2012-06-16 Thread Peter Geoghegan
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
 However, it occurred to me that we could pretty easily jury-rig
 something that would give us an idea about the actual benefit available
 here.  To wit: make a C function that wraps strxfrm, basically
 strxfrm(text) returns bytea.  Then compare the performance of
 ORDER BY text_col to ORDER BY strxfrm(text_col).

 (You would need to have either both or neither of text and bytea
 using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for
myself. I tried this out against master (not Robert's patch, per Tom's
direction). The results were interesting:

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)
[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;
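
Here strxfrm_test() is the jury-rigged wrapper Tom described, declared more
or less as follows; this is a sketch only, and the library and symbol names
are illustrative rather than exactly what the attached module uses:

create function strxfrm_test(text) returns bytea
    as '$libdir/strxfrm_test'
    language c immutable strict;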

So, this was pretty good - an improvement that is completely
independent of Robert's. Bear in mind, this simple demonstration adds
additional fmgr overhead, which we have plenty of reason to believe
could hurt things, besides which each call must allocate memory that
could perhaps be avoided. In addition, I don't know enough about
locale-aware sorting and related algorithms to have devised a test
that would stress strxfrm()/strcoll() - these were all strings that
could be represented as ASCII.

In light of this, I think there is a pretty strong case to be made for
pre-processing text via strxfrm() as part of this patch.

Thoughts?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


strxfrm_test.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] measuring spinning

2012-06-16 Thread Jeff Janes
On Thu, Jun 14, 2012 at 2:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jan 11, 2012 at 8:48 PM, Robert Haas robertmh...@gmail.com wrote:
 I've had cause, a few times this development cycle, to want to measure
 the amount of spinning on each lwlock in the system.  To that end,
 I've found the attached patch useful.  Note that if you don't define
 LWLOCK_STATS, this changes nothing except that the return value from
 s_lock becomes int rather than void.  If you do define LWLOCK_STATS,
 then LWLockAcquire() counts the number of pg_usleep() calls that are
 required to acquire each LWLock, in addition to the other statistics.
 Since this has come up for me a few times now, I'd like to proposing
 including it in core.

 Well, this fell through the cracks, because I forgot to add it to the
 January CommitFest.  Here it is again, rebased.

This applies and builds cleanly and passes make check (under --enable-cassert).

No tests or docs are needed for a patch of this nature.

It does what it says, and we want it.

I wondered if the change in the return signature of s_lock would have
an effect on performance.  So I've run a series of pgbench -T 30 -P
-c8 -j8 runs, at a scale of 30 which fits in shared_buffers, using an
Amazon c1.xlarge (8 cores).  I ran both HEAD and HEAD+patch (without
LWLOCK_STATS in both cases), in random ordering.  The patch was 0.37%
slower: an average of 298483 selects per second patched versus 299582
for HEAD.  The difference is probably real (p-value 0.042, one-sided)
but is also pretty much negligible, and could just be due to where the
executable code falls in the cache lines, which could move around with
other changes to the code.

Two suggestions:

In your original email you say "number of pg_usleep() calls that are
required to acquire each LWLock", but nothing in the code says this.
Just reading lwlock.c I would naively assume it is reporting the
number of TAS spins, not the number of spin-delays (and in fact that
is what I did assume until I read your email more carefully).  A
comment somewhere in lwlock.c would be helpful.

Also in lwlock.c,

if (sh_acquire_counts[i] || ex_acquire_counts[i] ||
block_counts[i] || spin_counts[i])


I don't think we can have spins (or blocks, for that matter) unless we
have some acquires to have caused them, so the last two tests in that
line seem to be noise.

Since my suggestions are minor, should I go ahead and mark this ready
for committer?

Thanks,

Jeff



Re: [HACKERS] SQL standard changed behavior of ON UPDATE SET NULL/SET DEFAULT?

2012-06-16 Thread Dickson S. Guedes
2012/6/16 Tom Lane t...@sss.pgh.pa.us:
 I wrote:
 Have the SQL committee simply failed to notice that in
 whacking this text around they changed the meaning?  Which behavior is
 actually implemented by other RDBMSes?

 If anyone is up for actually trying this, here is a script to test the
 behavior in question:

 create table pp (f1 int, f2 int, primary key (f1,f2));
 create table cmssn (f1 int, f2 int,
    foreign key (f1,f2) references pp(f1,f2) on update set null);
 create table cmfsn (f1 int, f2 int,
    foreign key (f1,f2) references pp(f1,f2) match full on update set null);
 create table cmssd (f1 int default 0, f2 int default 0,
    foreign key (f1,f2) references pp(f1,f2) on update set default);
 create table cmfsd (f1 int default 0, f2 int default 0,
    foreign key (f1,f2) references pp(f1,f2) match full on update set default);

 insert into pp values (11, 22);
 insert into pp values (11, 0);
 insert into pp values (0, 0);

 insert into cmssn values (11, 22);
 insert into cmfsn values (11, 22);
 insert into cmssd values (11, 22);
 insert into cmfsd values (11, 22);

 update pp set f2 = f2 + 1 where f2 > 0;

 select * from cmssn;
 select * from cmfsn;
 select * from cmssd;
 select * from cmfsd;

 In Postgres this produces

  f1 | f2
 ----+----
  11 |
 (1 row)

  f1 | f2
 ----+----
     |
 (1 row)

  f1 | f2
 ----+----
  11 |  0
 (1 row)

  f1 | f2
 ----+----
   0 |  0
 (1 row)

 which shows that we are self-consistent but not actually compliant with
 either old or new wordings of the spec :-(

 The only other SQL DB I have handy is mysql 5.5.24, which shows up
 pretty unimpressively: it gives a syntax error on the cmssd definition,
 which would be all right because the manual says the innodb storage
 engine doesn't support SET DEFAULT, except it *doesn't* give a syntax
 error for creating cmfsd.  Then, the update fails claiming that cmfsn's
 FK constraint is violated, so they evidently don't implement that case
 correctly.  After removing cmfsn, the update fails again claiming that
 cmfsd's FK constraint is violated, so yeah they are telling the truth
 when they say SET DEFAULT doesn't work.  The upshot is that only the
 MATCH SIMPLE SET NULL case works at all in current mysql, and that
 produces the result

 mysql select * from cmssn;
 +--+--+
 | f1   | f2   |
 +--+--+
 | NULL | NULL |
 +--+--+
 1 row in set (0.00 sec)

 so they are nulling all the referencing columns in this case, which
 matches the more recent specs but is clearly contrary to SQL92.

 Anybody have DB2, or something else that might be thought to be pretty
 close to spec-compliant?


I tried it on MS SQL Server 2012 via SQLFiddle [1]. I could only create
the 'cmssn' and 'cmssd' tables because, as far as I can see in [2], MS SQL
Server 2012 doesn't support the MATCH syntax.

The result was:

select * from cmssn;
 F1     | F2
 (null) | (null)

select * from cmssd;
 F1 | F2
 0  | 0

The test is in [3], and there you can try other RDBMSs: just create the
schema in the left panel and run test selects on the right.

[1] http://sqlfiddle.com
[2] http://msdn.microsoft.com/en-us/library/ms174979.aspx
[3] http://sqlfiddle.com/#!6/ac7db/1

Regards.
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



[HACKERS] Broken system timekeeping breaks the stats collector

2012-06-16 Thread Tom Lane
I've had a vague feeling for awhile now that the occasional buildfarm
failures we see in the stats regression test (where it appears that the
stats collector fails to respond to requests for no good reason) might
be related to operating-system timekeeping glitches.  Today there is
finally indisputable evidence of this happening, in this log file:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=frogmouthdt=2012-06-16%2014%3A30%3A12
Observe the following log excerpt:

[4fdcaca3.e74:478] LOG:  statement: CREATE INDEX dupindexcols_i ON dupindexcols 
(f1, id, f1 text_pattern_ops);
[4fdcaca3.e74:479] LOG:  statement: ANALYZE dupindexcols;
[4fdcaca3.e74:480] LOG:  statement: EXPLAIN (COSTS OFF)
  SELECT count(*) FROM dupindexcols
WHERE f1  'WA' and id  1000 and f1 ~~ 'YX';
[4fdcaca3.e74:481] LOG:  statement: SELECT count(*) FROM dupindexcols
  WHERE f1  'WA' and id  1000 and f1 ~~ 'YX';
[4fdcaca3.e74:482] LOG:  disconnection: session time: 0:00:00.000 user=pgrunner 
database=regression host=::1 port=1123
[4fdcab40.e04:1] LOG:  connection received: host=::1 port=1125
[4fdcab40.e04:2] LOG:  connection authorized: user=pgrunner database=regression
[4fdcab40.e04:3] LOG:  statement: CREATE TABLE a (aa TEXT);
[4fdcab40.444:1] LOG:  connection received: host=::1 port=1126
[4fdcab40.e88:1] LOG:  connection received: host=::1 port=1127
[4fdcab40.444:2] LOG:  connection authorized: user=pgrunner database=regression
[4fdcab40.444:3] LOG:  statement: CREATE USER regtest_unpriv_user;
[4fdcab40.444:4] LOG:  statement: CREATE SCHEMA temp_func_test;
[4fdcab40.e04:4] LOG:  statement: CREATE TABLE b (bb TEXT) INHERITS (a);

We can tell from the statements being executed that session 4fdcaca3.e74
is running the create_index regression test, while 4fdcab40.444 is
running create_function_3, and the other sessions starting concurrently
with it are part of the parallel group that runs after create_index.
So they certainly didn't start until 4fdcaca3.e74 finished.  Also
create_index is not exactly a zero-time test, so the fact that
4fdcaca3.e74 reported a session runtime of 0:00:00.000 should already
set off some alarm bells.  But remember that in the %c log_line_prefix
escape, the first part is the process's start-time timestamp in hex.
So 4fdcaca3 means Sat Jun 16 2012, 11:56:19 EDT while 4fdcab40 means
Sat Jun 16 2012, 11:50:24 EDT, nearly six minutes *earlier*.
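
(If you want to double-check that arithmetic: the hex prefixes are plain
Unix epoch timestamps, so they are easy to decode in SQL, for instance

select to_timestamp(x'4fdcaca3'::int) as create_index_session_start,
       to_timestamp(x'4fdcab40'::int) as later_sessions_start;

which yields the two times quoted above, modulo the display time zone.)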

The reported zero session time is explainable by the fact that
TimestampDifference returns zeros if the given timestamps are out of
order.  All the other process timestamps in the log are likewise
consistent with the theory that the system's clock went backwards
six-plus minutes while create_index was running.

Then, when we finally get to the stats regression test, it fails with
symptoms indicating that the stats collector never answered the
backend's requests for an updated stats file; and when the postmaster
finally shuts down and tells the stats collector to shut down too, the
final pgstat_write_statsfile call bleats like so:
[4fdcac54.a0c:1] LOG:  last_statrequest 2012-06-16 11:55:20.813625-04 is later 
than collector's time 2012-06-16 11:54:12.392375-04

Once you know that the system clock glitched like that, it's fairly
obvious what happened inside the stats collector: it wrote out the stats
file (probably in response to an autovacuum request) sometime shortly
after 11:55:20, and then the system clock went backwards to around
11:50, and so all subsequent inquiry messages had request timestamps
older than last_statwrite causing the collector to believe it needn't
emit fresh stats files.

Aside from breaking the stats regression test, a scenario like this one
has bad implications for autovacuum: for something like six minutes,
autovacuum would have been operating with stale pgstat data since the
stats file would not get rewritten, and backends would see the file
timestamp as being ahead of their own clocks so they wouldn't think it
was stale.  (So this scenario doesn't explain the occasional pgstat
wait timeout failures we see; though perhaps someone can think of a
variant that fits that symptom?)

I will refrain from disparaging Windows here and just note that this
would be an unsurprising occurrence on any machine not running NTP: the
owner might occasionally fix the clock once it had drifted far enough
from reality.  So I think it might be advisable to install some defenses
against the case rather than assuming it can't happen.

The most direct fix would involve checking GetCurrentTimestamp against
last_statswrite in the stats collector's main loop, but on a machine
with slow gettimeofday calls that would be pretty bad for performance.
What I suggest instead is:

(1) In backend_read_statsfile, make an initial attempt to read the stats
file and then read GetCurrentTimestamp after that.  If the local clock
reading is less than the stats file's timestamp, we know that some sort
of clock skew or glitch has happened, so force an inquiry message to be
sent with the local timestamp.  But then accept the stats file anyway,
since the skew might be small and harmless.  The reason for the forced
inquiry message is to cause (2) to happen at the collector.

Re: [HACKERS] transforms

2012-06-16 Thread Jeff Janes
On Thu, Jun 14, 2012 at 3:42 PM, Peter Eisentraut pete...@gmx.net wrote:
 Here is my first patch for the transforms feature.  This is a mechanism
 to adapt data types to procedural languages.  The previous proposal was
 here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php

When I apply this and go to contrib and do make check, I get:

In file included from hstore_plperl.c:4:0:
../../src/pl/plperl/plperl.h:49:20: fatal error: EXTERN.h: No such
file or directory

Cheers,

Jeff



Re: [HACKERS] transforms

2012-06-16 Thread Jeff Janes
On Sat, Jun 16, 2012 at 7:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 3:42 PM, Peter Eisentraut pete...@gmx.net wrote:
 Here is my first patch for the transforms feature.  This is a mechanism
 to adapt data types to procedural languages.  The previous proposal was
 here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php

 When I apply this and go to contrib and do make check, I get:

 In file included from hstore_plperl.c:4:0:
 ../../src/pl/plperl/plperl.h:49:20: fatal error: EXTERN.h: No such
 file or directory

Ah, that went away when I remembered to ./configure --with-perl

Although the error message seems less than optimal.  Aren't tests
usually skipped when they are missing prerequisites in the config?

Cheers,

Jeff



Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-16 Thread Dickson S. Guedes
2012/6/16 Tom Lane t...@sss.pgh.pa.us:
[... cut ...]
 (1) In backend_read_statsfile, make an initial attempt to read the stats
 file and then read GetCurrentTimestamp after that.  If the local clock
 reading is less than the stats file's timestamp, we know that some sort
 of clock skew or glitch has happened, so force an inquiry message to be
 sent with the local timestamp.  But then accept the stats file anyway,
 since the skew might be small and harmless.  The reason for the forced
 inquiry message is to cause (2) to happen at the collector.

 (2) In pgstat_recv_inquiry, if the received inquiry_time is older than
 last_statwrite, we should suspect a clock glitch (though it might just
 indicate delayed message receipt).  In this case, do a fresh
 GetCurrentTimestamp call, and if the reading is less than
 last_statwrite, we know that the collector's time went backwards.
 To recover, reset these variables as we do at startup:
        last_statrequest = GetCurrentTimestamp();
        last_statwrite = last_statrequest - 1;
 to force an immediate write to happen with the new local time.

 (1) is basically free in terms of the amount of work done in non-broken
 cases, though it will require a few more lines of code.  (2) means
 adding some GetCurrentTimestamp calls that did not occur before, but
 hopefully these will be infrequent, since in the absence of clock
 glitches they would only happen when a backend's demand for a new stats
 file is generated before the collector starts to write a new stats file
 but not received till afterwards.

 Comments?  Anyone see a flaw in this design?  Or want to argue that
 we shouldn't do anything about such cases?

What happens when Daylight Saving Time ends? Or does it not matter in
this scenario?

regards
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



Re: [HACKERS] Broken system timekeeping breaks the stats collector

2012-06-16 Thread Tom Lane
Dickson S. Guedes lis...@guedesoft.net writes:
 What happens when Daylight saving time ends? Or it doesn't matter in
 this scenario?

Irrelevant, we're working in UTC-based timestamps.

regards, tom lane



Re: [HACKERS] Pg default's verbosity?

2012-06-16 Thread nik9000
I've always used -1 -f - < file.sql.  It is confusing that -1 doesn't warn 
you when it won't work, though. 

Sent from my iPhone

On Jun 16, 2012, at 3:42 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:

 
 Hello pgdev,
 
 (Second attempt)
 
 I've conducted a statistical study about PostgreSQL use in OSS. One of the 
 results is that quite a few projects have errors in their SQL setup scripts 
 which lead to some statements being ignored: typically some ADD CONSTRAINT 
 statements which do not change the database schema from a functional point 
 of view, or syntax errors (typically mysql syntax...) that result in 
 missing tables but are not noticed if the application is not fully tested.
 
 I think that there are two reasons why these errors are not caught by 
 application developers:
 
 (1) the default verbosity is set to notice, which is much too high. 
 Users just get used to seeing a lot of messages when loading an SQL script, 
 and to ignoring them, so that errors are simply hidden in the flow of 
 notices. I think that a better default setting would be warning, that is, 
 messages that require some attention from the developer.
 
 (2) the default behavior of psql on errors is to keep going. For SQL 
 scripts that are expected to just work, we should:
 - encourage application devs to set ON_ERROR_STOP and/or use a global
   transaction in their scripts.
 - provide a simple/short option to do that from the command line:
   basically an enhanced -1, NOT restricted to -f but working on
   standard input as well.
 
   sh> psql -1 -f setup.sql   # -1 does work here
   sh> psql -1 < setup.sql    # -1 does not apply to stdin stuff...
 
 
 So I would suggest the following todos:
 
 1 - change the default verbosity to warning.
 
 2 - change -1 to work on stdin as well instead of being ignored,
or provide another option that would do that.
 
 -- 
 Fabien Coelho - coe...@cri.ensmp.fr
 



Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-16 Thread Amit kapila
 Torn disk sector?  Please, this is nonsense.  Disks cannot write half a 
 sector and then stop. 
What I intended to say is corruption due to hardware or some other problem, 
not corruption caused while Postgres is updating the pg_control file. For 
example: http://cquirke.mvps.org/9x/baddata.htm 


 Well, we invented pg_resetxlog with the thought that it might be useful
  for such situations, but I'm not sure offhand that we've ever seen a
  field report of corrupted pg_control files.
I have found a few cases where people have tried to use pg_resetxlog due to 
hardware problems or a missing pg_control file:
http://archives.postgresql.org/pgsql-performance/2004-06/msg00236.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00173.php
http://archives.postgresql.org/pgsql-admin/2006-12/msg00205.php

 Case in point here is that it's not immediately obvious that we should trust
 the contents of WAL more than pg_control. 
   Agreed.

 At the moment I don't see that we have either (a) or (b), so I think
 it's pretty dubious to be making any changes of this sort.
   As the chances of a use case for this feature are very low, I will stop 
working on this feature.



From: Tom Lane [t...@sss.pgh.pa.us]
Sent: Saturday, June 16, 2012 12:11 PM
To: Amit kapila
Cc: Cédric Villemain; pgsql-hackers@postgresql.org; 'Robert Haas'
Subject: Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Amit kapila amit.kap...@huawei.com writes:
 AFAIR pg_controldata fit on a disk sector so it can not be half written.

It can be corrupt due to some other reasons as well like torn disk sector.

Torn disk sector?  Please, this is nonsense.  Disks cannot write half
a sector and then stop.  A sufficiently badly designed drive might
attempt to start a write when it didn't have enough power left to finish
... but the result of that would be a corrupt sector with a non-matching
CRC, not one that read back okay but contained erroneous data.

  The suggested patch improves the logic to recover corrupt control file. So 
 that is the reason I felt it will be relevant to do this patch.

Well, we invented pg_resetxlog with the thought that it might be useful
for such situations, but I'm not sure offhand that we've ever seen a
field report of corrupted pg_control files.  For instance, a quick
search in the archives for "incorrect checksum in control file" turns up
only cases of pilot error, such as supposing that a 32-bit database
could be used with a 64-bit server or vice versa.  Actual hardware
failures on the pg_control file could be expected to result in something
like "could not read from control file: I/O error", which I find no
evidence for at all in the archives.

Before adding new code to improve the situation, it would be good to
have (a) evidence that there's a problem worth solving, and (b) a theory
as to what likely-to-occur cases the new code is going to make better,
while not making things worse in other likely-to-occur cases.  Case in
point here is that it's not immediately obvious that we should trust
the contents of WAL more than pg_control --- the former gets a whole
lot more write traffic and hence has many more opportunities for
failure.

At the moment I don't see that we have either (a) or (b), so I think
it's pretty dubious to be making any changes of this sort.

regards, tom lane