Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Merlin Moncure
On Fri, Dec 26, 2014 at 12:38 PM, Kevin Grittner  wrote:
> Tom Lane  wrote:
>
>> Just for starters, a 40XXX error report will fail to provide the
>> duplicated key's value.  This will be a functional regression,
>
> Not if, as is normally the case, the transaction is retried from
> the beginning on a serialization failure.  Either the code will
> check for a duplicate (as in the case of the OP on this thread) and
> they won't see the error, *or* the transaction which created
> the duplicate key will have committed before the start of the retry
> and you will get the duplicate key error.

I'm not buying that; that argument assumes duplicate key errors are
always 'upsert' driven.  Although the OP's code may have checked for
duplicates, it's perfectly reasonable (and in many cases preferable) to
force the transaction to fail and report the error directly back to
the application.  The application will then switch on the error code
and decide what to do: retry for deadlock/serialization or abort for
data integrity error.  IOW, the error handling semantics are
fundamentally different and should not be mixed.

merlin




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE}

2014-12-26 Thread Peter Geoghegan
On Fri, Dec 19, 2014 at 5:32 PM, Peter Geoghegan  wrote:
>> Most people would list the columns, but if there is a really bizarre
>> constraint, with non-default opclasses, or an exclusion constraint, it's
>> probably been given a name that you could use.
>
> What I find curious about the opclass thing is: when do you ever have
> an opclass that has a different idea of equality than the default
> opclass for the type? In other words, when is B-Tree strategy number 3
> not actually '=' in practice, for *any* B-Tree opclass? Certainly, it
> doesn't appear to be the case that it isn't so with any shipped
> opclasses - the shipped non-default B-Tree opclasses only serve to
> provide alternative notions of sort order, and never "equals".
>
> I think that with B-Tree (which is particularly relevant for the
> UPDATE variant), it ought to be defined to work with the type's
> default opclass "equals" operator, just like GROUP BY and DISTINCT.
> Non-default opclass unique indexes work just as well in practice,
> unless someone somewhere happens to create an oddball one that doesn't
> use '=' as its "equals" operator (while also having '=' as the default
> opclass "equals" operator). I am not aware that that leaves any
> actually shipped opclass out (and I include our external extension
> ecosystem here, although I might be wrong about that part).

So looking at the way the system deals with its dependence on default
operator classes, I have a hard time justifying all this extra
overhead for the common case. The optimizer will refuse to use an
index with a non-default opclass even when AFAICT there is no *real*
semantic dependence on anything other than the "equals" operator,
which seems to always match across a type's opclasses anyway. e.g.,
DISTINCT will only use a default opclass B-Tree index, even though
in practice the "equals" operator always matches for shipped
non-default opclasses; DISTINCT will not work with a text_pattern_ops
index, while it will work with a default text B-Tree opclass index,
*even though no corresponding "ORDER BY" was given*.

Someone recently pointed out in a dedicated thread that the system
isn't all that bright about exploiting the fact that group aggregates
don't necessarily need to care about facets of sort ordering like
collations, which have additional overhead [1]. That might be a useful
special case to target (to make underlying sorts faster), but the big
picture is that the system doesn't know when it only needs to care
about an "equals" operator matching some particular
B-Tree-opclass-defined notion of sorting, rather than caring about a
variety of operators matching. Sometimes, having a matching "equals"
operator of some non-default opclass is good enough to make an index
(or sort scheme) of that opclass usable for some purpose that only
involves equality, and not sort order (like DISTINCT, with no ORDER
BY, executed using a GroupAggregate, for example).
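
To illustrate with a made-up table (the planner behavior is the one described
above; easy to check with EXPLAIN):

CREATE TABLE words (w text NOT NULL);
CREATE INDEX words_pattern_idx ON words (w text_pattern_ops);

-- text_pattern_ops shares text's "equals" operator, yet an index-based
-- plan for this DISTINCT is not considered:
EXPLAIN SELECT DISTINCT w FROM words;

CREATE INDEX words_default_idx ON words (w);  -- default text opclass

-- With the default-opclass index available, an index-based plan can be
-- chosen, even though the query has no corresponding ORDER BY:
EXPLAIN SELECT DISTINCT w FROM words;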

I thought we should formalize the idea that a non-default opclass must
have the same notion of equality (the same "equals" operator) as its
corresponding default opclass, if any. That way, presumably the
optimizer has license to be clever about only caring about
"DISTINCTness"/equality. That also gives my implementation license to
not care about which operator class a unique index uses -- it must not
matter.

Heikki pointed out that there is one shipped opclass that has an
"equals" operator that happens to not be spelt "=" [2] (and
furthermore, does not match that of the default opclass). That's the
record_image_ops opclass, which unusually has an "equals" operator of
"*=". So as Heikki pointed out, it looks like there is some limited
precedent for having to worry about B-Tree opclasses that introduce
alternative notions of "equals", rather than merely alternative
notions of sort order. So, so much for formalizing that all of a type's
B-Tree opclass "equals" operators must match...
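
A quick way to see that divergence, if memory serves, is with numeric values
that are equal but have different binary images:

SELECT ROW(1.0::numeric) =  ROW(1.00::numeric);   -- true: default "equals"
SELECT ROW(1.0::numeric) *= ROW(1.00::numeric);   -- false: byte-image comparison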

...having thought about it for a while more, though, I think we should
*still* ignore opclass for the purposes of unique index inference. The
implementation doesn't care about the fact that you used a non-default
opclass. Sure, in theory that could lead to inconsistencies, if there
were multiple unique indexes of multiple opclasses that just so
happened to have incompatible ideas about equality, but that seems
ludicrous...we have only one extremely narrow example of how that
could happen. Plus there'd have to be *both* unique indexes defined
and available for us to infer as appropriate, before the inference
logic could accidentally infer the wrong idea of equality. That seems
like an extremely implausible scenario. Even if we allow for the idea
that alternative notions of equality are something that will happen in
the wild, obviously the user cares about the definition of equality
that they actually used for the unique index in question.

We can document that unique index inference doesn't care about
opclasses (recall that I still only plan on l

Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote:

> The argument that autovac workers need fresher stats than anything else
> seems pretty dubious to start with.  Why shouldn't we simplify that down
> to "they use PGSTAT_STAT_INTERVAL like everybody else"?

The point of wanting fresher stats than that, eons ago, was to avoid a
worker vacuuming a table that some other worker vacuumed more recently
than PGSTAT_STAT_INTERVAL.  I realize now that the semantics we really
want is something like "stats no older than XYZ", where the given value
is the timestamp at which we start checking; if we get anything newer
than that it would be okay, but we currently reject it because of lack
of a more appropriate API.  (If it takes more than PGSTAT_STAT_INTERVAL
to get the stats back, a regular backend would ask for fresher stats,
but to an autovac worker they would be good enough as long as they are
newer than its recheck start time.)

Nowadays we can probably disregard the whole issue, since starting a new
vacuum just after the prior one finished should not cause much stress to
the system thanks to the visibility map.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Better way of dealing with pgstat wait timeout during buildfarm runs?

2014-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Yeah, I've been getting more annoyed by that too lately.  I keep wondering
>> though whether there's an actual bug underneath that behavior that we're
>> failing to see.

> I think the first thing to do is reconsider usage of PGSTAT_RETRY_DELAY
> instead of PGSTAT_STAT_INTERVAL in autovacuum workers.  That decreases
> the wait time 50-fold, if I recall this correctly, and causes large
> amounts of extra I/O traffic.

Yeah --- that means that instead of the normal behavior that a stats file
newer than 500 msec is good enough, an autovac worker insists on a stats
file newer than 10 msec.  I did some experimentation on prairiedog, and
found that it's not hard at all to see autovac workers demanding multiple
stats writes per second:

2014-12-26 16:26:52.958 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:53.128 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:53.188 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:54.903 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:26:55.058 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.022 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.285 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:00.792 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.010 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.163 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:01.193 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.595 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.673 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.839 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:03.878 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:05.878 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:06.571 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.001 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.769 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:07.950 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:10.256 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:11.039 EST 21026 LOG:  sending inquiry for database 45116
2014-12-26 16:27:11.402 EST 21026 LOG:  sending inquiry for database 45116

The argument that autovac workers need fresher stats than anything else
seems pretty dubious to start with.  Why shouldn't we simplify that down
to "they use PGSTAT_STAT_INTERVAL like everybody else"?

regards, tom lane




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Peter Geoghegan
On Fri, Dec 26, 2014 at 7:23 AM, Kevin Grittner  wrote:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?  Only for transactions at isolation level
> REPEATABLE READ or higher?

This independently occurred to me as perhaps preferable over a year
ago. The INSERT...ON CONFLICT IGNORE feature that my patch adds will
throw such an error for REPEATABLE READ and higher modes rather than
proceeding on the basis of having ignored something from a concurrent
transaction.

+1 from me for doing this in the master branch, if it isn't too
invasive (I think it might be; where is the EState available to work
with, close to where duplicate checking happens for B-Trees?). It only
makes sense to do what you propose if the user expects to check that there is no
duplicate ahead of time, and only ever fail with a serialization
failure (not a duplicate violation) from concurrent conflicts. That is
a bit narrow; the OP can only reasonably expect to not see a duplicate
violation on retry because he happens to be checking...most clients
won't, and will "waste" a retry.

The proposed ON CONFLICT IGNORE feature would do this checking for him
correctly, FWIW, but in typical cases there is no real point in
retrying the xact -- you'll just get another duplicate violation
error.
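
For reference, with the (uncommitted, still subject to change) syntax from my
patch, the OP's insert might be spelled roughly like this:

INSERT INTO song_artist (song_id, artist_id)
    VALUES (1, 2)
    ON CONFLICT IGNORE;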

In any case I think exclusion violations should be covered, too.
-- 
Peter Geoghegan




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Nikita Volkov
I'll repost my case (I'm the OP), so that the references to it make more
sense to the others.

Having the following table:

CREATE TABLE "song_artist" (
  "song_id" INT8 NOT NULL,
  "artist_id" INT8 NOT NULL,
  PRIMARY KEY ("song_id", "artist_id")
);

Even trying to protect against this with a SELECT won't help avoid the
error, because at the beginning of the transaction the key does not exist
yet.

BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
  INSERT INTO song_artist (song_id, artist_id)
    SELECT 1, 2
    WHERE NOT EXISTS (SELECT * FROM song_artist WHERE song_id=1 AND artist_id=2);
COMMIT;
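
To make the failure mode concrete, here is roughly the interleaving (two
sessions; the labels are only illustrative):

-- session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
INSERT INTO song_artist (song_id, artist_id)
  SELECT 1, 2
  WHERE NOT EXISTS (SELECT * FROM song_artist WHERE song_id=1 AND artist_id=2);

-- session 2, before session 1 commits:
BEGIN ISOLATION LEVEL SERIALIZABLE READ WRITE;
INSERT INTO song_artist (song_id, artist_id)
  SELECT 1, 2
  WHERE NOT EXISTS (SELECT * FROM song_artist WHERE song_id=1 AND artist_id=2);
-- session 1's uncommitted row is invisible to the subselect, so session 2
-- also attempts the insert and blocks on the conflicting index entry.

-- session 1:
COMMIT;
-- session 2 now fails with SQLSTATE 23505 (unique_violation) rather than
-- 40001 (serialization_failure), which is the complaint.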

2014-12-26 21:38 GMT+03:00 Kevin Grittner :

> Tom Lane  wrote:
>
> > Just for starters, a 40XXX error report will fail to provide the
> > duplicated key's value.  This will be a functional regression,
>
> Not if, as is normally the case, the transaction is retried from
> the beginning on a serialization failure.  Either the code will
> check for a duplicate (as in the case of the OP on this thread) and
> they won't see the error, *or* the transaction which created
> the duplicate key will have committed before the start of the retry
> and you will get the duplicate key error.
>
> > I think an appropriate response to these complaints is to fix the
> > documentation to point out that duplicate-key violations may also
> > be worthy of retries.
>
> > but I see no mention of the issue in chapter 13.)
>
> I agree that's the best we can do for stable branches, and worth
> doing.
>
> It would be interesting to hear from others who rely on
> serializable transactions in production environments about what
> makes sense to them.  This is probably the wrong list to find such
> people directly; but I seem to recall Josh Berkus has a lot of
> clients who do.  Josh?  Any opinion on this thread?
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Abhijit Menon-Sen
At 2014-12-26 13:48:40 -0500, br...@momjian.us wrote:
>
> I assume you are only linking into Heikki's new code and will not
> change the places that use the old CRC method on disk --- just
> checking.

Correct. The legacy CRC computation code used by ltree etc. is
completely unaffected by both my sets of changes.

-- Abhijit




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Bruce Momjian
On Fri, Dec 26, 2014 at 11:52:41PM +0530, Abhijit Menon-Sen wrote:
> At 2014-12-26 13:11:43 -0500, br...@momjian.us wrote:
> >
> > Is this something that could potentially change the data stored on
> > disk? Does pg_upgrade need to check for changes in this area?  Is the
> > detection exposed by pg_controldata?  Could this affect running the
> > data directory on a different CPU?
> 
> No to all.
> 
> Subsequent to Heikki's change (already in master) to use the CRC-32C
> algorithm (instead of the earlier mistakenly-reflected-but-not-quite
> one), both the slice-by-8 software implementation posted earlier and
> the SSE4.2 CRC32* instructions will compute exactly the same value.
> 
> (Yes, I have verified this in both cases.)

Uh, we didn't fully change all code to use the new algorithm because
there were cases where the CRC was already stored on disk, e.g. hstore,
ltree.  I assume you are only linking into Heikki's new code and will
not change the places that use the old CRC method on disk --- just
checking.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-12-26 Thread Alexey Vasiliev
Thanks for the suggestions.

Patch updated.


Mon, 22 Dec 2014 12:07:06 +0900 from Michael Paquier :
>On Tue, Nov 4, 2014 at 6:25 AM, Alexey Vasiliev < leopard...@inbox.ru > wrote:
>> Added new patch.
>Seems useful to me to be able to tune this interval of time.
>
>I would simply reword the documentation as follows:
>If restore_command returns a nonzero exit status code, retry the
>command after the interval of time specified by this parameter.
>The default value is 5s.
>
>Also, I think that it would be a good idea to error out if this
>parameter has a value of let's say, less than 1s instead of doing a
>check for a positive value in the waiting latch. On top of that, the
>default value of restore_command_retry_interval should be 5000 and not
>0 to simplify the code.
>-- 
>Michael


-- 
Alexey Vasiliev
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..53ccf13 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  restore_command_retry_interval (integer)
+  
+restore_command_retry_interval recovery parameter
+  
+  
+  
+   
+If restore_command returns a nonzero exit status code, retry the
+command after the interval of time specified by this parameter.
+The default value is 5s.
+   
+   
+This is useful when WAL segments are restored from external
+storage (such as AWS S3), where every retrieval request costs money
+and the standby is expected to lag behind the master anyway.
+Raising the interval from the default 5 seconds to, say, 30 seconds
+reduces the number of paid requests issued by each of the N standby
+nodes polling for the next WAL segment.
+   
+  
+ 
+
 
 
   
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..7ed6f3b 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval after a nonzero exit code from restore_command.
+# This can be useful to increase/decrease the number of restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..02c55a8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 5000L;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+		   &hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a temporal value",
+"restore_command_retry_interval"),
+		 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval <= 0)
+			{
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("\"%s\" must be greater than zero",
+	"restore_command_retry_interval")));
+			}
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10658,13 +10681,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	}
 
 	/*
-	 * Wait for more WAL to arrive. Time out after 5 seconds,
+	 * Wait for more WAL to arrive. Time out after
+	 * restore_command_retry_interval (5 seconds by default),
 	 * like when polling the archive, to react to a trigger
 	 * file promptly.
 	 */
 	WaitLatch(&XLogCtl->recoveryWakeupLatch,
 			  WL_LATCH_SET | WL_TIMEOUT,
-			  5000L);
+			  restore_command_retry_interval);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 	break;
 }



Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Kevin Grittner
Tom Lane  wrote:

> Just for starters, a 40XXX error report will fail to provide the
> duplicated key's value.  This will be a functional regression,

Not if, as is normally the case, the transaction is retried from
the beginning on a serialization failure.  Either the code will
check for a duplicate (as in the case of the OP on this thread) and
they won't see the error, *or* the transaction which created
the duplicate key will have committed before the start of the retry
and you will get the duplicate key error.

> I think an appropriate response to these complaints is to fix the
> documentation to point out that duplicate-key violations may also
> be worthy of retries.

> but I see no mention of the issue in chapter 13.)

I agree that's the best we can do for stable branches, and worth
doing.

It would be interesting to hear from others who rely on
serializable transactions in production environments about what 
makes sense to them.  This is probably the wrong list to find such 
people directly; but I seem to recall Josh Berkus has a lot of 
clients who do.  Josh?  Any opinion on this thread?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Abhijit Menon-Sen
At 2014-12-26 13:11:43 -0500, br...@momjian.us wrote:
>
> Is this something that could potentially change the data stored on
> disk? Does pg_upgrade need to check for changes in this area?  Is the
> detection exposed by pg_controldata?  Could this affect running the
> data directory on a different CPU?

No to all.

Subsequent to Heikki's change (already in master) to use the CRC-32C
algorithm (instead of the earlier mistakenly-reflected-but-not-quite
one), both the slice-by-8 software implementation posted earlier and
the SSE4.2 CRC32* instructions will compute exactly the same value.

(Yes, I have verified this in both cases.)

-- Abhijit




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Bruce Momjian
On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
> >Uh, what if the system is compiled on a different CPU than it
> >is run on?  Seems we would need a run-time CPU test.
> 
> That's the cpuid thing mentioned above.

Is this something that could potentially change the data stored on disk?
Does pg_upgrade need to check for changes in this area?  Is the
detection exposed by pg_controldata?  Could this affect running the data
directory on a different CPU?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> TBH, I think we could have done without this test altogether; but if we're
>> going to have it, a minimum expectation is that it not be hazardous to
>> other tests around it.

> The number of assertion failures in get_object_address without all the
> sanity checks I added in pg_get_object_address was a bit surprising.
> That's the whole reason I decided to add the test.  I don't want to
> blindly assume that all cases will work nicely in the future,
> particularly as other object types are added.

I'm surprised then that you didn't prefer the other solution (wrap the
whole test in a single transaction).  But we've probably spent more
time on this than it deserves.

regards, tom lane




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Alvaro Herrera
Andres Freund wrote:

> >This sounds like a huge project -- it's not like event triggers are the
> >only objects in the system where this is an issue, is it?  I'm sure
> >there is value in fixing it, but I have enough other projects.
> 
> Can't we just move the test to run without parallelism? It's quite
> quick, so I don't think it'd have noticeable consequences timewise.

(I got this a minute too late anyhow.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote:
> Andres Freund  writes:
> > Can't we just move the test to run without parallelism? It's quite quick, so
> > I don't think it'd have noticeable consequences timewise.
> 
> That just leaves the door open for somebody to add more tests parallel to
> it in future.

I've long wanted to add declarative dependencies to tests: each
test file would declare what other tests it depends on, and we would
have a special clause to state "this one must not be run concurrently
with anything else".  Of course, this is just wishful thinking at this
point.

> TBH, I think we could have done without this test altogether; but if we're
> going to have it, a minimum expectation is that it not be hazardous to
> other tests around it.

The number of assertion failures in get_object_address without all the
sanity checks I added in pg_get_object_address was a bit surprising.
That's the whole reason I decided to add the test.  I don't want to
blindly assume that all cases will work nicely in the future,
particularly as other object types are added.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Tom Lane
Andres Freund  writes:
> Can't we just move the test to run without parallelism? It's quite quick, so I
> don't think it'd have noticeable consequences timewise.

That just leaves the door open for somebody to add more tests parallel to
it in future.

TBH, I think we could have done without this test altogether; but if we're
going to have it, a minimum expectation is that it not be hazardous to
other tests around it.

regards, tom lane




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Andres Freund
On December 26, 2014 6:10:51 PM CET, Alvaro Herrera wrote:
>Tom Lane wrote:
>> Alvaro Herrera  writes:
>> > Tom Lane wrote:
>> >> Alvaro Herrera  writes:
>> >>> Hm, maybe we can drop the event trigger explicitly first, then wait a
>> >>> little bit, then drop the remaining objects with DROP CASCADE?
>> 
>> >> As I said, that's no fix; it just makes the timing harder to hit.  Another
>> >> process could be paused at the critical point for longer than whatever "a
>> >> little bit" is.
>> 
>> > Yeah, I was thinking we could play some games with the currently running
>> > XIDs from a txid_snapshot or some such, with a reasonable upper limit on
>> > the waiting time (for the rare cases with a server doing other stuff
>> > with long-running transactions.)
>> 
>> Whether that's sane or not, the whole problem is so far out-of-scope for
>> a test of pg_get_object_address() that it's not even funny.  I think
>> we should adopt one of the two fixes I recommended and call it good.
>
>I think dropping the part involving an event trigger from the test is
>reasonable.  I will go do that.
>
>> If you want to work on making DROP EVENT TRIGGER safer in the long run,
>> that can be a separate activity.
>
>This sounds like a huge project -- it's not like event triggers are the
>only objects in the system where this is an issue, is it?  I'm sure
>there is value in fixing it, but I have enough other projects.

Can't we just move the test to run without parallelism? It's quite quick,
so I don't think it'd have noticeable consequences timewise.


-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Tom Lane
Kevin Grittner  writes:
> Tom Lane  wrote:
>> Yes.  This will deliver a less meaningful error code,

> That depends entirely on whether you care more about whether the
> problem was created by a concurrent transaction or exactly how that
> concurrent transaction created the problem.

Just for starters, a 40XXX error report will fail to provide the
duplicated key's value.  This will be a functional regression,
on top of breaking existing code.

I think an appropriate response to these complaints is to fix the
documentation to point out that duplicate-key violations may also
be worthy of retries.  (I sort of thought it did already, actually,
but I see no mention of the issue in chapter 13.)

regards, tom lane




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Kevin Grittner
Tom Lane  wrote:
> Kevin Grittner  writes:

>> Are there any objections to generating a write conflict instead
>> of a duplicate key error if the duplicate key was added by a
>> concurrent transaction?
>
> Yes.  This will deliver a less meaningful error code,

That depends entirely on whether you care more about whether the
problem was created by a concurrent transaction or exactly how that
concurrent transaction created the problem.  For those using
serializable transactions to manage concurrency the former is at
least an order of magnitude more important.  This is not the first
time getting a constraint violation SQLSTATE for the actions of a
concurrent serializable transaction has been reported as a bug.
Going from memory, I think this might be about the fifth time users
have reported it as a bug or potential bug on these lists.

People using serializable transactions normally run all queries
through common code which will retry the transaction from the start
if there is a SQLSTATE starting with '40' (or perhaps picking out
the specific codes '40001' and '40P01').  Not doing so for *some*
types of errors generated by concurrent transactions reduces the
application's level of user-friendliness, and may introduce
surprising bugs.  In this particular case the OP wants to do one
thing if a row with a particular value for a unique index exists,
and something different if it doesn't.  If we generate the write
conflict for the case that it is concurrently added, it can retry
the transaction and do one or the other; if we don't pay attention
to that, they need weird heuristics for "the third case".  That
really is not more meaningful or useful.

> *and* break existing code that is expecting the current behavior.

Possibly, but my experience is more that failure to behave the way
I suggest is biting people and causing them a lot of extra work and
pain.  I would be fine with limiting the new behavior to
serializable transactions, since that seems to be where people want
this behavior.  It would bring us closer to "the transaction will
run as though it were the only transaction running or roll back
with a serialization failure" without having to add caveats and
exceptions.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Andres Freund
On December 26, 2014 6:05:34 PM CET, Bruce Momjian  wrote:
>On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
>> On December 26, 2014 4:50:33 PM CET, Bruce Momjian wrote:
>> >On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
>> >> Hi.
>> >> 
>> >> Here's a proposed patch to use CPUID at startup to determine if the
>> >> SSE4.2 CRC instructions are available, to use them instead of the
>> >> slice-by-8 implementation (posted earlier).
>> >> 
>> >> A few notes:
>> >> 
>> >> 1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
>> >>    use. It can be replaced with some inline assembly otherwise.
>> >> 
>> >> 2. I've also used the crc32b/crc32q instructions directly rather than
>> >>    using ".bytes" to encode the instructions; binutils versions since
>> >>    2007 or so have supported them.
>> >> 
>> >> 3. I've included the MSVC implementation mostly as an example of how to
>> >>    extend this to different compilers/platforms. It's written according
>> >>    to the documentation for MSVC intrinsics, but I have not tested it.
>> >>    Suggestions/improvements are welcome.
>> >
>> >Uh, what happens if the system is compiled on a different CPU than it
>> >is run on?  Seems we would need a run-time CPU test.
>> 
>> That's the cpuid thing mentioned above.
>
>Oh, so cpuid is not a macro exported by the compiler but a C API.  Good.

Neither, really. It's an instruction returning CPU capabilities.

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Alvaro Herrera  writes:
> >>> Hm, maybe we can drop the event trigger explicitly first, then wait a
> >>> little bit, then drop the remaining objects with DROP CASCADE?
> 
> >> As I said, that's no fix; it just makes the timing harder to hit.  Another
> >> process could be paused at the critical point for longer than whatever "a
> >> little bit" is.
> 
> > Yeah, I was thinking we could play some games with the currently running
> > XIDs from a txid_snapshot or some such, with a reasonable upper limit on
> > the waiting time (for the rare cases with a server doing other stuff
> > with long-running transactions.)
> 
> Whether that's sane or not, the whole problem is so far out-of-scope for
> a test of pg_get_object_address() that it's not even funny.  I think
> we should adopt one of the two fixes I recommended and call it good.

I think dropping the part involving an event trigger from the test is
reasonable.  I will go do that.

> If you want to work on making DROP EVENT TRIGGER safer in the long run,
> that can be a separate activity.

This sounds like a huge project -- it's not like event triggers are the
only objects in the system where this is an issue, is it?  I'm sure
there is value in fixing it, but I have enough other projects.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Bruce Momjian
On Fri, Dec 26, 2014 at 04:52:58PM +0100, Andres Freund wrote:
> On December 26, 2014 4:50:33 PM CET, Bruce Momjian  wrote:
> >On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
> >> Hi.
> >> 
> >> Here's a proposed patch to use CPUID at startup to determine if the
> >> SSE4.2 CRC instructions are available, to use them instead of the
> >> slice-by-8 implementation (posted earlier).
> >> 
> >> A few notes:
> >> 
> >> 1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
> >>    use. It can be replaced with some inline assembly otherwise.
> >> 
> >> 2. I've also used the crc32b/crc32q instructions directly rather than
> >>    using ".bytes" to encode the instructions; binutils versions since
> >>    2007 or so have supported them.
> >> 
> >> 3. I've included the MSVC implementation mostly as an example of how to
> >>    extend this to different compilers/platforms. It's written according
> >>    to the documentation for MSVC intrinsics, but I have not tested it.
> >>    Suggestions/improvements are welcome.
> >
> >Uh, what happens if the system is compiled on a different CPU than it
> >is run on?  Seems we would need a run-time CPU test.
> 
> That's the cpuid thing mentioned above.

Oh, so cpuid is not a macro exported by the compiler but a C API.  Good.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> Hm, maybe we can drop the event trigger explicitly first, then wait a
>>> little bit, then drop the remaining objects with DROP CASCADE?

>> As I said, that's no fix; it just makes the timing harder to hit.  Another
>> process could be paused at the critical point for longer than whatever "a
>> little bit" is.

> Yeah, I was thinking we could play some games with the currently running
> XIDs from a txid_snapshot or some such, with a reasonable upper limit on
> the waiting time (for the rare cases with a server doing other stuff
> with long-running transactions.)

Whether that's sane or not, the whole problem is so far out-of-scope for
a test of pg_get_object_address() that it's not even funny.  I think
we should adopt one of the two fixes I recommended and call it good.
If you want to work on making DROP EVENT TRIGGER safer in the long run,
that can be a separate activity.

regards, tom lane




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> I've not proven this rigorously, but it seems obvious in hindsight:
> >> what's happening is that when the object_address test drops everything
> >> with DROP CASCADE, other processes are sometimes just starting to execute
> >> the event trigger when the DROP commits.  When they go to look up the
> >> trigger function, they don't find it, leading to "cache lookup failed for
> >> function".
> 
> > Hm, maybe we can drop the event trigger explicitly first, then wait a
> > little bit, then drop the remaining objects with DROP CASCADE?
> 
> As I said, that's no fix; it just makes the timing harder to hit.  Another
> process could be paused at the critical point for longer than whatever "a
> little bit" is.

Yeah, I was thinking we could play some games with the currently running
XIDs from a txid_snapshot or some such, with a reasonable upper limit on
the waiting time (for the rare cases with a server doing other stuff
with long-running transactions.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> I've not proven this rigorously, but it seems obvious in hindsight:
>> what's happening is that when the object_address test drops everything
>> with DROP CASCADE, other processes are sometimes just starting to execute
>> the event trigger when the DROP commits.  When they go to look up the
>> trigger function, they don't find it, leading to "cache lookup failed for
>> function".

> Hm, maybe we can drop the event trigger explicitly first, then wait a
> little bit, then drop the remaining objects with DROP CASCADE?

As I said, that's no fix; it just makes the timing harder to hit.  Another
process could be paused at the critical point for longer than whatever "a
little bit" is.

regards, tom lane




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Still, I don't think this is a reasonable test design.  We have
> >> absolutely no idea what behaviors are being triggered in the other
> >> tests, except that they are unrelated to what those tests think they
> >> are testing.
> 
> > I can of course move it to a separate parallel test, but I don't think
> > that should be really necessary.
> 
> I've not proven this rigorously, but it seems obvious in hindsight:
> what's happening is that when the object_address test drops everything
> with DROP CASCADE, other processes are sometimes just starting to execute
> the event trigger when the DROP commits.  When they go to look up the
> trigger function, they don't find it, leading to "cache lookup failed for
> function".

Hm, maybe we can drop the event trigger explicitly first, then wait a
little bit, then drop the remaining objects with DROP CASCADE?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Andres Freund
On December 26, 2014 4:50:33 PM CET, Bruce Momjian  wrote:
>On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
>> Hi.
>> 
>> Here's a proposed patch to use CPUID at startup to determine if the
>> SSE4.2 CRC instructions are available, to use them instead of the
>> slice-by-8 implementation (posted earlier).
>> 
>> A few notes:
>> 
>> 1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
>>    use. It can be replaced with some inline assembly otherwise.
>> 
>> 2. I've also used the crc32b/crc32q instructions directly rather than
>>    using ".bytes" to encode the instructions; binutils versions since
>>    2007 or so have supported them.
>> 
>> 3. I've included the MSVC implementation mostly as an example of how to
>>    extend this to different compilers/platforms. It's written according
>>    to the documentation for MSVC intrinsics, but I have not tested it.
>>    Suggestions/improvements are welcome.
>
>Uh, what happens if the system is compiled on a different CPU than it
>is run on?  Seems we would need a run-time CPU test.

That's the cpuid thing mentioned above.

Andres

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Tom Lane
Kevin Grittner  writes:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?

Yes.  This will deliver a less meaningful error code, *and* break
existing code that is expecting the current behavior.

regards, tom lane




Re: [HACKERS] What exactly is our CRC algorithm?

2014-12-26 Thread Bruce Momjian
On Thu, Dec 25, 2014 at 11:57:29AM +0530, Abhijit Menon-Sen wrote:
> Hi.
> 
> Here's a proposed patch to use CPUID at startup to determine if the
> SSE4.2 CRC instructions are available, to use them instead of the
> slice-by-8 implementation (posted earlier).
> 
> A few notes:
> 
> 1. GCC has included cpuid.h since 4.3.0, so I figured it was safe to
>    use. It can be replaced with some inline assembly otherwise.
> 
> 2. I've also used the crc32b/crc32q instructions directly rather than
>    using ".bytes" to encode the instructions; binutils versions since
>    2007 or so have supported them.
> 
> 3. I've included the MSVC implementation mostly as an example of how to
>    extend this to different compilers/platforms. It's written according
>    to the documentation for MSVC intrinsics, but I have not tested it.
>    Suggestions/improvements are welcome.
> 
Uh, what happens if the system is compiled on a different CPU than it is
run on?  Seems we would need a run-time CPU test.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Marko Tiikkaja

On 2014-12-26 17:23, Kevin Grittner wrote:
> Are there any objections to generating a write conflict instead of
> a duplicate key error if the duplicate key was added by a
> concurrent transaction?  Only for transactions at isolation level
> REPEATABLE READ or higher?

Is it possible to distinguish between an actual write conflict and a
completely unrelated UNIQUE constraint ending up violated for some reason?



.marko




Re: [HACKERS] Some other odd buildfarm failures

2014-12-26 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Still, I don't think this is a reasonable test design.  We have
>> absolutely no idea what behaviors are being triggered in the other
>> tests, except that they are unrelated to what those tests think they
>> are testing.

> I can of course move it to a separate parallel test, but I don't think
> that should be really necessary.

I've not proven this rigorously, but it seems obvious in hindsight:
what's happening is that when the object_address test drops everything
with DROP CASCADE, other processes are sometimes just starting to execute
the event trigger when the DROP commits.  When they go to look up the
trigger function, they don't find it, leading to "cache lookup failed for
function".  The fact that the complained-of OID is slightly variable, but
always in the range of OIDs that would be getting assigned around this
point in a "make check" run, buttresses the theory.

I thought about changing the object_address test so that it explicitly
drops the event trigger first.  But that would not be a fix, it would
just make the timing harder to hit (ie, a victim process would need to
lose control for longer at the critical point).

Since I remain of the opinion that a test called object_address has no
damn business causing global side-effects, I think there are two
reasonable fixes:

1. Remove the event trigger.  This would slightly reduce the test's
coverage.

2. Run that whole test as a single transaction, so that the event trigger
is created and dropped in one transaction and is never seen as valid by
any concurrent test.
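
For what it's worth, a rough sketch of what #2 could look like in the test
file (object names made up):

BEGIN;
CREATE FUNCTION addr_et_func() RETURNS event_trigger
    LANGUAGE plpgsql AS $$ BEGIN NULL; END $$;
CREATE EVENT TRIGGER addr_et ON ddl_command_start
    EXECUTE PROCEDURE addr_et_func();
-- ... exercise pg_get_object_address() against the event trigger here ...
DROP EVENT TRIGGER addr_et;
DROP FUNCTION addr_et_func();
COMMIT;

Since the trigger never becomes visible to any concurrent transaction, no
other test can trip over its already-dropped function.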

A long-term idea is to try to fix things so that there's sufficient
locking to make dropping an event trigger and immediately dropping its
trigger function safe.  But I'm not sure that's either possible or a good
idea (the lock obtained by DROP would bring the entire database to a
standstill ...).

regards, tom lane




Re: [HACKERS] BUG #12330: ACID is broken for unique constraints

2014-12-26 Thread Kevin Grittner
"nikita.y.vol...@mail.ru"  wrote:

> Executing concurrent transactions inserting the same value of a
> unique key fails with the "duplicate key" error under code
> "23505" instead of any of transaction conflict errors with a
> "40***" code.

This is true, and can certainly be inconvenient when using
serializable transactions to simplify handling of race conditions,
because you can't just test for a SQLSTATE of '40001' or '40P01' to
indicate the need to retry the transaction.  You have two
reasonable ways to avoid duplicate keys if the values are synthetic
and automatically generated.  One is to use a SEQUENCE object to
generate the values.  The other (really only recommended if gaps in
the sequence are a problem) is to have the serializable transaction
update a row to "claim" the number.
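
For example (a sketch; the table and sequence names are made up):

-- Option 1: let a sequence hand out the key, so concurrent inserts simply
-- receive different values:
CREATE SEQUENCE invoice_id_seq;
CREATE TABLE invoice (
    id bigint PRIMARY KEY DEFAULT nextval('invoice_id_seq'),
    payload text
);

-- Option 2 (only if gaps matter): "claim" the next value by updating a
-- counter row inside the serializable transaction; a concurrent claimer
-- then gets a serialization failure (40001) rather than 23505:
CREATE TABLE invoice_counter (last_id bigint NOT NULL);
INSERT INTO invoice_counter VALUES (0);

-- inside each serializable transaction:
UPDATE invoice_counter SET last_id = last_id + 1 RETURNING last_id;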

Otherwise you need to consider errors related to duplicates as
possibly being caused by a concurrent transaction.  You may want to
do one transaction retry in such cases, and fail if an identical
error is seen.  Keep in mind that these errors do not allow
serialization anomalies to appear in the committed data, so are
arguably not violations of ACID principles -- more of a wart on the
otherwise clean technique of using serializable transactions to
simplify application programming under concurrency.

Thinking about it just now I think we might be able to generate a
write conflict instead of a duplicate key error for this case by
checking the visibility information for the duplicate row.  It
might not even have a significant performance impact, since we need
to check visibility information to generate the duplicate key
error.  That would still leave similar issues (where similar
arguments can be made) relating to foreign keys; but those can
largely be addressed already by declaring the constraints to be
DEFERRED -- and anyway, that would be a separate fix.

I'm moving this discussion to the -hackers list so that I can ask
other developers:

Are there any objections to generating a write conflict instead of
a duplicate key error if the duplicate key was added by a
concurrent transaction?  Only for transactions at isolation level
REPEATABLE READ or higher?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] explain sortorder

2014-12-26 Thread Arne Scheffer
>Heikki Linnakangas  writes:
>> I would suggest just adding the information to the Sort Key line. As
>> long as you don't print the modifiers when they are defaults (ASC and
>> NULLS LAST), we could print the information even in non-VERBOSE mode.

>+1.  I had assumed without looking that that was what it did already,
>else I'd have complained too.

>   regards, tom lane

We will change the patch according to Heikki's suggestions.

A nice Christmas & all the best in the New Year

Arne Scheffer

http://www.uni-muenster.de/ZIV/Mitarbeiter/ArneScheffer.shtml




Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates

2014-12-26 Thread David Rowley
On 24 December 2014 at 16:04, Andreas Karlsson  wrote:

> On 12/16/2014 11:04 AM, David Rowley wrote:
>> These are some very promising performance increases.
>>
>> I've done a quick pass of reading the patch. I currently don't have a
>> system with a 128bit int type, but I'm working on that.
>
> Sorry for taking some time to get back. I have been busy before Christmas.
> A new version of the patch is attached.

Ok I've had another look at this, and I think the only things that I have
to say have been mentioned already:

1. Do we need to keep the 128 byte aggregate state size for machines
without 128 bit ints? This has been reduced to 48 bytes in the patch, which
favours code being compiled with a compiler which has 128 bit ints.  I
kind of think that we need to keep the 128 byte estimates for compilers
that don't support int128, but I'd like to hear any counter arguments.

2. References to int16 meaning 16 bytes. I'm really in two minds about
this, it's quite nice to keep the natural flow, int4, int8, int16, but I
can't help thinking that this will confuse someone one day. I think it'll be a
long time before it confuses anyone if we called it int128 instead, but I'm
not that excited about seeing it renamed either. I'd like to hear what
others have to say... Is there a chance that some SQL standard in the
distant future will have HUGEINT and we might regret not getting the
internal names nailed down?

I also checked the performance of some windowing function calls, since
these call the final function for each row, I thought I'd better make sure
there was no regression as the final function must perform a conversion
from int128 to numeric for each row.

It seems there's still an increase in performance:


Setup:
create table bi_win (i bigint primary key);
insert into bi_win select x.x from generate_series(1,1) x(x);
vacuum analyze;

Query:
select sum(i) over () from bi_win;

** Master
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 6567
latency average: 9.137 ms
tps = 109.445841 (including connections establishing)
tps = 109.456941 (excluding connections establishing)

** Patched
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 7841
latency average: 7.652 ms
tps = 130.670253 (including connections establishing)
tps = 130.675743 (excluding connections establishing)

Setup:
create table i_win (i int primary key);
insert into i_win select x.x from generate_series(1,1) x(x);
vacuum analyze;

Query:
select stddev(i) over () from i_win;

** Master
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 5084
latency average: 11.802 ms
tps = 84.730362 (including connections establishing)
tps = 84.735693 (excluding connections establishing)

** Patched
./pgbench -f ~/int128_window.sql -n -T 60 postgres
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 7557
latency average: 7.940 ms
tps = 125.934787 (including connections establishing)
tps = 125.943176 (excluding connections establishing)

Regards

David Rowley


Re: [HACKERS] Additional role attributes && superuser review

2014-12-26 Thread Magnus Hagander
On Wed, Dec 24, 2014 at 6:48 PM, Adam Brightwell <
adam.brightw...@crunchydatasolutions.com> wrote:

> All,
>
> I want to revive this thread and continue to move these new role
> attributes forward.
>
> In summary, the ultimate goal is to include new role attributes for common
> operations which currently require superuser privileges.
>
> Initially proposed were the following attributes:
>
> * BACKUP - allows role to perform backup operations
> * LOGROTATE - allows role to rotate log files
> * MONITOR - allows role to view pg_stat_* details
> * PROCSIGNAL - allows role to signal backend processes
>
> It seems that PROCSIGNAL and MONITOR were generally well received and
> probably don't warrant much more discussion at this point.
>
> However, based on previous discussions, there seemed to be some
> uncertainty on how to handle BACKUP and LOGROTATE.
>
> Concerns:
>
> * LOGROTATE - only associated with one function/operation.
> * BACKUP - perceived to be too broad a permission, as it would provide
> the ability to run pg_start/stop_backup and the xlog related functions.
> The general sentiment is that these should be handled as separate
> privileges.
>
> * BACKUP - preferred usage is with pg_dump, giving a user the ability to
> run pg_dump on the whole database without being superuser.
>
> Previous Recommendations:
>
> * LOGROTATE - Use OPERATOR - concern was expressed that this might be too
> general of an attribute for this purpose.  Also, concern for privilege
> 'upgrades' as it includes more capabilities in later releases.
> * LOGROTATE - Use LOG_OPERATOR - generally accepted, but concern was raised
> for using extraneous descriptors such as '_OPERATOR' and '_ADMIN', etc.
> * BACKUP - Use WAL_CONTROL for pg_start/stop_backup - no major
> disagreement, though same concern regarding extraneous descriptors.
> * BACKUP - Use XLOG_OPERATOR for xlog operations - no major disagreement,
> though same concern regarding extraneous descriptors.
> * BACKUP - Use BACKUP for granting non-superuser ability to run pg_dump on
> whole database.
>
> Given the above and previous discussions:
>
> I'd like to propose the following new role attributes:
>
> BACKUP - allows role to perform pg_dump* backups of whole database.
>

I'd suggest it's called DUMP if that's what it allows, to keep it separate
from the backup parts.



> WAL - allows role to execute pg_start_backup/pg_stop_backup functions.
>
> XLOG - allows role to execute xlog operations.


Those seem like really bad names, IMHO. Why? Because we use WAL and XLOG
throughout documentation and parameters and code to mean *the same thing*.
And here they'd suddenly mean different things. If we need them as separate
privileges, I think we need much better names. (And a better description -
what is "xlog operations" really?)

And the one under WAL would be very similar to what REPLICATION does today.
Or are you saying that it should specifically *not* allow base backups done
through the replication protocol, only the exclusive ones?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2014-12-26 Thread Pavel Stehule
2014-12-26 11:41 GMT+01:00 Pavel Stehule :

>
>
> 2014-12-25 22:23 GMT+01:00 Alex Shulgin :
>
>> Trent Shipley  writes:
>>
>> > On Friday 2007-12-14 16:22, Tom Lane wrote:
>> >> Neil Conway  writes:
>> >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct
>> COPY
>> >> > to drop (and log) rows that contain malformed data. That is, rows
>> with
>> >> > too many or too few columns, rows that result in constraint
>> violations,
>> >> > and rows containing columns where the data type's input function
>> raises
>> >> > an error. The last case is the only thing that would be a bit tricky
>> to
>> >> > implement, I think: you could use PG_TRY() around the
>> InputFunctionCall,
>> >> > but I guess you'd need a subtransaction to ensure that you reset your
>> >> > state correctly after catching an error.
>> >>
>> >> Yeah.  It's the subtransaction per row that's daunting --- not only the
>> >> cycles spent for that, but the ensuing limitation to 4G rows imported
>> >> per COPY.
>> >
>> > You could extend the COPY FROM syntax with a COMMIT EVERY n clause.
>> This
>> > would help with the 4G subtransaction limit.  The cost to the ETL
>> process is
>> > that a simple rollback would not be guaranteed send the process back to
>> it's
>> > initial state.  There are easy ways to deal with the rollback issue
>> though.
>> >
>> > A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY
>> > option is selected then the COPY FROM can run without subtransactions
>> and in
>> > excess of the 4G per transaction limit.  NO RETRY should be the default
>> since
>> > it preserves the legacy behavior of COPY FROM.
>> >
>> > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not
>> give the
>> > option of sending exceptions to a table since they are presumably
>> malformed,
>> > otherwise they would not be exceptions.  (Users should re-process
>> exception
>> > files if they want an if good then table a else exception to table b
>> ...)
>> >
>> > EXCEPTIONS TO and NO RETRY would be mutually exclusive.
>> >
>> >
>> >> If we could somehow only do a subtransaction per failure, things would
>> >> be much better, but I don't see how.
>>
>> Hello,
>>
>> Attached is a proof of concept patch for this TODO item.  There are no
>> docs yet; I just wanted to know if the approach is sane.
>>
>> The added syntax is like the following:
>>
>>   COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
>>
>> The way it's done is by abusing Copy Both mode, and from my limited
>> testing, that seems to just work.  The error trapping itself is done
>> using PG_TRY/PG_CATCH and can only catch formatting or before-insert
>> trigger errors, no attempt is made to recover from a failed unique
>> constraint, etc.
>>
>> Example in action:
>>
>> postgres=# \d test_copy2
>>   Table "public.test_copy2"
>>  Column |  Type   | Modifiers
>> +-+---
>>  id | integer |
>>  val| integer |
>>
>> postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
>> 1
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 1: "1"
>> 2
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 2: "2"
>> 3
>> NOTICE:  missing data for column "val"
>> CONTEXT:  COPY test_copy2, line 3: "3"
>> NOTICE:  total exceptions ignored: 3
>>
>> postgres=# \d test_copy1
>>   Table "public.test_copy1"
>>  Column |  Type   | Modifiers
>> +-+---
>>  id | integer | not null
>>
>> postgres=# set client_min_messages to warning;
>> SET
>> postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
>> ...
>> vmstat
>> zoneinfo
>> postgres=#
>>
>> Limited performance testing shows no significant difference between
>> error-catching and plain code path.  For example, timing
>>
>>   copy test_copy1 from program 'seq 100' [exceptions to stdout]
>>
>> shows similar numbers with or without the added "exceptions to" clause.
>>
>> Now that I'm sending this I wonder if the original comment about the
>> need for subtransaction around every loaded line still holds.  Any
>> example of what would be not properly rolled back by just PG_TRY?
>>
>
> this method is unsafe - exception handlers don't usually free memory, so
> there is a risk of memory and resource leaks
>
> you can get the same performance with block subtransactions - when you use
> one subtransaction per 1000 rows, the impact of subtransactions is minimal
>
> when a block fails, you can fall back to a row-level subtransaction - that
> works well when you expect mostly correct data.
>

Two years ago I wrote an extension that did this, but I have not had time
to finish it and push it upstream.

Regards

Pavel


>
> Regards
>
> Pavel
>
>
>>
>> Happy hacking!
>> --
>> Alex
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
>


ftcopy-04.tgz
Description: GNU Zip compressed data

Re: [HACKERS] POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2014-12-26 Thread Pavel Stehule
2014-12-25 22:23 GMT+01:00 Alex Shulgin :

> Trent Shipley  writes:
>
> > On Friday 2007-12-14 16:22, Tom Lane wrote:
> >> Neil Conway  writes:
> >> > By modifying COPY: COPY IGNORE ERRORS or some such would instruct COPY
> >> > to drop (and log) rows that contain malformed data. That is, rows with
> >> > too many or too few columns, rows that result in constraint
> violations,
> >> > and rows containing columns where the data type's input function
> raises
> >> > an error. The last case is the only thing that would be a bit tricky
> to
> >> > implement, I think: you could use PG_TRY() around the
> InputFunctionCall,
> >> > but I guess you'd need a subtransaction to ensure that you reset your
> >> > state correctly after catching an error.
> >>
> >> Yeah.  It's the subtransaction per row that's daunting --- not only the
> >> cycles spent for that, but the ensuing limitation to 4G rows imported
> >> per COPY.
> >
> > You could extend the COPY FROM syntax with a COMMIT EVERY n clause.  This
> > would help with the 4G subtransaction limit.  The cost to the ETL
> process is
> > that a simple rollback would not be guaranteed send the process back to
> it's
> > initial state.  There are easy ways to deal with the rollback issue
> though.
> >
> > A {NO} RETRY {USING algorithm} clause might be useful.   If the NO RETRY
> > option is selected then the COPY FROM can run without subtransactions
> and in
> > excess of the 4G per transaction limit.  NO RETRY should be the default
> since
> > it preserves the legacy behavior of COPY FROM.
> >
> > You could have an EXCEPTIONS TO {filename|STDERR} clause. I would not
> give the
> > option of sending exceptions to a table since they are presumably
> malformed,
> > otherwise they would not be exceptions.  (Users should re-process
> exception
> > files if they want an if good then table a else exception to table b ...)
> >
> > EXCEPTIONS TO and NO RETRY would be mutually exclusive.
> >
> >
> >> If we could somehow only do a subtransaction per failure, things would
> >> be much better, but I don't see how.
>
> Hello,
>
> Attached is a proof of concept patch for this TODO item.  There are no
> docs yet; I just wanted to know if the approach is sane.
>
> The added syntax is like the following:
>
>   COPY [table] FROM [file/program/stdin] EXCEPTIONS TO [file or stdout]
>
> The way it's done is by abusing Copy Both mode, and from my limited
> testing, that seems to just work.  The error trapping itself is done
> using PG_TRY/PG_CATCH and can only catch formatting or before-insert
> trigger errors, no attempt is made to recover from a failed unique
> constraint, etc.
>
> Example in action:
>
> postgres=# \d test_copy2
>   Table "public.test_copy2"
>  Column |  Type   | Modifiers
> +-+---
>  id | integer |
>  val| integer |
>
> postgres=# copy test_copy2 from program 'seq 3' exceptions to stdout;
> 1
> NOTICE:  missing data for column "val"
> CONTEXT:  COPY test_copy2, line 1: "1"
> 2
> NOTICE:  missing data for column "val"
> CONTEXT:  COPY test_copy2, line 2: "2"
> 3
> NOTICE:  missing data for column "val"
> CONTEXT:  COPY test_copy2, line 3: "3"
> NOTICE:  total exceptions ignored: 3
>
> postgres=# \d test_copy1
>   Table "public.test_copy1"
>  Column |  Type   | Modifiers
> +-+---
>  id | integer | not null
>
> postgres=# set client_min_messages to warning;
> SET
> postgres=# copy test_copy1 from program 'ls /proc' exceptions to stdout;
> ...
> vmstat
> zoneinfo
> postgres=#
>
> Limited performance testing shows no significant difference between
> error-catching and plain code path.  For example, timing
>
>   copy test_copy1 from program 'seq 100' [exceptions to stdout]
>
> shows similar numbers with or without the added "exceptions to" clause.
>
> Now that I'm sending this I wonder if the original comment about the
> need for subtransaction around every loaded line still holds.  Any
> example of what would be not properly rolled back by just PG_TRY?
>

this method is unsafe - exception handlers don't usually free memory, so
there is a risk of memory and resource leaks

you can get the same performance with block subtransactions - when you use
one subtransaction per 1000 rows, the impact of subtransactions is minimal

when a block fails, you can fall back to a row-level subtransaction - that
works well when you expect mostly correct data.
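
To illustrate the batch-then-retry idea, here is a rough sketch.  It is not
Pavel's extension or any committed code: fetch_batch() and load_one_row()
are hypothetical stand-ins, BATCH_SIZE is arbitrary, and for brevity the
sketch omits the save/restore of CurrentMemoryContext and
CurrentResourceOwner that real code must do around each subtransaction (as
PL/pgSQL's exception handling does).  The subtransaction calls themselves
are the backend's real primitives from access/xact.h.

#include "postgres.h"
#include "access/xact.h"

#define BATCH_SIZE 1000                 /* arbitrary for this sketch */

/* Hypothetical helpers -- not real PostgreSQL functions. */
extern int  fetch_batch(void **rows, int max);
extern void load_one_row(void *row);

static void
copy_rows_in_batches(void)
{
    for (;;)
    {
        void   *batch[BATCH_SIZE];
        int     n = fetch_batch(batch, BATCH_SIZE);
        int     i;

        if (n == 0)
            break;

        PG_TRY();
        {
            /* Optimistic path: one subtransaction for the whole batch. */
            BeginInternalSubTransaction(NULL);
            for (i = 0; i < n; i++)
                load_one_row(batch[i]);
            ReleaseCurrentSubTransaction();
        }
        PG_CATCH();
        {
            /* Something in the batch failed: discard the whole batch ... */
            FlushErrorState();
            RollbackAndReleaseCurrentSubTransaction();

            /* ... and retry row by row, skipping only the bad rows. */
            for (i = 0; i < n; i++)
            {
                PG_TRY();
                {
                    BeginInternalSubTransaction(NULL);
                    load_one_row(batch[i]);
                    ReleaseCurrentSubTransaction();
                }
                PG_CATCH();
                {
                    /* Real code would CopyErrorData() here to report it. */
                    FlushErrorState();
                    RollbackAndReleaseCurrentSubTransaction();
                }
                PG_END_TRY();
            }
        }
        PG_END_TRY();
    }
}

With BATCH_SIZE rows per subtransaction the common case pays roughly
1/BATCH_SIZE of the per-row subtransaction cost, and a failed batch is
simply replayed row by row so only the offending rows are skipped.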

Regards

Pavel


>
> Happy hacking!
> --
> Alex
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


Re: [HACKERS] btree_gin and ranges

2014-12-26 Thread Teodor Sigaev

Teodor's patch could use some more comments. The STOP_SCAN/MATCH_SCAN/CONT_SCAN
macros are a good idea, but they probably should go into
src/include/access/gin.h so that they can be used in all compare_partial
implementations.


The STOP_SCAN/MATCH_SCAN/CONT_SCAN macros have been moved to gin's header,
and the comments have been improved.
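
For context, GIN's compare_partial() support function signals its result
through the sign of its return value: negative means the index key does not
match but the scan should continue, zero means a match, and positive means
the scan can stop because no further matches are possible.  Assuming the
macros simply name those three outcomes (a guess at their shape, not
necessarily the patch's actual definitions), they would reduce to roughly:

/* Hedged sketch -- the real definitions are whatever the patch puts in gin.h */
#define CONT_SCAN   (-1)    /* key does not match; continue the scan      */
#define MATCH_SCAN  (0)     /* key matches the query                      */
#define STOP_SCAN   (1)     /* no further matches possible; stop the scan */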


Split the patch in two: gin and module


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


gin_macros.patch.gz
Description: GNU Zip compressed data


btree_gin_range-3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers