Re: [HACKERS] [bug fix] pg_ctl stop times out when it should respond quickly
On Sun, Jan 5, 2014 at 3:49 PM, MauMau maumau...@gmail.com wrote: Could you confirm again and tell me what problem is happening? FWIW, I just quickly tested those two patches independently and got them correctly applied with patch -p1 < $PATCH on master at edc4345. They compiled and passed make check as well. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Support for pg_stat_archiver view
On Mon, Jan 6, 2014 at 5:37 PM, Gabriele Bartolini gabriele.bartol...@2ndquadrant.it wrote: Hi Fabrizio, On 05/01/14 20:46, Fabrizio Mello wrote: I don't see your code yet, but I would like to know if it is possible to implement this view as an extension. I wanted to do it as an extension - so that I could backport that to previous versions of Postgres. I do not think it is a possibility, given that the client code that is aware of the events lies in pgarch.c. I don't see any particular point in doing that as an extension, as you want to log this statistical information once archiving has either failed or succeeded. I just had a quick look at v2, and it looks like the patch is in good shape. Sorry to be picky, but I am not sure that using a character for m_type is appropriate when delivering messages to the pgstat facility. You might want to use a boolean instead... MAX_XFN_CHARS is not appropriate either, IMO: you should remove it and instead use MAXFNAMELEN of xlog_internal.h, where all the file names related to WAL files, including history and backup files, are generated. Then, the patch looks to be working as expected; here are some results with a short test:

=# \x
Expanded display is on.
=# select * from pg_stat_get_archiver();
-[ RECORD 1 ]----------+------------------------------
archived_wals          | 6
last_archived_wal      | 00010005
last_archived_wal_time | 2014-01-07 17:27:34.752903+09
failed_attempts        | 12
last_failed_wal        | 00010006
last_failed_wal_time   | 2014-01-07 17:31:18.409528+09
stats_reset            |
(1 row)

=# select * from pg_stat_archiver;
-[ RECORD 1 ]----------+------------------------------
archived_wals          | 6
last_archived_wal      | 00010005
last_archived_wal_time | 2014-01-07 17:27:34.752903+09
failed_attempts        | 12
last_failed_wal        | 00010006
last_failed_wal_time   | 2014-01-07 17:31:18.409528+09
stats_reset            | 2014-01-07 17:25:51.949498+09

Regards, -- Michael
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
On Thu, Jan 2, 2014 at 11:34 PM, Bernd Helmle maili...@oopsware.de wrote: --On 1. Januar 2014 23:53:46 +0100 Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Hi, As much as I've seen people frown upon $subject, it still happens in the wild, and Magnus seems to agree that the current failure mode of our pg_basebackup tool when confronted with the situation is a bug. So here's a fix, attached. I've seen having tablespaces under PGDATA as a policy within several data centres in the past. The main reasoning behind this seems to be that they strictly separate platform and database administration, and for database inventory reasons. They are regularly surprised if you tell them not to use tablespaces in such a way, since they absorbed this practice over the years from other database systems. So +1 for fixing this. Same here. +1 for fixing it. Having tablespaces in PGDATA itself is most of the time part of some obscure policy making it easier to manage all the data in one folder... -- Michael
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On 01/06/2014 08:29 PM, Andres Freund wrote: On 2014-01-06 12:40:25 -0500, Robert Haas wrote: On Mon, Jan 6, 2014 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-06 11:08:41 -0500, Robert Haas wrote: Yea. But at least it would fail reliably instead of just under concurrency and other strange circumstances - and there'd be a safe way out. Currently there seem to be all sorts of odd behaviour possible. I simply don't have a better idea :( Is "forcibly detoast everything" a complete no-go? I realize there are performance concerns with that approach, but I'm not sure how realistic a worry it actually is. The scenario I am primarily worried about is turning a record assignment which previously took up to BLOCK_SIZE + slop amount of memory into something taking up to a gigabyte. That's a pretty damn hefty change. And there's no good way of preventing it short of using a variable for each actually desired column, which imnsho isn't really a solution. We could mitigate that somewhat by doing an optimization pass over the PL/pgSQL code after compilation, checking which fields of a row variable are never referenced, and skipping the detoasting for those fields. It would only work for named row variables, not anonymous record variables, and you would still unnecessarily detoast fields that are sometimes accessed but usually not. But it would avoid the detoasting in the most egregious cases, e.g. where you fetch a whole row into a variable just to access one field. Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. - Heikki
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On 01/06/2014 07:12 PM, Mark Dilger wrote: The reason I was going to all the trouble of creating chrooted environments was to be able to replicate clusters that have tablespaces. You can remove and recreate the symlink in the pg_tblspc directory, after creating the cluster, to point it to a different location. It might be a bit tricky to do that if you have two clusters running at the same time, but it's probably easier than chrooting anyway. For example:

1. stop the standby
2. create the tablespace in master
3. stop master
4. mv the tablespace directory, and modify the symlink in master to point to the new location
5. start standby. It will replay the tablespace creation in the original location
6. restart master. You now have the same tablespace in master and standby, but they point to different locations.

This doesn't allow dynamically creating and dropping tablespaces during tests, but at least it gives you one tablespace to use. Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. - Heikki
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote: Also I cannot reproduce a scenario where "applications must not depend on results read during a transaction that later aborted". In this example the SELECT itself has failed. Can you show an example where a SELECT completes, but the COMMIT blows up? Actually, no, not for a read-only transaction. It happens that the final serialization failure check executed on COMMIT only affects read/write transactions, not read-only ones. That's a pretty specific implementation detail, though, so I wouldn't necessarily rely on it... Here's an example of why applications must not depend on results read during a transaction that later aborted:

  W2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
  W2: UPDATE t SET count=1 WHERE id=1;
  W1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE
  W1: SELECT * FROM t WHERE id=1;
  W2: COMMIT;
  R : BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY
  R : SELECT * FROM t;
  R : COMMIT;
! W1: UPDATE t SET count=1 WHERE id=2;
  W1: COMMIT;

If you try this, it'll cause a serialization failure on the line marked with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed before W2. But R saw both (1,1) and (2,0) in the table, and that has to be a consistent snapshot of the database state, meaning W2 appears to have executed before W1. That's an inconsistency, so something has to be rolled back. This particular anomaly requires all three of the transactions, and so it can't be detected until W1 does its UPDATE. Postgres detects the conflict at that point and rolls back W1. So what does this have to do with relying on the results of read-only transactions that abort? Well, what if R had issued ROLLBACK instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to be equivalent for transactions that don't modify the database, or maybe because something else caused the transaction to abort? 
When W1 does its update, it will be checked for serialization failures, but aborted transactions are (intentionally) not included in those checks. W1 is therefore allowed to commit; the apparent serial order of execution is W1 followed by W2, and the results of the aborted transaction R aren't consistent with that. Dan -- Dan R. K. Ports  UW CSE  http://drkp.net/
Re: [HACKERS] truncating pg_multixact/members
On 2014-01-06 20:51:57 -0500, Robert Haas wrote: On Mon, Jan 6, 2014 at 7:50 PM, Jim Nasby j...@nasby.net wrote: On 1/4/14, 8:19 AM, Robert Haas wrote: Also, while multixactid_freeze_min_age should be low, perhaps a million as you suggest, multixactid_freeze_table_age should NOT be lowered to 3 million or anything like it. If you do that, people who are actually doing lots of row locking will start getting many more full-table scans. We want to avoid that at all cost. I'd probably make the default the same as for vacuum_freeze_table_age, so that mxids only cause extra full-table scans if they're being used more quickly than xids. Same default as vacuum_freeze_table_age, or default TO vacuum_freeze_table_age? I'm thinking the latter makes more sense... Same default. I think it's a mistake to keep leading people to think that the sensible values for one set of parameters are somehow related to a sensible set of values for the other set. They're really quite different things. Valid argument - on the other hand, defaulting to the current variable's value has the advantage of being less likely to cause pain when doing a minor version upgrade because suddenly full table vacuums are much more frequent. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Triggers on foreign tables
Hello. Since the last discussion about it (http://www.postgresql.org/message-id/cadyhksugp6ojb1pybtimop3s5fg_yokguto-7rbcltnvaj5...@mail.gmail.com), I finally managed to implement row triggers for foreign tables. The attached patch does not contain regression tests or documentation yet; a revised patch will include those sometime during the week. I'll add it to the commitfest then. A simple test scenario using postgres_fdw is however included as an attachment. The attached patch implements triggers for foreign tables according to the design discussed in the link above. For statement-level triggers, nothing has changed since the last patch: statement-level triggers are simply allowed in the command checking. For row-level triggers, it works as follows:

- during the rewrite phase, every attribute of the foreign table is duplicated as a resjunk entry if a trigger is defined on the relation. These entries are then used to store the old values for a tuple. There is room for improvement here: we could check whether any trigger is in fact a row-level trigger.
- during the execution phase, the before triggers are fired exactly like triggers on regular tables, except that old tuples are not fetched using their ctid, but rebuilt using the previously-stored resjunk attributes.
- for after triggers, the whole queuing mechanism is bypassed for foreign tables. This is IMO acceptable, since foreign tables cannot have constraints or constraint triggers, and thus have no need for deferrable execution.

This design avoids the need for storing and retrieving/identifying remote tuples until the query or transaction end. 
- the duplicated resjunk attributes are identified by being:
  - marked as resjunk in the targetlist
  - not being system or whole-row attributes (varno 0)

There is still one small issue with the attached patch: modifications to the tuple performed by the foreign data wrapper (via the returned TupleTableSlot in the ExecForeignUpdate and ExecForeignInsert hooks) are not visible to the AFTER trigger. This could be fixed by merging the planslot containing the resjunk columns with the returned slot before calling the trigger, but I'm not really sure how to safely perform that. Any advice? Many thanks to Kohei Kaigai for taking the time to help with the design. -- Ronan Dunklau http://dalibo.com - http://dalibo.org

test_with_postgres_fdw.sql Description: application/sql

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index b9cd88d..a509595 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -3150,13 +3150,16 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd, /* No command-specific prep needed */ break; case AT_EnableTrig: /* ENABLE TRIGGER variants */ - case AT_EnableAlwaysTrig: - case AT_EnableReplicaTrig: case AT_EnableTrigAll: case AT_EnableTrigUser: case AT_DisableTrig: /* DISABLE TRIGGER variants */ case AT_DisableTrigAll: case AT_DisableTrigUser: + ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE); + pass = AT_PASS_MISC; + break; + case AT_EnableReplicaTrig: + case AT_EnableAlwaysTrig: case AT_EnableRule: /* ENABLE/DISABLE RULE variants */ case AT_EnableAlwaysRule: case AT_EnableReplicaRule: diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 4eff184..12870eb 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -75,6 +75,14 @@ static HeapTuple GetTupleForTrigger(EState *estate, ItemPointer tid, LockTupleMode lockmode, TupleTableSlot **newSlot); + +static HeapTuple ExtractOldTuple(TupleTableSlot * mixedtupleslot, + 
ResultRelInfo * relinfo); +static void ExecARForeignTrigger(EState * estate, +TriggerData LocTriggerData, +ResultRelInfo *relinfo, +int triggerEvent, int triggerType); + static bool TriggerEnabled(EState *estate, ResultRelInfo *relinfo, Trigger *trigger, TriggerEvent event, Bitmapset *modifiedCols, @@ -184,12 +192,22 @@ CreateTrigger(CreateTrigStmt *stmt, const char *queryString, RelationGetRelationName(rel)), errdetail("Views cannot have TRUNCATE triggers."))); } + else if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE) + { + if (stmt->isconstraint) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("\"%s\" is a foreign table", + RelationGetRelationName(rel)), + errdetail("Foreign tables cannot have constraint triggers."))); + } else + { ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("\"%s\" is not a table or view", RelationGetRelationName(rel)))); - + } if (!allowSystemTableMods && IsSystemRelation(rel)) ereport(ERROR, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), @@ -1062,10 +1080,11 @@ RemoveTriggerById(Oid trigOid) rel = heap_open(relid, AccessExclusiveLock); if (rel->rd_rel->relkind != RELKIND_RELATION && - rel->rd_rel->relkind != RELKIND_VIEW) + rel->rd_rel->relkind !=
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On 2014-01-07 10:45:24 +0200, Heikki Linnakangas wrote: Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. If we're doing something as coarse-grained as this, I agree, it should be master only. I personally vote for just leaving things as is; that seems better than this pessimization, and it's not like loads of people have hit the issue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] PostgreSQL in Windows console and Ctrl-C
Hello all, when pg_ctl start is used to run PostgreSQL in a console window on Windows, it runs in the background (it is terminated by closing the window, but that is probably inevitable). There is one problem, however: The first Ctrl-C in that window, no matter in which situation, will cause the background postmaster to exit. If you, say, ping something, and press Ctrl-C to stop ping, you probably don't want the database to go away, too. The reason is that Windows delivers the Ctrl-C event to all processes using that console, not just to the foreground one. Here's a patch to fix that. pg_ctl stop still works, and it has no effect when running as a service, so it should be safe. It starts the postmaster in a new process group (similar to calling setpgrp() after fork()) that does not receive Ctrl-C events from the console window. -- Christian diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c new file mode 100644 index 50d4586..89a9544 *** a/src/bin/pg_ctl/pg_ctl.c --- b/src/bin/pg_ctl/pg_ctl.c *** CreateRestrictedProcess(char *cmd, PROCE *** 1561,1566 --- 1561,1567 HANDLE restrictedToken; SID_IDENTIFIER_AUTHORITY NtAuthority = {SECURITY_NT_AUTHORITY}; SID_AND_ATTRIBUTES dropSids[2]; + DWORD flags; /* Functions loaded dynamically */ __CreateRestrictedToken _CreateRestrictedToken = NULL; *** CreateRestrictedProcess(char *cmd, PROCE *** 1636,1642 AddUserToTokenDacl(restrictedToken); #endif ! r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, CREATE_SUSPENDED, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL) --- 1637,1650 AddUserToTokenDacl(restrictedToken); #endif ! flags = CREATE_SUSPENDED; ! ! /* Protect console process from Ctrl-C */ ! if (!as_service) { ! flags |= CREATE_NEW_PROCESS_GROUP; ! } ! ! 
r = CreateProcessAsUser(restrictedToken, NULL, cmd, NULL, NULL, TRUE, flags, NULL, NULL, &si, &processInfo); Kernel32Handle = LoadLibrary("KERNEL32.DLL"); if (Kernel32Handle != NULL)
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-06 11:56:28 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I think I am less concerned about pseudotypes.c than about bloating pg_proc.h even further and about the annoyance of editing it - but I guess that should rather be fixed by storing it in a more sensible format at some point... Yeah, getting rid of a dozen pseudotype I/O functions is hardly going to reduce the PITA factor of editing pg_proc.h. It's interesting to think about moving all those DATA() macros into some more-maintainable format --- I'm not sure what it should be exactly, but I think something that can insert plausible defaults for omitted columns would be a big help for pg_proc and maybe some of the other catalogs too. Alvaro previously suggested storing pg_proc in json. Not sure I like it, but it'd sure be better than the current format if we derive unspecified values from other columns (e.g. prorows = 0 iff proretset). I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. Afaics there's really not much reason to keep them stable and it's by far the most frequent conflict I have seen with keeping patches up2date. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] dynamic shared memory and locks
On 2014-01-06 21:35:22 -0300, Alvaro Herrera wrote: Jim Nasby wrote: On 1/6/14, 2:59 PM, Robert Haas wrote: On Mon, Jan 6, 2014 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: The point I'm making is that no such code should get past review, whether it's got an obvious performance problem or not. Sure, I agree, but we all make mistakes. It's just a judgement call as to how likely you think it is that someone might make this particular mistake, a topic upon which opinions may vary. I don't think it's that unlikely, as the previous implementation's rules, when viewed while squinting, allowed nesting spinlocks. And it's a pretty simple check. Maybe it makes sense to have such a check #ifdef'ed out on most builds to avoid extra overhead, but not having any check at all just because we trust the review process too much doesn't strike me as the best of ideas. I don't think that check would have a relevantly high performance impact in comparison to the rest of --enable-cassert - it's a single process-local variable which is regularly accessed. It will just stay in L1 or even registers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] ERROR: missing chunk number 0 for toast value
On Jan 7, 2014, at 09:45, Heikki Linnakangas hlinnakan...@vmware.com wrote: Overall, I'm leaning towards biting the bullet and always detoasting everything in master. Probably best to just leave the stable branches alone. +1 The fact that de-TOAST-ing can happen lazily is, at least to me, an implementation detail that shouldn't be observable. If we want to allow people to use lazy de-TOAST-ing as an optimization tool, we should provide an explicit way to do so, e.g. by flagging variables in pl/pgsql as REFERENCE or something like that. best regards, Florian Pflug
Re: [HACKERS] [COMMITTERS] pgsql: Upgrade to Autoconf 2.69
That was probably me. I'll look into it. On Jan 6, 2014, at 11:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: On Sun, Dec 29, 2013 at 02:48:21AM -0500, Tom Lane wrote: 3. pg_upgrade ignores the fact that pg_resetxlog failed, and keeps going. Does pg_resetxlog return a non-zero exit status? If so, pg_upgrade should have caught that and exited. It certainly does:

if (errno)
{
    fprintf(stderr, _("%s: could not read from directory \"%s\": %s\n"),
            progname, XLOGDIR, strerror(errno));
    exit(1);
}

The bug is that pg_upgrade appears to assume (in many places, not just this one) that exec_prog() will abort if the called program fails, but *it doesn't*, contrary to the claim in its own header comment. This is because pg_log(FATAL, ...) doesn't call exit(). pg_fatal() does, but that's not what's being called in the throw_error case. I imagine that this used to work correctly and got broken by some ill-advised refactoring, but whatever the origin, it's 100% broken today. regards, tom lane
Re: [HACKERS] [bug fix] multibyte messages are displayed incorrectly on the client
From: Bruce Momjian br...@momjian.us On Sun, Jan 5, 2014 at 04:40:17PM +0900, MauMau wrote: Then, as a happy medium, how about disabling message localization only if the client encoding differs from the server one? That is, compare the client_encoding value in the startup packet with the result of GetPlatformEncoding(). If they don't match, call disable_message_localization(). I think the problem is we don't know the client and server encodings at that time. I suppose we know (or at least believe) those encodings during backend startup: * client encoding - the client_encoding parameter passed in the startup packet, or if that's not present, the client_encoding GUC value. * server encoding - the encoding of strings gettext() returns. That is what GetPlatformEncoding() returns. Am I correct? Regards MauMau
[HACKERS] extra_float_digits and casting from real to numeric
A customer recently upgraded their JDBC driver from 8.4 to 9.2. This enabled the binary wire protocol (effectively between 9.1 and 9.2). They reported that single-precision values inserted into a numeric(10,2) column were suddenly rounded wrongly, i.e. 1.18 was inserted as 1.20, while that had worked before. Of course we told them that single precision is the wrong data type for this, but still, this is a regression. The behavior is easily reproducible with SELECT 1.18::real, which returns 1.2. Now, the JDBC driver sets extra_float_digits = 3, which makes this ::real cast return 1.1797 in psql. This is consistent with the documentation, which suggests that extra_float_digits = 0 will return the same representation on all platforms, so the value must be rounded a bit to account for different implementations. But if extra_float_digits > 0 is set, I'd expect not only the float4 output to be affected by it, but also casts to other datatypes, which is not the case now:

set extra_float_digits = 0;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric | text | to_char
--------+---------+------+---------
    1.2 |    1.20 | 1.2  | 1.2

set extra_float_digits = 1;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric | text | to_char
--------+---------+------+---------
   1.18 |    1.20 | 1.18 | 1.2

set extra_float_digits = 3;
select 1.18::real, 1.18::real::numeric(10,2), 1.18::real::text, to_char(1.18::real, '9D999');
 float4 | numeric |  text  | to_char
--------+---------+--------+---------
 1.1797 |    1.20 | 1.1797 | 1.2

Is that sane? Shouldn't FLT_DIG in float4_numeric() be replaced with FLT_DIG + extra_float_digits, like float4out() does, so the extra precision is not lost when inserting float4 data into numeric columns? Likewise, float4_to_char() should be adjusted for to_char output, and correspondingly float8_numeric() and float8_to_char()? 
Christoph -- c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] generic pseudotype IO functions?
Andres Freund and...@2ndquadrant.com writes: I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. -1 ... you've evidently not built any opclasses lately. Yeah, we could probably improve the bootstrap infrastructure enough to not need literal OIDs in so many places in the initial catalog contents, but it'd be lots of trouble. It definitely is *not* something to buy into at the same time as redoing the creation of the .bki file. regards, tom lane
Re: [HACKERS] cleanup in code
David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. It is mildly curious that MSVC fails to understand the unreachability hint here when it does so elsewhere, but for our purposes, that's a purely academic question. regards, tom lane
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-07 10:04:33 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: I think we also should auto-assign the oids for pg_proc (and some other tables) rows if we go there. -1 ... you've evidently not built any opclasses lately. True. Not sure if I ever built one, except when playing around. To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc', so there aren't many references to functions' oids. There's a few procs that are involved in bootstrapping; those likely would need to keep a fixed oid, but that doesn't sound problematic to me. Yeah, we could probably improve the bootstrap infrastructure enough to not need literal OIDs in so many places in the initial catalog contents, but it'd be lots of trouble. It definitely is *not* something to buy into at the same time as redoing the creation of the .bki file. Not sure. Any such infrastructure should have support for filling in defaults for things that don't need to be specified - imo generating an oid for that sounds like it would fit in there. It doesn't - and possibly shouldn't - need to be the same commit, though. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. regards, tom lane
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On 2014-01-07 10:27:14 -0500, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. Seconded. Perhaps the solution is to simply introduce tablespaces located relative to PGDATA? That'd be fracking useful anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
On 7 January 2014 15:47, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? We had a similar email from this same address over a year ago. These have been the only two messages posted, and neither was relevant nor appropriate. Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] generic pseudotype IO functions?
Andres Freund and...@2ndquadrant.com writes: To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc' so there aren't many references to functions' oids. *Some* of them are typed regproc. Many are not, because there's a need to refer to overloaded function names. A minimum change before we could even consider abandoning manual assignment of pg_proc entries would be to make regprocedure_in work in bootstrap mode, so that we could use regprocedure as a catalog column type. And regoperator_in, too, if you wanted to auto-assign operator OIDs. And you'd need some solution for generating fmgroids.h, as well as the handmade #define's for selected operator OIDs that appear now in pg_operator.h. There are probably more issues, but those are the ones that occur to me after thirty seconds' thought. Even after all that, we can *not* go over to auto-assignment of pg_type OIDs, because there is way too much stuff that assumes those OIDs are stable (starting with on-disk storage of arrays). I'm not entirely sure that it's okay to have function OIDs varying across releases, either; who's to say that there's not client code looking at fmgroids.h, or otherwise hard-wiring the numbers it uses for fastpath function calls? On the whole I think that this'd be a lot more work, and a lot more potential for breakage, than it's really worth. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote: I cannot have a read-only transaction fail because of serialization anomalies. Can someone show me a working example please? A common case is a read-only transaction reading a closed batch without seeing all of its entries. http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
On Thu, Jan 2, 2014 at 2:06 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Magnus Hagander mag...@hagander.net writes: We can't get away with just comparing the relative part of the pathname. Because it will fail if there is another path with exactly the same length, containing the tablespace. Actually… yeah. I think we might want to store a value in the tablespaceinfo struct indicating whether it's actually inside PGDATA (since we have the full path at that point), and then skip it based on that instead. Or store and pass the value of getcwd() perhaps. I think it's best to stuff in the tablespaceinfo struct either NIL or the relative path of the tablespace when found in $PGDATA, as done in the attached. I've attached a slightly updated patch - I changed around a bit of logic order and updated some comments during my review. And added error-checking. Thanks! I started again from your version for v3. Applied a fairly heavily edited version of this one. I also backpatched it to 9.1 and up. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
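The pitfall Magnus mentions (a plain string-prefix test also matches a sibling directory that merely shares the prefix) is easy to demonstrate outside the server code. A minimal Python sketch, with the helper name and paths invented for illustration:

```python
import os.path

def is_inside(path, pgdata):
    """Component-wise check that path lies inside pgdata (hypothetical helper)."""
    path = os.path.normpath(path)
    pgdata = os.path.normpath(pgdata)
    # commonpath compares whole path components, so /database is not "inside" /data
    return os.path.commonpath([path, pgdata]) == pgdata

# A naive startswith() test matches a sibling directory sharing the prefix:
assert "/database/ts1".startswith("/data")       # false positive
assert not is_inside("/database/ts1", "/data")   # component-wise check is safe
assert is_inside("/data/ts1", "/data")
```

The committed C fix settles this differently (recording in tablespaceinfo whether the tablespace lives under PGDATA), but the failure mode of bare prefix comparison is the same.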
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Robert Haas wrote: I spent some time looking at this tonight. I don't think the value that is displayed for the bitmap memory tracking will be accurate in complex cases. The bitmap heap scan may sit on top of one or more bitmap-and or bitmap-or nodes. When a bitmap-and operation happens, one of the two bitmaps being combined will be thrown out and the number of entries in the other map will, perhaps, be decreased. The peak memory usage for the surviving bitmap will be reflected in the number displayed for the bitmap heap scan, but the peak memory usage for the discarded bitmap will not. This is wholly arbitrary because both bitmaps existed at the same time, side by side, and which one we keep and which one we throw out is essentially random. Thank you for taking time to look at this patch. The peak memory usage for the discarded bitmap *can* be reflected in the number displayed for the bitmap heap scan by the following code in tbm_union() or tbm_intersect():

tbm_union(TIDBitmap *a, const TIDBitmap *b)
{
	Assert(!a->iterating);
+	if (a->nentriesPeak < b->nentriesPeak)
+		a->nentriesPeak = b->nentriesPeak;
	/* Nothing to do if b is empty */
	if (b->nentries == 0)
		return;
***
tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
{
	Assert(!a->iterating);
+	if (a->nentriesPeak < b->nentriesPeak)
+		a->nentriesPeak = b->nentriesPeak;
	/* Nothing to do if a is empty */
	if (a->nentries == 0)
		return;
***

Sorry for the delay. Hmm, fair point. But I'm still not convinced that we really need to add extra accounting for this. What's wrong with just reporting the number of exact and lossy pages? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
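The proposed hunks carry the discarded bitmap's peak over into the survivor before merging. A toy Python model of that accounting (class and field names are invented; the real TIDBitmap merges page tables rather than min'ing counts):

```python
class ToyBitmap:
    """Toy stand-in for TIDBitmap that tracks a peak entry count."""
    def __init__(self, nentries):
        self.nentries = nentries
        self.nentries_peak = nentries

    def intersect(self, other):
        # Mirror of the proposed tbm_intersect() hunk: keep the discarded
        # bitmap's peak so it is not lost when that object goes away.
        if self.nentries_peak < other.nentries_peak:
            self.nentries_peak = other.nentries_peak
        # Grossly simplified merge: an intersection can only shrink.
        self.nentries = min(self.nentries, other.nentries)

a, b = ToyBitmap(100), ToyBitmap(500)
a.intersect(b)                 # b is conceptually thrown away afterwards
assert a.nentries == 100
assert a.nentries_peak == 500  # without the hunk, b's peak of 500 would vanish
```

This illustrates Robert's point: both bitmaps existed side by side, so the reported peak should cover the larger one even though it was discarded.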
Re: [HACKERS] generic pseudotype IO functions?
On 2014-01-07 11:08:22 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: To the point that I am not seeing the problem right now. I am not proposing to get rid of statically assigned oids in pg_type et al. The references to procs mostly seem to be typed 'regproc' so there aren't many references to functions' oids. *Some* of them are typed regproc. Many are not, because there's a need to refer to overloaded function names. So, for the archives, this seems to be:
* pg_cast
* pg_aggregate
* pg_event_trigger
* pg_foreign_data_wrapper
* pg_language
* pg_proc (duh...)
It strikes me that several of the tables using regproc could very well stand to use regprocedure instead. A minimum change before we could even consider abandoning manual assignment of pg_proc entries would be to make regprocedure_in work in bootstrap mode, so that we could use regprocedure as a catalog column type. Yes. And regoperator_in, too, if you wanted to auto-assign operator OIDs. I personally find that much less interesting because there are so many fewer operators in comparison to procs, so conflicts are rarer. And you'd need some solution for generating fmgroids.h, as well as the handmade #define's for selected operator OIDs that appear now in pg_operator.h. If we're able to generate the .bki file, this seems like a solvable problem. There are probably more issues, but those are the ones that occur to me after thirty seconds' thought. Yes. Even after all that, we can *not* go over to auto-assignment of pg_type OIDs, because there is way too much stuff that assumes those OIDs are stable (starting with on-disk storage of arrays). Yes, I think that's totally out of the question. I'm not entirely sure that it's okay to have function OIDs varying across releases, either; who's to say that there's not client code looking at fmgroids.h, or otherwise hard-wiring the numbers it uses for fastpath function calls? I am not particularly worried about that. 
That'd mean somebody built a solution specifically for calling builtin functions since all user created functions will be dynamic. And that that client is using a fmgroids.h from another release than the one he's working with - a recipe for trouble in such a solution independent of this imo. On the whole I think that this'd be a lot more work, and a lot more potential for breakage, than it's really worth. That might be true. It's probably the most frequent cause for conflicts in patches people submit tho... Anyway, pg_proc's contents would need to be generated before this anyway... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
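For illustration, the fill-in-defaults idea Andres floats could look roughly like this in a hypothetical catalog-data generator. The function, the pinned set, and the starting OID are all invented here; this is not PostgreSQL's actual bootstrap machinery:

```python
PINNED = {16, 23, 25}   # e.g. bool, int4, text: OIDs that must stay fixed forever

def assign_oids(entries, start=10000):
    """Give every catalog row without an explicit oid the next free one,
    leaving manually pinned OIDs untouched (sketch only)."""
    used = PINNED | {e["oid"] for e in entries if "oid" in e}
    next_oid = start
    for e in entries:
        if "oid" not in e:
            while next_oid in used:
                next_oid += 1
            e["oid"] = next_oid
            used.add(next_oid)
    return entries

rows = [{"name": "boolin", "oid": 1242},   # bootstrap-critical, stays pinned
        {"name": "some_func"},             # gets an auto-assigned oid
        {"name": "other_func"}]
assign_oids(rows)
assert [r["oid"] for r in rows] == [1242, 10000, 10001]
```

The property the bootstrap-critical procs need is exactly this: regenerating the file never shifts an explicitly pinned OID, while everything else is free to move.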
Re: [HACKERS] Assertion failure in base backup code path
On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Does the attached seem right to you? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 43a0416..bf1174e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9727,6 +9727,9 @@ XLogFileNameP(TimeLineID tli, XLogSegNo segno)
  *
  * Every successfully started non-exclusive backup must be stopped by calling
  * do_pg_stop_backup() or do_pg_abort_backup().
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
@@ -9747,11 +9750,6 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p,
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("must be superuser or replication role to run a backup")));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
@@ -10053,6 +10051,9 @@ pg_start_backup_callback(int code, Datum arg)
  *
  * Returns the last WAL position that must be present to restore from this
  * backup, and the corresponding timeline ID in *stoptli_p.
+ *
+ * It is the responsibility of the caller of this function to verify the
+ * permissions of the calling user!
  */
 XLogRecPtr
 do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
@@ -10085,11 +10086,6 @@ do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p)
 	backup_started_in_recovery = RecoveryInProgress();
 
-	if (!superuser() && !has_rolreplication(GetUserId()))
-		ereport(ERROR,
-				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
-				 errmsg("must be superuser or replication role to run a backup")));
-
 	/*
 	 * Currently only non-exclusive backup can be taken during recovery.
 	 */
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index c421a59..4be31b4 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -56,6 +56,11 @@ pg_start_backup(PG_FUNCTION_ARGS)
 	backupidstr = text_to_cstring(backupid);
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser or replication role to run a backup")));
+
 	startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
@@ -82,6 +87,11 @@ pg_stop_backup(PG_FUNCTION_ARGS)
 	XLogRecPtr	stoppoint;
 	char		stopxlogstr[MAXFNAMELEN];
 
+	if (!superuser() && !has_rolreplication(GetUserId()))
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser or replication role to run a backup")));
+
 	stoppoint = do_pg_stop_backup(NULL, true, NULL);
 
 	snprintf(stopxlogstr, sizeof(stopxlogstr), "%X/%X",

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extra_float_digits and casting from real to numeric
Christoph Berg christoph.b...@credativ.de writes: A customer recently upgraded their jdbc driver from 8.4 to 9.2. This enabled the binary wire protocol (effectively between 9.1 and 9.2). They reported that single precision values inserted into a numeric(10,2) column were suddenly rounded wrongly, i.e. 1.18 was inserted as 1.20, while that worked before. Of course we told them that single is the wrong data type for this, but still, this is a regression. I'm not sure that it's fair to characterize that as a regression. If anything, it's more sensible than what happened before. But if extra_float_digits > 0 is set, I'd expect not only the float4 output to be affected by it, but also casts to other datatypes, This proposal scares me. extra_float_digits is strictly a matter of I/O representation, it does not affect any internal calculations. Moreover, since one of the fundamental attributes of type numeric is that it's supposed to give platform-independent results, I don't like the idea that you're likely to get platform-dependent results of conversions from float4/float8. I think your customer got bit by his own bad coding practice, and that should be the end of it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
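The underlying issue is visible without any database: the single-precision value nearest 1.18 lies slightly below 1.18, and the pre-9.2 text protocol hid that by printing the shortened form. A small Python sketch of the representation difference (illustrative only; this is not the JDBC driver's code):

```python
import struct

def float4(x):
    """Round-trip a value through IEEE single precision, as a float4 column stores it."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = float4(1.18)
# The nearest single-precision value to 1.18 is just below it:
assert f"{stored:.10f}" == "1.1799999475"
# With the default shortened text output it still reads as 1.18:
assert f"{stored:.6g}" == "1.18"
# A client parsing the shortened text starts from exactly 1.18, while one
# receiving the raw binary value starts from 1.1799999475...; any further
# conversion to numeric then operates on different inputs.
```

This is why switching to the binary protocol changed what the client handed to the numeric(10,2) conversion, even though the stored float4 bits never changed.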
Re: [HACKERS] Assertion failure in base backup code path
On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote: On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.comwrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Yes, that's what I was thinking and was planning to write you about at some point. Does the attached seem right to you? I haven't run the code, but it looks right to me. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changeset Extraction Interfaces
On 2013-12-12 16:49:33 +0100, Andres Freund wrote: On 2013-12-12 10:01:21 -0500, Robert Haas wrote: On Thu, Dec 12, 2013 at 7:04 AM, Andres Freund and...@2ndquadrant.com wrote: As far as I have been thinking of, this would be another catalog table like pg_decoding_plugin(oid, dpname name, dpload regproc). Instead of adding another catalog table, I think we should just define a new type. Again, please look at the way that foreign data wrappers do this: I don't really see what the usage of a special type has to do with this, but I think that's beside your main point. What you're saying is that the output plugin is just defined by a function name, possibly schema prefixed. That has an elegance to it. +1 Ok, so I've implemented this, but I am not so sure it's sufficient; there's an issue: Currently a logical replication slot has a plugin assigned; previously that has just been identified by the basename of a .so. But with the above proposal the identifier is pointing to a function, currently via its oid. But what happens if somebody drops or recreates the function? We can't make pg_depend entries or anything since that won't work on a standby. Earlier, if somebody removed the .so we'd just error out, but pg's dependency tracking always only mattered to things inside the catalogs. I see the following possible solutions for this:
1) Accept that fact, and throw an error if the function doesn't exist anymore, or has an unsuitable signature. We can check the return type of output_plugin_callbacks, so that's a pretty specific test.
2) Create a pg_output_plugin catalog and prevent DROP OUTPUT PLUGIN (or similar) when there's a slot defined. But how'd that work if the slot is only defined on standbys? We could have the redo routine block and/or kill the slot if necessary?
3) Don't assign a specific output plugin to a slot, but have it specified every time data is streamed, not just when a slot is created.
Currently that wouldn't be a problem, but I am afraid it will constrict some future optimizations. Good ideas? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Assertion failure in base backup code path
On Tue, Jan 7, 2014 at 5:43 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-07 17:40:07 +0100, Magnus Hagander wrote: On Tue, Dec 24, 2013 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-23 18:28:51 +0100, Magnus Hagander wrote: On Dec 19, 2013 12:06 AM, Andres Freund and...@2ndquadrant.com wrote: Hi Magnus, It looks to me like the path to do_pg_start_backup() outside of a transaction context comes from your initial commit of the base backup facility. The problem is that you're not allowed to do anything leading to a syscache/catcache lookup in those contexts. I think that may have come with the addition of the replication privilege actually but that doesn't change the fact that yes, it appears broken.. There was a if (!superuser()) check there before as well, that has the same dangers. I think the correct fix is to move the security check outside of do_pg_start_backup() and leave it to the caller. That means pg_start_backup() for a manual backup. And for a streaming base backup the check has already been made - you can't get through the authentication step if you don't have the required permissions. Yes, that's what I was thinking and was planning to write you about at some point. Good, then we think the same :) Does the attached seem right to you? I haven't run the code, but it looks right to me. It worked fine in my testing, so I've pushed that version. Looks slightly different in both 9.2 and 9.1 (not clean backpatching) due to code reorganization and such, but AFAICT it's just code in different places. Thanks! -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
[HACKERS] Failed assertion root-hasLateralRTEs on initsplan.c
I get an assertion failure on initsplan.c line 1325 while executing the following query on HEAD (edc43458d797a5956f4bf39af18cf62abb0077db). It works fine without --enable-cassert.

update subscriber set properties = hstore(a) from (select firstname, lastname from player where player.id = subscriber.id) as a;

Backtrace:

* thread #1: tid = 0x2e16ec, 0x7fff85f0b866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x7fff85f0b866 libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff8450335c libsystem_pthread.dylib`pthread_kill + 92
frame #2: 0x7fff82ffdbba libsystem_c.dylib`abort + 125
frame #3: 0x00010e2b7510 postgres`ExceptionalCondition(conditionName=unavailable, errorType=unavailable, fileName=unavailable, lineNumber=unavailable) + 80 at assert.c:54
frame #4: 0x00010e155ab6 postgres`distribute_qual_to_rels(root=unavailable, clause=0x7fd5c382e208, is_deduced='\0', below_outer_join='\0', jointype=JOIN_INNER, qualscope=0x7fd5c3835ee8, ojscope=unavailable, outerjoin_nonnullable=unavailable, deduced_nullable_relids=unavailable, postponed_qual_list=unavailable) + 1254 at initsplan.c:1325
frame #5: 0x00010e154a66 postgres`deconstruct_recurse(root=0x7fd5c382c248, jtnode=0x7fd5c382cde0, below_outer_join='\0', qualscope=0x7fff51c723f8, inner_join_rels=unavailable, postponed_qual_list=0x7fff51c72400) + 870 at initsplan.c:781
frame #6: 0x00010e1548ab postgres`deconstruct_recurse(root=0x7fd5c382c248, jtnode=0x7fd5c382bfd8, below_outer_join='\0', qualscope=0x7fff51c72450, inner_join_rels=0x7fff51c72448, postponed_qual_list=0x7fff51c72440) + 427 at initsplan.c:732
frame #7: 0x00010e1546a1 postgres`deconstruct_jointree(root=unavailable) + 81 at initsplan.c:655
frame #8: 0x00010e156a1b postgres`query_planner(root=0x7fd5c382c248, tlist=0x7fd5c382e398, qp_callback=0x00010e15a660, qp_extra=0x7fff51c725f0) + 219 at planmain.c:145
frame #9: 0x00010e1589d8 postgres`grouping_planner(root=0x7fd5c382c248, tuple_fraction=unavailable) + 2888 at planner.c:1243
frame #10: 0x00010e157adf postgres`subquery_planner(glob=0x7fd5c4007e68, parse=0x7fd5c4007a30, parent_root=unavailable, hasRecursion=unavailable, tuple_fraction=0, subroot=0x7fff51c72900) + 3119 at planner.c:572
frame #11: 0x00010e156cac postgres`standard_planner(parse=0x7fd5c4007a30, cursorOptions=unavailable, boundParams=unavailable) + 236 at planner.c:210
frame #12: 0x00010e1d6356 postgres`pg_plan_query(querytree=0x7fd5c4007a30, cursorOptions=0, boundParams=0x) + 118 at postgres.c:759
frame #13: 0x00010e1d979a postgres`PostgresMain [inlined] pg_plan_queries(cursorOptions=0, querytrees=unavailable, boundParams=unavailable) + 56 at postgres.c:818
frame #14: 0x00010e1d9762 postgres`PostgresMain [inlined] exec_simple_query(query_string=0x7fd5c4006038) + 21 at postgres.c:983
frame #15: 0x00010e1d974d postgres`PostgresMain(argc=unavailable, argv=unavailable, dbname=0x7fd5c301ac30, username=unavailable) + 8749 at postgres.c:4011
frame #16: 0x00010e184c1f postgres`PostmasterMain [inlined] BackendRun + 7551 at postmaster.c:4085
frame #17: 0x00010e184c00 postgres`PostmasterMain [inlined] BackendStartup at postmaster.c:3774
frame #18: 0x00010e184c00 postgres`PostmasterMain [inlined] ServerLoop at postmaster.c:1585
frame #19: 0x00010e184c00 postgres`PostmasterMain(argc=unavailable, argv=unavailable) + 7520 at postmaster.c:1240
frame #20: 0x00010e11924f postgres`main(argc=1, argv=0x7fd5c2c03ec0) + 783 at main.c:194
frame #21: 0x7fff897165fd libdyld.dylib`start + 1
frame #22: 0x7fff897165fd libdyld.dylib`start + 1
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to reproduce serialization failure for a read only transaction.
On Jan 7, 2014, at 00:38, Jim Nasby j...@nasby.net wrote: This email and the previous one are an awesome bit of information, can we add it to the docs somehow? Even if it's just dumping the emails into a wiki page and referencing it? Most of what I wrote there can be found in README-SSI, I think, under "Apparent Serial Order of Execution", "Heap locking" and "Index AM implementations". I guess it'd be nice if we explained these things in the docs somewhere, though I'm not sure what level of detail would be appropriate. Maybe a good compromise would be to explain dependency graphs, but skip over the different kinds of dependencies (ww, rw, wr). Instead we could say that whenever a transaction *does see* another transaction's modifications it must appear after that transaction in any serial schedule, and whenever a transaction *might see* another transaction's modifications but doesn't due to begin/commit ordering it must appear before that transaction. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:
1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()
See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward, we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks, actually =P) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario:
Master: 9.3.1 (I know, trying to go to 9.3.2)
Slave: 9.3.2

The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
2088223 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8C98 ... desc: visible: rel 1663/883916/151040222; blk 1075
2088224 ... Heap2 ... 32/64, ... lsn: 2BC/46AB8CD0 ... desc: clean: rel 1663/883916/151040222; blk 1075 remxid 107409880
== two lines that matched below ==
2088225 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8D10 ... desc: visible: rel 1663/883916/151040222; blk 1076
2088226 ... Heap2 ... 24/ 164, ... lsn: 2BC/46AB8D48 ... desc: clean: rel 1663/883916/151040222; blk 1076 remxid 107409880
2088227 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8DF0 ... desc: visible: rel 1663/883916/151040222; blk 1077
2088228 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8E28 ... desc: clean: rel 1663/883916/151040222; blk 1077 remxid 107409880
2088229 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8EA8 ... desc: visible: rel 1663/883916/151040222; blk 1078
2088230 ... Heap2 ... 24/ 5748, ... lsn: 2BC/46AB8EE0 ... desc: clean: rel 1663/883916/151040222; blk 1078 remxid 107409880
2088231 ... Heap2 ... 20/52, ... lsn: 2BC/46ABA570 ... desc: visible: rel 1663/883916/151040222; blk 1079

I cropped some columns to make it easy to read (the entire result is attached); if you guys need more information, I can send the xlogdump of all the WALs (or any other information). Also attached the controldata, if needed. Thanks a lot. Best regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nível F! www.dextra.com.br/postgres

2088220-rmgr: Heap2 len (rec/tot): 24/ 192, tx: 0, lsn: 2BC/46AB8B20, prev 2BC/46AB8AE8, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221-rmgr: Heap2 len (rec/tot): 20/52, tx: 0, lsn: 2BC/46AB8BE0, prev 2BC/46AB8B20, bkp: , desc: visible: rel 1663/883916/151040222; blk 1074
2088222-rmgr: Heap2 len (rec/tot): 24/ 128, tx: 0, lsn: 2BC/46AB8C18, prev 2BC/46AB8BE0, bkp: 1000, desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
2088223-rmgr: Heap2 len (rec/tot): 20/52, tx: 0, lsn: 2BC/46AB8C98, prev 2BC/46AB8C18, bkp: , desc: visible: rel 1663/883916/151040222; blk 1075
2088224-rmgr: Heap2 len (rec/tot): 32/64, tx: 0, lsn:
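A toy replay model shows why the record order Heikki describes matters, and why stopping recovery between the two records leaves exactly the inconsistency Matheus hit. All names here are invented; this is not PostgreSQL's recovery code:

```python
def replay(records, stop_after):
    """Apply WAL records in order, stopping early to model recovery ending
    between two records (toy model of the ordering bug)."""
    page = {"dead_tuples": 3, "vm_bit": False}
    for i, rec in enumerate(records):
        if i >= stop_after:
            break                       # recovery stops here
        if rec == "visible":
            page["vm_bit"] = True       # VM claims everything on page is visible
        elif rec == "clean":
            page["dead_tuples"] = 0     # dead tuples actually removed
    return page

# Buggy order ("visible" logged before "clean"): stopping in between leaves
# the VM bit set while dead tuples are still on the page.
broken = replay(["visible", "clean"], stop_after=1)
assert broken["vm_bit"] and broken["dead_tuples"] > 0

# Fixed order: an early stop leaves a consistent page either way.
ok = replay(["clean", "visible"], stop_after=1)
assert ok["dead_tuples"] == 0 and not ok["vm_bit"]
```

In the xlogdump above, the "visible" record for blk 1076 indeed precedes its "clean" record, matching the buggy ordering.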
Re: [HACKERS] Fixing pg_basebackup with tablespaces found in $PGDATA
Magnus Hagander mag...@hagander.net writes: Applied a fairly heavily edited version of this one. I also backpatched it to 9.1 and up. Thanks a lot! Did some reviewing and re-testing here; I like using DataDir and IS_DIR_SEP better than what I did, of course ;-) Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Failed assertion root-hasLateralRTEs on initsplan.c
Emre Hasegeli e...@hasegeli.com writes: I get an assertion failure on initsplan.c line 1325 while executing the following query on HEAD (edc43458d797a5956f4bf39af18cf62abb0077db). It works fine without --enable-cassert. update subscriber set properties = hstore(a) from (select firstname, lastname from player where player.id = subscriber.id) as a; Hm, AFAICS this query should absolutely *not* work; the reference to subscriber.id inside the sub-select is illegal. It might be legal with LATERAL, but not otherwise. So I think this is a parser bug, and there's nothing wrong with the planner's Assert. 9.2 and earlier throw the error I'd expect, so probably something in the LATERAL patches broke this case; will look. The next question is if we should allow it with LATERAL. That would essentially be treating subscriber as having implicitly appeared at the start of the FROM list, which I guess is all right ... but does anyone want to argue against it? I seem to recall some old discussions about allowing the update target to be explicitly shown in FROM, in case you wanted, say, to left join it against something else. Allowing this implicit appearance might limit our options if we ever get around to trying to do that. On the other hand, those discussions were a long time back, so maybe it'll never happen anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
Dan Ports d...@csail.mit.edu wrote: On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote: If you try this, it'll cause a serialization failure on the line marked with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed before W2. But R saw both (1,1) and (2,0) in the table, and that has to be a consistent snapshot of the database state, meaning W2 appears to have executed before W1. That's an inconsistency, so something has to be rolled back. This particular anomaly requires all three of the transactions, and so it can't be detected until W1 does its UPDATE. Postgres detects the conflict at that point and rolls back W1. Yeah, neither of the provided examples rolled back the read only transaction itself; the read only transaction caused a situation where something needed to be rolled back, but since we try to roll back a transaction which has a good chance of succeeding on retry, the read only transaction is not usually a good candidate. I created a new example on the Wiki page where the read only transaction itself must be rolled back because both of the other transactions involved have already committed: https://wiki.postgresql.org/wiki/SSI#Rollover Regarding other questions on the thread: I have no objections to moving the Wiki examples into the docs, but it seemed like a lot to include, and I'm not sure where it belongs. Ideas? Regarding the different results AK got, I set default_transaction_isolation = 'serializable' on my connections before running these for two reasons. (1) It keeps the examples more concise. (2) I think most people using serializable transactions in PostgreSQL set the default and don't set the transaction isolation level on each transaction, since (unlike strategies which rely on blocking, like S2PL) all transactions must be participating in the stricter isolation level for it to be reliable. 
In fact, given the performance benefits of declaring transactions READ ONLY when possible, I have seen shops that make *that* a default, too, and override it for transactions which need to write. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On 01/07/2014 07:15 PM, Matheus de Oliveira wrote: Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:

1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()

See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward: we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks actually, =P ) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario: Master: 9.3.1 (I know, trying to go to 9.3.2) Slave: 9.3.2 The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. How did you set up the standby? Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the "crash recovery first, then start archive recovery" path, because there was no backup label file. 
In that mode, the standby assumes that the system is consistent after replaying all the WAL in pg_xlog, which is correct if you initialize from an offline backup or an atomic filesystem snapshot, for example. But "WAL contains references to invalid pages" could also be a symptom of an inconsistent base backup, caused by an incorrect backup procedure. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote:

Session 1. Setting up:

CREATE TABLE cars(
    license_plate VARCHAR NOT NULL,
    reserved_by VARCHAR NULL
);
INSERT INTO cars(license_plate) VALUES ('SUPRUSR'),('MIDLYPH');

Session 2: W1

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Julia'
    WHERE license_plate = 'SUPRUSR' AND reserved_by IS NULL;
SELECT * FROM Cars WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 3: W2

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Ryan'
    WHERE license_plate = 'MIDLYPH' AND reserved_by IS NULL;
COMMIT;

Session 4: R

BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT * FROM Cars WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 2: W1

COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions

What am I doing wrong?

Even without the read only transaction, the W1 and W2 transactions are a classic case of write skew. It looks like it might actually be benign, since neither transaction is updating license_plate, but serializable logic works at the row level, not the column level. After both transactions update the table there is write skew which must be resolved by cancelling one of the transactions. The first to commit wins and the other one will be cancelled when it attempts to run its next statement, which may or may not be a COMMIT. If, for purposes of demonstration, you add a unique index on license_plate and set enable_seqscan = off, you eliminate the simple write skew and get into more complex ways of breaking things. With that tweak you can run all of those transactions if W1 skips the SELECT. You can let W1 do the SELECT as long as you don't run R. The problem is that the SELECT in W1 sees the work of W1 but not W2 and the SELECT in R sees the work of W2 but not W1. We can't allow that. 
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cleanup in code
On 01/07/2014 05:20 PM, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. Yeah, I was just surprised because I thought MSVC understood it. Committed the additional return statement. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: How to reproduce serialization failure for a read only transaction.
Regarding this: So what does this have to do with relying on the results of read-only transactions that abort? Well, what if you had instead had R ROLLBACK instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to be equivalent for transactions that don't modify the database, or maybe because something else caused the transaction to abort? When W1 does its update, it will be checked for serialization failures, but aborted transactions are (intentionally) not included in those checks. W1 is therefore allowed to commit; the apparent serial order of execution is W1 followed by W2, and the results of the aborted transaction R aren't consistent with that. So if I am reading the data and then commit, I should be always fine, correct? -- View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569p5785757.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: The following modification to v7 does this.

=
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index 380f3ba..233e21c 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -536,7 +536,8 @@ index_pathkeys_are_extensible(PlannerInfo *root,
 		{
 			EquivalenceMember *member = (EquivalenceMember *) lfirst(lc2);

-			if (!bms_equal(member->em_relids, index->rel->relids))
+			if (!bms_equal(member->em_relids, index->rel->relids) ||
+				!IsA(member, Var))
 				continue;
 			else
 			{
==

I'm pretty sure that IsA test prevents the optimization from ever firing. But aside from hasty typos, is there enough left of this optimization to be worth the complication? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [HITB-Announce] HITB Magazine Issue 10 Out Now
Thom Brown escribió: On 7 January 2014 15:47, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 9:37 PM, Hafez Kamal aph...@hackinthebox.org wrote: Issue #10 is now available! What does any of this have to do with the purpose of the pgsql-hackers mailing list, namely development of the PostgreSQL database server? We had a similar email from this same address over a year ago. These have been the only two messages posted, and neither was relevant nor appropriate. I have unregistered this address from the postgresql.org mailing list. I guess we will have to wait a year to see if this has any effect ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
On Tuesday, January 7, 2014 7:29 AM, Tom Lane t...@sss.pgh.pa.us wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: Another idea would be to do something like chroot, but more lightweight, using FUSE, private mount namespaces, or cgroups. I thought the goal here was to have a testing framework that (a) is portable to every platform we support and (b) doesn't require root privileges to run. None of those options sound like they'll help meet those requirements. regards, tom lane If I drop the idea of sudo/chroot and punt for now on testing tablespaces under replication, it should be possible to test the rest of the replication system in a way that meets (a) and (b). Perhaps Andres' idea of tablespaces relative to the data directory will get implemented some day, at which point we wouldn't be punting quite so much. But until then, punt. Would it make sense for this to just be part of 'make check'? That would require creating multiple database clusters under multiple data directories, and having them bind to multiple ports or unix domain sockets. Is that a problem? What's the logic of having replication testing separated from the other pg_regress tests? Granted, not every user of postgres uses replication, but that's true for lots of features, and we don't split things like json into separate test suites. Vendors who run 'make check' as part of their packaging of postgresql would probably benefit from knowing if replication doesn't work on their distro, and they may not change their packaging systems to include a second 'make replicationcheck' step. mark -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-core regression tests for replication, cascading, archiving, PITR, etc. Michael Paquier
Mark Dilger markdil...@yahoo.com writes: Would it make sense for this to just be part of 'make check'? Probably not, as (I imagine) it will take quite a bit longer than make check does today. People who are not working on replication related features will be annoyed if a test cycle starts taking 10X longer than it used to, for tests of no value to them. It's already not the case that make check runs every available automated test; the isolation tests, the PL tests, the contrib tests are all separate. There is a make check-world, which I think should reasonably run all of these. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: How to reproduce serialization failure for a read only transaction.
AK alk...@gmail.com wrote: So if I am reading the data and then commit, I should be always fine, correct? If a serializable transaction successfully commits, that means that all data read within that transaction can be trusted. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Tom == Tom Lane t...@sss.pgh.pa.us writes: Initial tests suggest that your version is ~40% slower than ours on some workloads. Tom I poked at this a bit with perf and oprofile, and concluded that Tom probably the difference comes from ordered_set_startup() Tom repeating lookups for each group that could be done just once Tom per query. Retesting with your changes shows that the gap is down to 15% but still present:

work_mem=64MB
enable_hashagg=off (for baseline test)

baseline query (333ms on both versions):
select count(*) from (select j from generate_series(1,3) i, generate_series(1,10) j group by j) s;

test query:
select count(*) from (select percentile_disc(0.5) within group (order by i) from generate_series(1,3) i, generate_series(1,10) j group by j) s;

On the original patch as supplied: 571ms - 333ms = 238ms
On current master: 607ms - 333ms = 274ms

Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth and...@tao11.riddles.org.uk writes: Retesting with your changes shows that the gap is down to 15% but still present: There's probably some overhead from traversing advance_transition_function for each row, which your version wouldn't have done. 15% sounds pretty high for that though, since advance_transition_function doesn't have much to do when the transition function is non-strict and the state value is pass-by-value (which internal is). It's possible that we could do something to micro-optimize that case. I also wondered if it was possible to omit rechecking AggCheckCallContext after the first time through in ordered_set_transition(). It seemed unsafe to perhaps not have that happen at all, since if the point is to detect misuse then the mere fact that argument 0 isn't null isn't much comfort. It strikes me though that now we could test for first call by looking at fcinfo->flinfo->fn_extra, and be pretty sure that we've already checked the context if that isn't null. Whether this would save anything noticeable isn't clear though; I didn't see AggCheckCallContext high in the profile. Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. The key reason for that was, and remains, not having the behavior hard-wired in nodeAgg; I believe that this design permits things to be accomplished in aggregate implementation functions that would not have been possible with the original patch. I'm willing to accept some code growth to have that flexibility. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On 2014-01-07 21:36:31 +0200, Heikki Linnakangas wrote:

2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. Yea, that's strange. I first thought it might be the PageIsNew() branch in lazy_scan_heap(). That initializes the page without WAL logging, which would explain it being uninitialized on the standby. But that wouldn't explain why we found something to clean on the primary while the page is still empty on the standby... How did you set up the standby? Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the crash recovery first, then start archive recovery path, because there was no backup label file. Good point. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? I have seen that repeatedly too. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Tom == Tom Lane t...@sss.pgh.pa.us writes: Furthermore, I can't help noticing that the increased complexity has now pretty much negated your original arguments for moving so much of the work out of nodeAgg.c. Tom The key reason for that was, and remains, not having the Tom behavior hard-wired in nodeAgg; I believe that this design Tom permits things to be accomplished in aggregate implementation Tom functions that would not have been possible with the original Tom patch. I'm willing to accept some code growth to have that Tom flexibility. Do you have an actual use case? -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in visibility map WAL-logging
On Tue, Jan 7, 2014 at 5:36 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/07/2014 07:15 PM, Matheus de Oliveira wrote: Hi folks, On Fri, Dec 13, 2013 at 9:47 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: lazy_vacuum_page() does this:

1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. If all remaining tuples on the page are visible to everyone, set the all-visible flag on the page, and call visibilitymap_set() to set the VM bit.
5. visibilitymap_set() writes a WAL record about setting the bit in the visibility map.
6. Write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION()

See the problem? Setting the VM bit is WAL-logged first, before the removal of the tuples. If you stop recovery between the two WAL records, the page's VM bit in the VM map will be set, but the dead tuples are still on the page. This bug was introduced in Feb 2013, by commit fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master. The fix seems quite straightforward: we just have to change the order of those two records. I'll go commit that. With a lot of help from Andres on IRC (thanks a lot man), I think (he thinks actually, =P ) I may have run into the issue this bug can cause. I'm sending this e-mail to (1) confirm whether my issue is really caused by that bug and, if that is the case, (2) let you guys know the problems it can cause. 
Scenario: Master: 9.3.1 (I know, trying to go to 9.3.2) Slave: 9.3.2 The slave was running with hot_standby=off (because of another bug Andres is working on), but it stopped working with the following log lines:

2014-01-07 12:38:04 BRST [22668]: [7-1] user=,db= WARNING: page 1076 of relation base/883916/151040222 is uninitialized
2014-01-07 12:38:04 BRST [22668]: [8-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22668]: [9-1] user=,db= PANIC: WAL contains references to invalid pages
2014-01-07 12:38:04 BRST [22668]: [10-1] user=,db= CONTEXT: xlog redo visible: rel 1663/883916/151040222; blk 1076
2014-01-07 12:38:04 BRST [22665]: [3-1] user=,db= LOG: startup process (PID 22668) was terminated by signal 6: Aborted

(Complete log at https://pgsql.privatepaste.com/343f3190fe). I used pg_xlogdump to (try to) find the block the error relates to:

$ pg_xlogdump pg_xlog/000102BC002B 000102BC007F | grep -n -C5 '883916/151040222; blk 1076'
2088220 ... Heap2 ... 24/ 192, ... lsn: 2BC/46AB8B20 ... desc: clean: rel 1663/883916/151040222; blk 1073 remxid 107409880
2088221 ... Heap2 ... 20/52, ... lsn: 2BC/46AB8BE0 ... desc: visible: rel 1663/883916/151040222; blk 1074
2088222 ... Heap2 ... 24/ 128, ... lsn: 2BC/46AB8C18 ... desc: clean: rel 1663/883916/151040222; blk 1074 remxid 107409880
...

Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Well... I also didn't understand how it could cause the PANIC. If I got it right, it could only cause a visibility map bit with a wrong value (e.g. first change the bit but fail to mark the tuple, in case of a failure in between - which seems not to have happened). Is that right? Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. That I don't know for sure... How did you set up the standby? 
Did you initialize it from an offline backup of the master's data directory, perhaps? The log shows that the startup took the "crash recovery first, then start archive recovery" path, because there was no backup label file. In that mode, the standby assumes that the system is consistent after replaying all the WAL in pg_xlog, which is correct if you initialize from an offline backup or an atomic filesystem snapshot, for example. But "WAL contains references to invalid pages" could also be a symptom of an inconsistent base backup, caused by an incorrect backup procedure. In particular, I have to ask because I've seen it before: you didn't delete backup_label from the backup, did you? Well, I cannot answer this right now, but it makes sense and is possible. I used a script created by someone else that does pg_start_backup + rsync + pg_stop_backup, but in fact I didn't look at the script to see if it is doing something nasty, such as removing backup_label. I'll be able to check it tomorrow, and then I'll come back with a definitive answer. @andres, if it is really removing backup_label it could also cause that other issue we saw on Monday, right? (yes, I did run the same script). BTW, is there any situation in which backup_label should be removed? I see no reason for doing this, and I have never seen anyone doing it, so I didn't even think that could be it. Thank you guys for your attention. Best regards, --
Re: [HACKERS] dynamic shared memory and locks
On Tue, Jan 7, 2014 at 6:54 AM, Andres Freund and...@2ndquadrant.com wrote: Maybe it makes sense to have such a check #ifdef'ed out on most builds to avoid extra overhead, but not having any check at all just because we trust the review process too much doesn't strike me as the best of ideas. I don't think that check would have relevantly high performance impact in comparison to the rest of --enable-cassert - it's a single process local variable which is regularly accessed. It will just stay in L1 or even registers. I tried it out on a 16-core, 64-hwthread community box, PPC. pgbench -S, 5-minute runs at scale factor 300 with shared_buffers=8GB:

resultsr.cassert.32.300.300:tps = 11341.627815 (including connections establishing)
resultsr.cassert.32.300.300:tps = 11339.407976 (including connections establishing)
resultsr.cassert.32.300.300:tps = 11321.339971 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11492.927372 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11471.810937 (including connections establishing)
resultsr.cassert-spin-multiplex.32.300.300:tps = 11516.471053 (including connections establishing)

By comparison:

resultsr.master.32.300.300:tps = 197939.111998 (including connections establishing)
resultsr.master.32.300.300:tps = 198641.337720 (including connections establishing)
resultsr.master.32.300.300:tps = 198675.404349 (including connections establishing)

So yeah, the overhead is negligible, if not negative. None of the other changes in that patch were even compiled in this case, since all that code is protected by --disable-spinlocks. Maybe somebody can find another workload where the overhead is visible, but here it's not. 
On the other hand, I did discover another bit of ugliness - the macros actually have to be defined this way:

+#define SpinLockAcquire(lock) \
+	(AssertMacro(!any_spinlock_held), any_spinlock_held = true, \
+	 S_LOCK(lock))
+#define SpinLockRelease(lock) \
+	do { Assert(any_spinlock_held); any_spinlock_held = false; \
+		 S_UNLOCK(lock); } while (0)

SpinLockAcquire has to be a comma-separated expression rather than a do {} while (0) block because S_LOCK returns a value (which is used when compiling with LWLOCK_STATS); SpinLockRelease, however, has to be done the other way because S_UNLOCK() is defined as a do {} while (0) block already on PPC, among other architectures. Overall I don't really care enough about this to argue strenuously for it. I'm not nearly so confident as Tom that no errors of this type will creep into future code, but on the other hand, keeping --disable-spinlocks working reliably isn't significant enough for me to want to spend any political capital on it. It's probably got a dim future regardless of what we do here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
Andrew Gierth and...@tao11.riddles.org.uk writes: Tom == Tom Lane t...@sss.pgh.pa.us writes: Tom The key reason for that was, and remains, not having the Tom behavior hard-wired in nodeAgg; I believe that this design Tom permits things to be accomplished in aggregate implementation Tom functions that would not have been possible with the original Tom patch. I'm willing to accept some code growth to have that Tom flexibility. Do you have an actual use case? Not a concrete example of an aggregate to build, no; but it seemed plausible to me that a new aggregate might want more control over the sorting operation than was possible with your patch. Basically the only flexibility that was there was to sort a hypothetical row before or after its peers, right? Now it's got essentially full control over the sort parameters. One idea that comes to mind is that an aggregate that's interested in the top N rows might wish to flip the sort order, so that the rows it wants come out of the tuplesort first rather than last --- and/or it might want to tell the tuplesort about the row limitation, so that the bounded-sort logic can be used. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan
Robert Haas wrote: On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: Thank you for taking time to look at this patch. The peak memory usage for the discarded bitmap *can* be reflected in the number displayed for the bitmap heap scan by the following code in tbm_union() or tbm_intersect(): Hmm, fair point. But I'm still not convinced that we really need to add extra accounting for this. What's wrong with just reporting the number of exact and lossy pages? No. I intended to show the desired memory space for a TIDBitmap rather than the peak memory usage for that TIDBitmap. And I thought it'd be better for the latter to be displayed as additional information. However, I've removed the functionality for showing the desired memory space due to technical problems. Now I should probably remove the functionality for showing the peak memory usage too. Yes, as Andres mentioned, showing the peak memory usage is not a bad idea, I think. But I'm starting to think it's not necessarily worth complicating the code ... If there are no objections from others, I'll remove the extra accounting for showing the peak memory usage. Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WITHIN GROUP patch
I wrote: There's probably some overhead from traversing advance_transition_function for each row, which your version wouldn't have done. 15% sounds pretty high for that though, since advance_transition_function doesn't have much to do when the transition function is non strict and the state value is pass-by-value (which internal is). It's possible that we could do something to micro-optimize that case. The most obvious micro-optimization that's possible there is to avoid redoing InitFunctionCallInfoData for each row. I tried this as in the attached patch. It seems to be good for about half a percent overall savings on your example case. That's not much, but then again this helps for *all* aggregates, and many of them are cheaper than ordered aggregates. I see about 2% overall savings on select count(*) from generate_series(1,100); which seems like a more interesting number. As against that, the roughly-kilobyte-sized FunctionCallInfoData is no longer just transient data on the stack, but persists for the lifespan of the query. We pay that already for plain FuncExpr and OpExpr nodes, though. On balance, I'm inclined to apply this; the performance benefit may be marginal but it seems like it makes advance_transition_function's API a tad cleaner by reducing the number of distinct structs it's hacking on. Comments? regards, tom lane diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c index 827b009..0dafb51 100644 *** a/src/backend/executor/nodeAgg.c --- b/src/backend/executor/nodeAgg.c *** typedef struct AggStatePerAggData *** 235,240 --- 235,248 */ Tuplesortstate *sortstate; /* sort object, if DISTINCT or ORDER BY */ + + /* + * This field is a pre-initialized FunctionCallInfo struct used for + * calling this aggregate's transfn. We save a few cycles per row by not + * re-initializing the unchanging fields; which isn't much, but it seems + * worth the extra space consumption. 
+ */ + FunctionCallInfoData transfn_fcinfo; } AggStatePerAggData; /* *** static void initialize_aggregates(AggSta *** 290,297 AggStatePerGroup pergroup); static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate, ! FunctionCallInfoData *fcinfo); static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup); static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerAgg peraggstate, --- 298,304 AggStatePerGroup pergroup); static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate); static void advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup); static void process_ordered_aggregate_single(AggState *aggstate, AggStatePerAgg peraggstate, *** initialize_aggregates(AggState *aggstate *** 399,419 * Given new input value(s), advance the transition function of an aggregate. * * The new values (and null flags) have been preloaded into argument positions ! * 1 and up in fcinfo, so that we needn't copy them again to pass to the ! * transition function. No other fields of fcinfo are assumed valid. * * It doesn't matter which memory context this is called in. */ static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate, ! FunctionCallInfoData *fcinfo) { ! int numTransInputs = peraggstate->numTransInputs; MemoryContext oldContext; Datum newVal; - int i; if (peraggstate->transfn.fn_strict) { --- 406,425 * Given new input value(s), advance the transition function of an aggregate. * * The new values (and null flags) have been preloaded into argument positions ! * 1 and up in peraggstate->transfn_fcinfo, so that we needn't copy them again ! * to pass to the transition function. We also expect that the static fields ! * of the fcinfo are already initialized; that was done by ExecInitAgg(). * * It doesn't matter which memory context this is called in. 
*/ static void advance_transition_function(AggState *aggstate, AggStatePerAgg peraggstate, ! AggStatePerGroup pergroupstate) { ! FunctionCallInfoData *fcinfo = &peraggstate->transfn_fcinfo; MemoryContext oldContext; Datum newVal; if (peraggstate->transfn.fn_strict) { *** advance_transition_function(AggState *ag *** 421,426 --- 427,435 * For a strict transfn, nothing happens when there's a NULL input; we * just keep the prior transValue. */ + int numTransInputs = peraggstate->numTransInputs; + int i; + for (i = 1; i <= numTransInputs; i++) { if (fcinfo->argnull[i]) *** advance_transition_function(AggState *ag *** 467,478
Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote: The point remains that you need to duplicate it into every process that might want to use it subsequently, so it makes sense to DuplicateHandle into the parent, and then to advertise that handle value publicly so that other child processes can DuplicateHandle it back into their own process. Well, right now we just reopen the same object from all of the processes, which seems to work fine and doesn't require any of this complexity. The only problem I don't know how to solve is how to make a segment stick around for the whole postmaster lifetime. If duplicating the handle into the postmaster without its knowledge gets us there, it may be worth considering, but that doesn't seem like a good reason to rework the rest of the existing mechanism. I think one has to try this to see if it works as per the need. If it's not urgent, I can try this early next week? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] cleanup in code
On Wed, Jan 8, 2014 at 1:25 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 01/07/2014 05:20 PM, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: I think it will be like Andres said up thread, to stop multiple evaluations of the expression passed to the macro. Exactly. We are not going to risk multiple evals in a macro as commonly used as elog/ereport; the risk/benefit ratio is just too high. I don't see anything wrong with suppressing this warning by inserting an additional return statement. The code is already plastered with such things, from the days before we had any unreachability hints in elog/ereport. And as I said upthread, there is no good reason to suppose that the unreachability hints are always recognized by every compiler. I take this behavior of MSVC as proof of that statement. Yeah, I was just surprised because I thought MSVC understood it. Committed the additional return statement. Thanks for committing both the patches for cleanup. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
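The multiple-evaluation hazard Tom refers to is easy to demonstrate with a toy model of C textual macro expansion (Python is used here purely as a simulator; `expand_macro` and `next_value` are illustrative names, not real APIs):

```python
def expand_macro(body, **params):
    """Emulate C preprocessor textual substitution: every occurrence
    of a parameter name in the macro body is replaced by the argument
    *expression*, verbatim -- which is exactly why a side-effecting
    argument can end up running more than once."""
    for name, expr in params.items():
        body = body.replace(name, "(" + expr + ")")
    return body

calls = 0
def next_value():
    global calls
    calls += 1
    return calls

# A C-style MAX macro: ((A) > (B) ? (A) : (B))
expanded = expand_macro("(A) if (A) > (B) else (B)", A="next_value()", B="0")
result = eval(expanded)   # the caller wrote next_value() once...
```

After the call, `calls` is 2: the argument expression was textually duplicated and evaluated twice. This is why a macro as widely used as elog/ereport cannot risk multiple evaluation of its arguments, whatever the warning-suppression benefit.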
[HACKERS] TODO: machine-readable pg_controldata
Below attached patch implements the following todo item: machine-readable pg_controldata? http://www.postgresql.org/message-id/4b901d73.8030...@agliodbs.com Possible approaches: 1. Implement as a backend function and provide a view to the user. - But in this approach the user can only get this information when the server is running. 2. Extend the pg_controldata tool to provide the value of an individual attribute. A first draft version of the patch is attached in the mail, implemented using approach 2. Description of the patch: 1. Below is the list of the options which are exposed by pg_controldata (all checkpoint, redo and backup related options are exposed). I think we don't need to expose options for other parameters, because they will not be useful independently. (The user can simply print all values, same as the old behavior.)

-v, --catalog-version                      catalog version
-l, --latest-checkpoint-loc                Latest checkpoint location
-L, --prev-checkpoint-loc                  Prior checkpoint location
-r, --latest-checkpoint-redoloc            Latest checkpoint's REDO location
-t, --latest-checkpoint-timeline           Latest checkpoint's TimeLineID
-T, --latest-checkpoint-prevtimeline       Latest checkpoint's PrevTimeLineID
-c, --latest-checkpoint-time               Time of latest checkpoint
-x, --latest-checkpoint-nextxid            Latest checkpoint's NextXID
-o, --latest-checkpoint-nextoid            Latest checkpoint's NextOID
-X, --latest-checkpoint-nextmulti-xactid   Latest checkpoint's NextMultiXactId
-O, --latest-checkpoint-nextmulti-offset   Latest checkpoint's NextMultiOffset
-q, --latest-checkpoint-oldestxid          Latest checkpoint's oldestXID
-a, --latest-checkpoint-oldest-activexid   Latest checkpoint's oldestActiveXID
-m, --latest-checkpoint-oldest-multixid    Latest checkpoint's oldestMultiXid
-e, --min-recovery-endloc                  Minimum recovery ending location
-E, --min-recovery-endloc-timeline         Min recovery ending loc's timeline
-b, --backup-start-location                Backup start location
-B, --backup-end-location                  Backup end location

2.
If the user provides an individual option, then the direct value of that parameter will be printed (as shown in the example below); the parameter name is not printed, so the user need not parse the output. ./pg_controldata --latest-checkpoint-time -- run with the latest-checkpoint-time option Tue 07 Jan 2014 05:22:42 PM IST -- the output is the direct value 3. This feature can save parsing effort for users who need to get the value of an individual parameter, e.g. Time of latest checkpoint for standby lag monitoring. Please provide your suggestions... Regards, Dilip controldata_v1.patch Description: controldata_v1.patch
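To make the parsing-effort argument concrete, this is roughly what a monitoring script has to do today: scrape the labeled output. The sketch below is illustrative — the labels match pg_controldata's output format, but the sample values are made up:

```python
import re

# Illustrative excerpt of `pg_controldata $PGDATA` output; values are
# invented, labels follow the real tool's "Label:   value" layout.
SAMPLE = """\
Catalog version number:               201306121
Latest checkpoint location:           0/17BB0B8
Time of latest checkpoint:            Tue 07 Jan 2014 05:22:42 PM IST
Minimum recovery ending location:     0/0
"""

def controldata_field(output, label):
    """Extract one field from pg_controldata output by its label --
    the per-user parsing effort that a single-value option such as
    --latest-checkpoint-time would make unnecessary."""
    m = re.search(r"^%s:\s*(.*)$" % re.escape(label), output, re.MULTILINE)
    if m is None:
        raise KeyError(label)
    return m.group(1).strip()
```

With the proposed patch, `controldata_field(...)` collapses to running the tool with one flag and reading stdout verbatim.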
Re: [HACKERS] Get more from indices.
Hello, tgl I'm pretty sure that IsA test prevents the optimization from ever firing. Thank you. tgl But aside from hasty typos, Oops! I've picked up the wrong node. It always denies pathkeys extension. | !IsA(member, Var)) is a mistake for the following. | !IsA(member->em_expr, Var)) tgl is there enough left of this optimization to tgl be worth the complication? Certainly. This patch is intended to go with my other UNION-ALL optimization patch at first. It helps very much with UNION-ORDERBY-LIMIT and also with UNION_ALL(Partitioned tables)-DISTINCT-ORDERBY-LIMIT. = test tables create table pu1 (a int not null, b int not null, c int, d text); create unique index i_pu1_ab on pu1 (a, b); create table cu11 (like pu1 including all) inherits (pu1); create table cu12 (like pu1 including all) inherits (pu1); create table pu2 (like pu1 including all); create table cu21 (like pu2 including all) inherits (pu2); create table cu22 (like pu2 including all) inherits (pu2); insert into cu11 (select a / 5, 4 - (a % 5), a, 'cu11' from generate_series(00, 09) a); insert into cu12 (select a / 5, 4 - (a % 5), a, 'cu12' from generate_series(10, 19) a); insert into cu21 (select a / 5, 4 - (a % 5), a, 'cu21' from generate_series(20, 29) a); insert into cu22 (select a / 5, 4 - (a % 5), a, 'cu22' from generate_series(30, 39) a); The following example is an implicit union derived from partitioned tables created as above. Limit is added to highlight the effect.
9.4dev with no patch, | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | | Limit (actual time=246.653..246.730 rows=100 loops=1) | -> Unique (actual time=246.646..246.713 rows=100 loops=1) | -> Sort (actual time=246.645..246.666 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | Sort Method: external sort Disk: 5280kB | -> Append (actual time=0.017..52.608 rows=20 loops=1) | -> Seq Scan on pu1 (actual time=0.001..0.001 rows=0 loops=1) | -> Seq Scan on cu11 (actual time=0.015..17.047 rows=10 loops=1) | -> Seq Scan on cu12 (actual time=0.007..15.474 rows=10 loops=1) | Total runtime: 247.854 ms Fairly common query. It will get the following plan with this patch. | =# explain (analyze on, costs off) select distinct * from pu1 order by a, b limit 100; | QUERY PLAN | - | Limit (actual time=0.108..0.350 rows=100 loops=1) | -> Unique (actual time=0.104..0.329 rows=100 loops=1) | -> Merge Append (actual time=0.103..0.214 rows=100 loops=1) | Sort Key: pu1.a, pu1.b, pu1.c, pu1.d | -> Index Scan .. on pu1 (actual time=0.003..0.003 rows=0 loops=1) | -> Index Scan .. on cu11 (actual time=0.049..0.110 rows=100 loops=1) | -> Index Scan .. on cu12 (actual time=0.044..0.044 rows=1 loops=1) | Total runtime: 0.666 ms The same could be said of union with my other union-pullup patch. We will get the following result with only the union-pullup patch.
|=# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 1; | QUERY PLAN |--- | Limit (actual time=474.825..482.326 rows=1 loops=1) | -> Unique (actual time=474.824..481.357 rows=1 loops=1) | -> Sort (actual time=474.823..477.101 rows=1 loops=1) | Sort Key: cu11.a, cu11.b, cu11.c, cu11.d | Sort Method: external sort Disk: 10568kB | -> Append (actual time=0.018..99.310 rows=40 loops=1) | -> Seq Scan on cu11 (actual time=0.017..16.263 rows=10 loops=1) | -> Seq Scan on cu12 (actual time=0.006..14.831 rows=10 loops=1) | -> Seq Scan on cu21 (actual time=0.006..14.766 rows=10 loops=1) | -> Seq Scan on cu22 (actual time=0.006..14.721 rows=10 loops=1) | Total runtime: 484.555 ms This is also quite a common operation but the implicit DISTINCT prevents the planner from selecting index scans. (ORDER BY is not essential but needed because the planner does not consult distinct_pathkeys when planning scans and I've left it as it is.) The planner gives the following plan with this patch. | =# explain (analyze on, costs off) select * from cu11 union select * from cu12 union select * from cu21 union select * from cu22 order by a limit 1; |QUERY PLAN | - | Limit (actual time=0.197..14.739 rows=1 loops=1) | -> Unique (actual time=0.191..13.527 rows=1 loops=1) | -> Merge Append (actual time=0.191..7.894 rows=1 loops=1) |
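The reason Merge Append beats Append + Sort in these plans can be sketched abstractly: each child index scan already yields rows in order, so a k-way merge can stop as soon as the LIMIT is satisfied, without reading any child to the end. A hedged Python analogy (illustrative names, not planner code):

```python
import heapq
from itertools import islice

def merge_append_limit(sorted_inputs, limit):
    """Lazily k-way merge already-sorted child streams (what Merge
    Append does with one index scan per child table) and stop after
    `limit` rows.  Unlike Append + Sort, no child is read to the end."""
    merged = heapq.merge(*sorted_inputs)    # pulls child rows on demand
    return list(islice(merged, limit))

# Each "partition" yields its rows in index order.
cu11 = iter(range(0, 100, 2))   # even keys
cu12 = iter(range(1, 100, 2))   # odd keys
assert merge_append_limit([cu11, cu12], 5) == [0, 1, 2, 3, 4]
# The generators were only partially consumed -- the lazy-merge payoff.
assert next(cu11) < 100
```

The Unique node on top then only has to compare adjacent rows of the merged stream, which is why the patched plans finish in under a millisecond instead of sorting the whole append result to disk.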
Re: [HACKERS] Standalone synchronous master
On Wed, Nov 13, 2013 at 6:39 PM, Rajeev rastogi rajeev.rast...@huawei.com wrote: Add a new eager synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php This patch implementation is in the same line as it was given in the earlier thread. Some of the additional important changes are: 1. Have added two GUC variables to take commands from the user to be executed a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. In the description of both switches (a & b), you are telling that it will switch to standalone mode; I think by your point 1b. you mean to say the other way (switch from standalone to sync mode). Instead of getting commands, why can't we just log such actions? I think adding 3 new guc variables for this functionality seems to be a bit high. Also what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode; if it continues to send data then I think naming it as 'enable_standalone_master' is not good. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Bug in visibility map WAL-logging
On Tue, Jan 7, 2014 at 11:36 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Hmm. The xlogdump indeed shows that the order of 'clean' and 'visible' is incorrect, but I don't immediately see how that could cause the PANIC. Why is the page uninitialized in the standby? If VACUUM is removing some dead tuples from it, it certainly should exist and be correctly initialized. Unless the vacuum subsequently truncated the file to be shorter and the backup was taken after that? -- greg
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
I've been thinking about this some more and I don't think you can avoid the need to remap vars and attrs. I agree. I was hoping to let expand_targetlist / preprocess_targetlist do the job, but I'm no longer convinced that'll do the job without mangling them in the process. AIUI, your modified version of rewriteTargetView() will turn an UPDATE query with a rangetable of the form The patch I posted is clearly incorrect in several ways. Unfortunately I'm still coming up to speed with the rewriter / planner / executor at the same time as trying to make major modifications to them. This tends to produce some false starts. (It no longer attempts to use expand_targetlist in the rewriter, btw, that's long gone). Since the posted patch I've sorted out RETURNING list rewriting and a few other issues. There are some major issues remaining with the approach though: - If we have nested views, we need to pass the tuple ctid up the chain of expanded view subqueries so ExecModifyTable can consume it. This is proving to be a lot harder than I'd hoped. - expand_targetlist / preprocess_targetlist operate on the resultRelation. With updatable s.b. views, the resultRelation is no longer the relation from which tuples are being read for input into ExecModifyTable. - In subqueries, resultRelation is only set for the outermost layer. So preprocess_targetlist doesn't usefully add tlist entries for the inner subquery layers at all. - It is necessary to expand DEFAULTs into expression tlist entries in the innermost query layer so that Vars added further up in the subquery chain can refer to them. In the current experimental patch DEFAULTs aren't populated correctly. So we have the following needs: - passing ctids up through layers of subqueries - target-list expansion through layers of subqueries - Differentiating between resultRelation to heapmodifytuple and the source of the tuples to feed into heapmodifytuple now that these are no longer the same thing. 
I was originally hoping all this could be dealt with in the rewriter, by changing resultRelation to point to the next-inner-most RTE whenever a target view is expanded. This turns out to be too simplistic: - There is no ctid col on a view, so if we have v2 over v1 over table t, when we expand v2 there's no way to create a tlist entry to point to v1's ctid. It won't have one until we've expanded v1 into t in the next pass. The same issue applies to expanding the tlist to carry cols of t up the subquery chain in the rewrite phase. - Rowmarks are added to point to resultrelation after rewrite, and these then add tlist entries in preprocess_targetlist. These TLEs will point to the base resultRelation, which isn't correct when we're updating nested subqueries. So we just can't do this during recursive rewrite, because the required information is only available once the target view is fully expanded into nested subqueries. It seems that tlist fixup and injection of the ctid up the subquery chain must be done after all recursive rewriting. To do these fixups later, we need to keep track of which nested series of subqueries is the target, i.e. produces tuples including resjunk ctid cols to feed into ExecModifyTuple. Currently this information is resultRelation. The more I think about this, the more I think that the approach of my original patch was neater. The idea was to have 2 new pieces of code: 1). In rewriteTargetView() decorate the target RTE with any security barrier quals (in the new rte->securityQuals field), instead of merging them with the main query's quals. So the output of this stage of rewriting would be something like rtable: 1: relation RTE (base table) - resultRelation - securityQuals = [view quals] fromList: [1] So you're proposing to still flatten views during rewrite, as the current code does, but just keep track of s.b. quals separately? If so, what about multiple S.B. 
views may be nested and their quals must be isolated from each other, not just from the outer query? That's why I see subqueries as such a useful model. 2). After all rewriting is complete, scan the query and turn all relation RTEs with non-empty securityQuals into subquery RTEs (making a copy of the original RTE in the case of the result relation). I'm not sure I understand how this would work in the face of multiple levels of nesting s.b. views. Are you thinking of doing recursive expansion? Another ugly feature of my earlier patch was the changes it made to expand_target_list() and the need to track the query's sourceRelation. I've been fighting the need to add the concept of a sourceRelation for this purpose too. Both of those things can be avoided simply by moving the subquery expansion code (2) to after expand_target_list(), and hence also after expand_inherited_tables(). That's certainly interesting. I'm reading over the patch now. There is still a lot more testing to be done
Re: [HACKERS] WIP patch (v2) for updatable security barrier views
On 01/08/2014 02:00 PM, Craig Ringer wrote: I'm not sure I understand how this would work in the face of multiple levels of nesting s.b. views. Are you thinking of doing recursive expansion? Never mind, that part is clearly covered in the patch. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Simple improvements to freespace allocation
Current freespace code gives a new block insert target (NBT) from anywhere in the table. That isn't very useful with bigger tables and it would be useful to be able to specify different algorithms for producing NBTs. ALTER TABLE foo WITH (freespace = ); Three simple and useful models come to mind * CONCURRENT This is the standard/current model. Naming it like this emphasises why we pick NBTs in the way we do. * PACK We want the table to be smaller, so rather than run a VACUUM FULL we want to force the table to choose an NBT at the start of the table, even at the expense of concurrency. By avoiding putting new data at the top of the table we allow the possibility that VACUUM will shrink the table size. This is the same as current except we always reset the FSM pointer to zero and re-seek from there. This takes some time to have an effect, but is much less invasive than VACUUM FULL. * RECENT For large tables with an append-mostly use case it would be easier to prefer NBTs from the last two 1GB segments of a table, allowing them to be more easily cached. This is the same as current except when we wrap we don't go to block 0, we go to the first block of the penultimate (max - 1) segment. For tables <= 2 segments this is no change from the existing algorithm. For larger tables it would focus updates/inserts into a much reduced and yet still large area and allow better caching. These are small patches. ...other possibilities, though more complex are... * IN-MEMORY A large table may only have some of its blocks in memory. It would be useful to force an NBT to be a block already in shared_buffers IFF a table is above a certain size (use the same threshold as seq scans, i.e. 25% of shared_buffers). That may be difficult to achieve in practice, so not sure about this one. Like it? Any ideas? We might also allow a custom NBT policy though allowing user code at that point could be dangerous. Thoughts? 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
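The wrap behavior of the three proposed policies can be sketched as a single function. This is illustrative Python, not a patch; it assumes the default segment geometry of 1 GB segments and 8 kB blocks, i.e. 131072 blocks per segment, and the policy names are the ones from the post:

```python
RELSEG_BLOCKS = 131_072   # blocks per 1 GB segment at the default 8 kB block size

def wrap_target(policy, nblocks):
    """Where the free-space search restarts after the pointer reaches
    the end of a table of `nblocks` blocks, under each proposed policy."""
    if policy in ("CONCURRENT", "PACK"):
        # CONCURRENT wraps to the start of the table; PACK additionally
        # keeps resetting the pointer to zero so new rows pack the front.
        return 0
    if policy == "RECENT":
        last_seg = (nblocks - 1) // RELSEG_BLOCKS   # 0-based last segment
        if last_seg < 2:
            return 0          # <= 2 segments: no change from existing algorithm
        # first block of the penultimate (max - 1) segment
        return (last_seg - 1) * RELSEG_BLOCKS
    raise ValueError(policy)
```

So under RECENT a 10-segment table would confine new-block targets to the last two segments (roughly 2 GB), keeping the write working set cacheable while still leaving a large area to spread concurrent inserts over.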
Re: [HACKERS] Standalone synchronous master
On 8th Jan, 2014, Amit Kapila wrote: Add a new eager synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period This would require some type of command to be executed to alert administrators of this change. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php This patch implementation is in the same line as it was given in the earlier thread. Some of the additional important changes are: 1. Have added two GUC variables to take commands from the user to be executed a. Master_to_standalone_cmd: To be executed before master switches to standalone mode. b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode. In the description of both switches (a & b), you are telling that it will switch to standalone mode; I think by your point 1b. you mean to say the other way (switch from standalone to sync mode). Yes, you are right; it's a typo. Instead of getting commands, why can't we just log such actions? I think adding 3 new guc variables for this functionality seems to be a bit high. Actually in the earlier discussion, as well as in the TODO added, it is mentioned to execute some kind of command to alert the administrator. http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php In my current patch, I have kept the LOG along with the command. Also what will happen when it switches to standalone mode in case there are some async standbys already connected to it before going to standalone mode; if it continues to send data then I think naming it as 'enable_standalone_master' is not good. Yes, we can change the name to something more appropriate; some options are: 1. enable_async_master 2. sync_standalone_master 3. enable_nowait_master 4. enable_nowait_resp_master Please provide your suggestions on the above names, or any others. 
Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Simple improvements to freespace allocation
On 01/08/2014 08:56 AM, Simon Riggs wrote: Current freespace code gives a new block insert target (NBT) from anywhere in the table. That isn't very useful with bigger tables and it would be useful to be able to specify different algorithms for producing NBTs. I've actually been surprised how little demand there has been for alternative algorithms. When I wrote the current FSM implementation, I expected people to start coming up with all kinds of wishes, but it didn't happen. There have been very few complaints; everyone seems to be satisfied with the way it works now. So I'm not convinced there's much need for this. ALTER TABLE foo WITH (freespace = ); Three simple and useful models come to mind * CONCURRENT This is the standard/current model. Naming it like this emphasises why we pick NBTs in the way we do. * PACK We want the table to be smaller, so rather than run a VACUUM FULL we want to force the table to choose an NBT at the start of the table, even at the expense of concurrency. By avoiding putting new data at the top of the table we allow the possibility that VACUUM will shrink the table size. This is the same as current except we always reset the FSM pointer to zero and re-seek from there. This takes some time to have an effect, but is much less invasive than VACUUM FULL. We already reset the FSM pointer to zero on vacuum. Would the above actually make any difference in practice? * RECENT For large tables with an append-mostly use case it would be easier to prefer NBTs from the last two 1GB segments of a table, allowing them to be more easily cached. This is the same as current except when we wrap we don't go to block 0, we go to the first block of the penultimate (max - 1) segment. For tables <= 2 segments this is no change from the existing algorithm. For larger tables it would focus updates/inserts into a much reduced and yet still large area and allow better caching. Umm, wouldn't that bloat the table with no limit? 
Putting my DBA/developer hat on, I don't understand when I would want to use that setting. These are small patches. ...other possibilities, though more complex are... * IN-MEMORY A large table may only have some of its blocks in memory. It would be useful to force an NBT to be a block already in shared_buffers IFF a table is above a certain size (use the same threshold as seq scans, i.e. 25% of shared_buffers). That may be difficult to achieve in practice, so not sure about this one. Like it? Any ideas? Yeah, that seems nice, although I have a feeling that it's not worth the complexity. There's one policy that I'd like to see: maintaining cluster order. When inserting a new tuple, try to place it close to other tuples with similar keys, to keep the table clustered. In practice, CLUSTER CONCURRENTLY might be more useful, though. We might also allow a custom NBT policy though allowing user code at that point could be dangerous. Yeah, I don't think there's much need for that. Overall, - Heikki