Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
> On Tue, Apr 15, 2014 at 03:28:41PM -0400, Tom Lane wrote:
> > Andres Freund <and...@2ndquadrant.com> writes:
> > > On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
> > > > Buildfarm member prairiedog thinks there's something unreliable about
> > > > commit f01d1ae3a104019d6d68aeff85c4816a275130b3:
> > >
> > > So I had made a note to recheck on this.
> > > http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
> > > indicates there haven't been any further failures... So, for now I
> > > assume this was caused by some problem fixed elsewhere.
> >
> > Hard to say. In any case, I agree we can't make any progress unless we
> > see it again.
>
> The improved test just tripped:

Hrmpf. Just one of these days I was happy thinking it was gone...

> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07

Hm. My guess is that it's just a 'harmless' concurrency issue. The test
currently runs concurrently with others: I think what happens is that the
table gets dropped in another session after the query has acquired the
MVCC snapshot (used for the pg_class test). But why is it triggering on
such an 'unusual' system and not on others? That's what worries me a bit.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07
>
> Hm. My guess is that it's just a 'harmless' concurrency issue. The test
> currently runs concurrently with others: I think what happens is that
> the table gets dropped in another session after the query has acquired
> the MVCC snapshot (used for the pg_class test). But why is it triggering
> on such an 'unusual' system and not on others? That's what worries me a
> bit.

prairiedog is pretty damn slow by modern standards. OTOH, I think it is
not the slowest machine in the buildfarm; hamster for instance seems to
be at least a factor of 2 slower. So I'm not sure whether to believe it's
just a timing issue.

			regards, tom lane
Re: [HACKERS] Few observations in replication slots related code
Hi Amit,

On 2014-06-12 08:55:59 +0530, Amit Kapila wrote:
> 1. In function StartupReplicationSlots(XLogRecPtr checkPointRedo),
>    parameter checkPointRedo is not used.

It used to be included in a debug message. Apparently the message was
removed at some point (don't remember it, but I have a memory like a
sieve).

> 2. A few checks are in a different order in functions
>    pg_create_physical_replication_slot() and
>    pg_create_logical_replication_slot():
>
>    if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
>        elog(ERROR, "return type must be a row type");
>    check_permissions();
>    CheckLogicalDecodingRequirements();
>
>    I don't think it makes any difference, however having the checks in a
>    similar order seems better unless there is a reason for not doing so.

Can change it.

> 3. Currently there is an Assert (Assert(!MyReplicationSlot);) in
>    pg_create_logical_replication_slot(); is there a need for a similar
>    Assert in pg_create_physical_replication_slot()?

Hm. Shouldn't really matter either way these days, but I guess it
doesn't hurt to add one.

> 4. StartupDecodingContext()
>    {
>    ..
>    context = AllocSetContextCreate(CurrentMemoryContext,
>                                    "Changeset Extraction Context",
>    }
>
>    To be consistent, shall we name this context "logical" or "logical
>    decoding"?

Heh. That apparently escaped when things were renamed. Yes.

> 5. pg_create_logical_replication_slot()
>    {
>    ..
>    CreateInitDecodingContext()
>    ..
>    ReplicationSlotPersist()
>    }
>
>    Function pg_create_logical_replication_slot() is trying to save the
>    slot twice, once during CreateInitDecodingContext() and then in
>    ReplicationSlotPersist(); isn't it better if we can make it do so
>    just once?

Doesn't work here. In the first save it's not yet marked as persistent -
but we still need to safely reserve the xmin. It's also not something
that should happen very frequently, so I'm not worried about the
overhead.

> 6. elog(ERROR, "cannot handle changeset extraction yet");
>
>    Shouldn't it be better to use "logical replication" instead of
>    "changeset extraction"?

Will change.

> 7. + * XXX: It might make sense to introduce ephemeral slots and always use
>    + * the slot mechanism.
>
>    There already are RS_EPHEMERAL slots, so maybe this comment needs a
>    bit of rephrasing.

> 8. Is there a chance of inconsistency if pg_resetxlog resets the xlog
>    and, after restart, one of the slots contains an xlog position older
>    than what was reset by pg_resetxlog?

Yes. There's lots of ways to screw over your database by using
pg_resetxlog. I personally think there should be a required
--yes-i-am-breaking-my-database parameter for pg_resetxlog.

> 9. void
>    LogicalIncreaseXminForSlot(XLogRecPtr current_lsn, TransactionId xmin)
>    {
>    ..
>    /*
>     * don't overwrite if we already have a newer xmin. This can happen if we
>     * restart decoding in a slot.
>     */
>    if (TransactionIdPrecedesOrEquals(xmin, slot->data.catalog_xmin))
>    {
>    }
>    ..
>    }
>
>    Should we just release the spinlock in this branch and return, rather
>    than keeping the no-action branch?

Don't think that'd make it any faster/simpler. We'd be stuck with
duplicate codepaths.

> 10. * Iff ri_type = REPLICA_IDENTITY_INDEX, indexOid must be the Oid of a suitable
>     * index. Otherwise, it should be InvalidOid.
>     */
>    static void
>    relation_mark_replica_identity(Relation rel, char ri_type, Oid indexOid,
>                                   bool is_internal)
>
>    typo - *Iff*

iff := if and only if.

Thanks for the look.

Greetings,

Andres Freund
Re: [HACKERS] Shared memory changes in 9.4?
[redirecting to -hackers]

Re: Robert Haas 2014-05-28 <CA+TgmoaTcAd48zW3auWzGdHi_V+QwA5HVCTabqgTq=zlwpy...@mail.gmail.com>
> On Tue, May 27, 2014 at 8:22 PM, Maciek Sakrejda <m.sakre...@gmail.com> wrote:
> > On Mon, May 26, 2014 at 12:24 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> > > Any chance you're using a 9.3 configuration file instead of the one
> > > generated by initdb? dynamic_shared_memory_type defaults to 'posix'
> > > if not specified in the config file (on platforms supporting it). If
> > > initdb detects that 'posix' can't be used it'll emit a different
> > > value. If you're copying the config from 9.3 and your environment
> > > doesn't support posix shm that'll cause the above error.
> > >
> > > I still think dynamic_shared_memory_type should default to 'none'
> > > because of such problems.
> >
> > It works with 'none' and 'sysv' -- I think the issue is that
> > technically our environment does support 'posix', but /dev/shm is
> > indeed not mounted in the LXC container, leading to a discrepancy
> > between what initdb decides and what's actually possible. Thanks for
> > your help.
>
> I think it would be good to understand why initdb isn't getting this
> right. Did you run initdb outside the LXC container, where /dev/shm
> would have worked, but then run postgres inside the LXC container, where
> /dev/shm does not work? I ask because initdb is supposed to be doing the
> same thing that postgres does, so it really ought to come to the same
> conclusion about what will and won't work.
>
> With regard to Andres' proposal, I'm not that keen on setting
> dynamic_shared_memory_type='none' by default. Would we leave it that way
> until we get in-core users of the facility, and then change it? I guess
> that'd be OK, but frankly if enabling dynamic_shared_memory_type by
> default is causing us many problems, then we'd better reconsider the
> design of the facility now, before we start adding more dependencies on
> it. We've already fixed a bunch of DSM-related issues as a result of the
> fact that the default *isn't* none, and I dunno how many of those we
> would have found if the default had been none. I tend to think DSM is an
> important facility that we're going to be wanting to build on in future
> releases, so I'm keen to have it available by default so that we can
> iron out any kinks before we get too far down that path.

Hi,

I've just run into the same problem. I am running the Debian
postgresql-common testsuite, which includes upgrade tests. On upgrades,
the old postgresql.conf is copied to the new server (after initdb and/or
pg_upgrade), and deprecated options are removed/renamed. [*]

In that chroot environment, 9.4 is running fine, except that upgrades
failed because /dev/shm wasn't mounted, and the old 9.3 postgresql.conf
doesn't contain dynamic_shared_memory_type = 'sysv'.

To me, the following should be done:

* Make initdb determine the best shm type for this platform and write it
  into postgresql.conf as it does now.
* If no dynamic_shared_memory_type is found in the config, default to
  "none".
* Modify the three identical error messages concerned with shm segments
  to include the shm type instead of always just saying "FATAL: could not
  open shared memory segment".
* Add a HINT to the POSIX error message: "HINT: This might indicate that
  /dev/shm is not mounted, or its permissions do not allow the database
  user to create files there."

Despite /dev/shm having been around for quite some time, many people seem
to be unaware what POSIX shared memory is, so the HINT is really needed
there. It certainly took me weeks after seeing the error message the
first time till I found the time to dig deeper into the issue - it should
just have been "oh yes, /dev/shm isn't mounted there, that's why".

Christoph

[*] This might not be the smartest thing to do, but it's a simple default
approach to get the new cluster running with as much user config from the
old config as possible.
-- 
c...@df7cb.de | http://www.df7cb.de/
Re: [HACKERS] Shared memory changes in 9.4?
Hi,

On 2014-06-12 11:07:31 +0200, Christoph Berg wrote:
> > With regard to Andres' proposal, I'm not that keen on setting
> > dynamic_shared_memory_type='none' by default.

Note that I'm not proposing to disable the whole thing. Just that an
unset dynamic_shared_memory_type doesn't configure dsm. Initdb would
still configure it after probing.

> To me, the following should be done:
>
> * Make initdb determine the best shm type for this platform and write
>   it into postgresql.conf as it does now.
> * If no dynamic_shared_memory_type is found in the config, default to
>   "none".
> * Modify the three identical error messages concerned with shm segments
>   to include the shm type instead of always just saying "FATAL: could
>   not open shared memory segment".
> * Add a HINT to the POSIX error message: "HINT: This might indicate
>   that /dev/shm is not mounted, or its permissions do not allow the
>   database user to create files there."

Sounds like a sane plan to me.

Greetings,

Andres Freund
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On Wed, Jun 11, 2014 at 2:39 AM, Tom Lane wrote:
> I'm not even 100% sold that automatically returning the primary key is
> going to save any application logic. Could somebody point out *exactly*
> where an app is going to save effort with this type of syntax, compared
> to requesting the columns it wants by name?

I haven't checked the code, but I am hoping it will help with the problem
where a "RETURNING *" is added by the JDBC driver to a statement that is
not an insert or update. That has been reported on the JDBC list at least
twice, and the proposed workaround is neither very elegant nor very
robust:
https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ

Jochem

-- 
Jochem van Dieten
http://jochem.vandieten.net/
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 14/06/12 18:46, Jochem van Dieten wrote:
> On Wed, Jun 11, 2014 at 2:39 AM, Tom Lane wrote:
> > I'm not even 100% sold that automatically returning the primary key is
> > going to save any application logic. Could somebody point out
> > *exactly* where an app is going to save effort with this type of
> > syntax, compared to requesting the columns it wants by name?
>
> I haven't checked the code, but I am hoping it will help with the
> problem where a "RETURNING *" is added by the JDBC driver to a statement
> that is not an insert or update. That has been reported on the JDBC list
> at least twice, and the proposed workaround is neither very elegant nor
> very robust:
> https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ

Unfortunately that seems to be a JDBC-specific issue, which is outside
the scope of this particular patch (which proposes additional server-side
syntax intended to make "RETURNING *" operations more efficient for
certain use cases, but which is in itself not a JDBC change).

Regards

Ian Barwick

-- 
 Ian Barwick                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Few observations in replication slots related code
Hi,

On 2014-06-12 09:15:08 +0200, Andres Freund wrote:
> > 6. elog(ERROR, "cannot handle changeset extraction yet");
> >
> >    Shouldn't it be better to use "logical replication" instead of
> >    "changeset extraction"?
>
> Will change.

I don't see that message anywhere in current code? All of those were
rephrased in b89e151054a05f0f6d356ca52e3b725dd0505e53 (where Robert
changed the term to "logical decoding") and then implemented in
5a991ef8692ed0d170b44958a81a6bd70e90585c.

Pushed a fix for the rest of the ones I commented on earlier. Thanks!

Greetings,

Andres Freund
Re: [HACKERS] replication commands and log_statements
On Wed, Jun 11, 2014 at 11:55 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
> > Your wish just seems like a separate feature to me. Including
> > replication commands in 'all' seems correct independent of the desire
> > for a more granular control.
>
> No, I think I've got to vote with the other side on that. The reason we
> can have log_statement as a scalar progression "none < ddl < mod < all"
> is that there's little visible use-case for logging DML but not DDL, nor
> for logging SELECTs but not INSERT/UPDATE/DELETE. However, logging
> replication commands seems like something people would reasonably want
> an orthogonal control for. There's no nice way to squeeze such a
> behavior into log_statement. I guess you could say that log_statement
> treats replication commands as if they were DDL, but is that really
> going to satisfy users? I think we should consider log_statement to
> control logging of SQL only, and invent a separate GUC (or, in the
> future, likely more than one GUC?) for logging of replication activity.

Seems reasonable. OK, the attached patch adds the log_replication_command
parameter, which causes replication commands to be logged. I added this
to the next CF.

Regards,

-- 
Fujii Masao

From 9874e36a8f65667d1f4d3a9a8d1d87d2d20f5188 Mon Sep 17 00:00:00 2001
From: MasaoFujii <masao.fu...@gmail.com>
Date: Thu, 12 Jun 2014 19:32:00 +0900
Subject: [PATCH] Add log_replication_command configuration parameter.

This GUC causes replication commands like IDENTIFY_SYSTEM to be logged
in the server log.
---
 doc/src/sgml/config.sgml                      |   16 ++++++++++++++++
 src/backend/replication/walsender.c           |   11 +++++++++--
 src/backend/utils/misc/guc.c                  |    9 +++++++++
 src/backend/utils/misc/postgresql.conf.sample |    1 +
 src/include/replication/walsender.h           |    1 +
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 697cf99..8532f08 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4534,6 +4534,22 @@ FROM pg_stat_activity;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-log-replication-command" xreflabel="log_replication_command">
+      <term><varname>log_replication_command</varname> (<type>boolean</type>)
+       <indexterm>
+        <primary><varname>log_replication_command</> configuration parameter</primary>
+       </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Causes each replication command to be logged in the server log.
+        See <xref linkend="protocol-replication"> for more information about
+        replication command. The default value is <literal>off</>.
+        Only superusers can change this setting.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-log-temp-files" xreflabel="log_temp_files">
       <term><varname>log_temp_files</varname> (<type>integer</type>)
        <indexterm>
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 088ee2c..009ad92 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -108,6 +108,7 @@
 bool		am_db_walsender = false;	/* Connected to a database? */
 int			max_wal_senders = 0;	/* the maximum number of concurrent walsenders */
 int			wal_sender_timeout = 60 * 1000;	/* maximum time to send one
 											 * WAL data message */
+bool		log_replication_command = false;
 
 /*
  * State for WalSndWakeupRequest
@@ -1261,13 +1262,19 @@ exec_replication_command(const char *cmd_string)
 	MemoryContext old_context;
 
 	/*
+	 * Log replication command if log_replication_command is enabled.
+	 * Even when it's disabled, log the command with DEBUG1 level for
+	 * backward compatibility.
+	 */
+	ereport(log_replication_command ? LOG : DEBUG1,
+			(errmsg("received replication command: %s", cmd_string)));
+
+	/*
 	 * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
 	 * command arrives. Clean up the old stuff if there's anything.
 	 */
 	SnapBuildClearExportedSnapshot();
 
-	elog(DEBUG1, "received replication command: %s", cmd_string);
-
 	CHECK_FOR_INTERRUPTS();
 
 	cmd_context = AllocSetContextCreate(CurrentMemoryContext,
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1d094f0..427af79 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -931,6 +931,15 @@ static struct config_bool ConfigureNamesBool[] =
 		NULL, NULL, NULL
 	},
 	{
+		{"log_replication_command", PGC_SUSET, LOGGING_WHAT,
+			gettext_noop("Logs each replication command."),
+			NULL
+		},
+		&log_replication_command,
+		false,
+		NULL, NULL, NULL
+	},
+	{
 		{"debug_assertions", PGC_USERSET, DEVELOPER_OPTIONS,
 			gettext_noop("Turns on various assertion checks."),
 			gettext_noop("This is a debugging aid."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index d109394..70d86cf 100644
---
Re: [HACKERS] Inaccuracy in VACUUM's tuple count estimates
Hi Tom,

On 2014-06-06 15:44:25 -0400, Tom Lane wrote:
> I figured it'd be easy enough to get a better estimate by adding another
> counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus
> effectively assuming that in-progress inserts and deletes will both
> commit).

Did you plan to backpatch that? My inclination would be no...

> I did that, and found that it helped Tim's test case not at all :-(. A
> bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually returns
> INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless
> of whether the transaction has since marked it for deletion:
>
>     /*
>      * It'd be possible to discern between INSERT/DELETE in progress
>      * here by looking at xmax - but that doesn't seem beneficial for
>      * the majority of callers and even detrimental for some. We'd
>      * rather have callers look at/wait for xmin than xmax. It's
>      * always correct to return INSERT_IN_PROGRESS because that's
>      * what's happening from the view of other backends.
>      */
>     return HEAPTUPLE_INSERT_IN_PROGRESS;
>
> It did not use to blow this question off: back around 8.3 you got
> DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need
> less laziness + fuzzy thinking here. Maybe we should have a separate
> HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code? Is it *really* the
> case that callers other than VACUUM itself are okay with failing to make
> this distinction? I'm dubious: there are very few if any callers that
> treat the INSERT and DELETE cases exactly alike.

My current position on this is that we should leave the code as is for
< 9.4 and keep returning HEAPTUPLE_INSERT_IN_PROGRESS for 9.4/master.
Would you be ok with that? The second best thing imo would be to discern
and return HEAPTUPLE_INSERT_IN_PROGRESS/HEAPTUPLE_DELETE_IN_PROGRESS for
the respective cases. Which way would you like to go?

Greetings,

Andres Freund
[HACKERS] Audit of logout
Hi,

Some users enable log_disconnections in postgresql.conf to audit all
logouts. But since log_disconnections is defined with PGC_BACKEND, it can
be changed at connection start. This means that any client (even a
non-superuser) can freely disable log_disconnections so that his or her
logout is not logged, even when the system admin has enabled it in
postgresql.conf. Isn't this problematic for audit purposes?

Regards,

-- 
Fujii Masao
Re: [HACKERS] cancelling statement due to user request error occurs but the transaction has committed.
Hi,

> Well, the only other principled fix I can see is to add a new response
> along the lines of ERRORBUTITCOMMITTED, which does not seem attractive
> either, since all clients will have to be taught to understand it.

+1

I think the current specification is hard to understand for many users.
It would be really good if PostgreSQL gave us a message such as the
synchronous replication abort warning:

###
WARNING: canceling wait for synchronous replication due to user request
DETAIL: The transaction has already committed locally, but might not
have been replicated to the standby.
###

Regards,

Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: anzai-na...@mxu.nes.nec.co.jp
---
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On Thu, Jun 12, 2014 at 12:25 PM, Ian Barwick wrote:
> On 14/06/12 18:46, Jochem van Dieten wrote:
> > I haven't checked the code, but I am hoping it will help with the
> > problem where a "RETURNING *" is added by the JDBC driver to a
> > statement that is not an insert or update. That has been reported on
> > the JDBC list at least twice, and the proposed workaround is neither
> > very elegant nor very robust:
> > https://groups.google.com/forum/#!msg/pgsql.interfaces.jdbc/7WY60JX3qyo/-v1fqDqLQKwJ
>
> Unfortunately that seems to be a JDBC-specific issue, which is outside
> the scope of this particular patch (which proposes additional
> server-side syntax intended to make "RETURNING *" operations more
> efficient for certain use cases, but which is in itself not a JDBC
> change).

But the obvious way to fix the JDBC issue is not to fix it by adding a
'mini parser' on the JDBC side, but to make "SELECT ... RETURNING PRIMARY
KEY" a regular select that silently ignores the RETURNING clause and
doesn't throw an error on the server side. That might still be outside
the scope of this particular patch, but it would provide (additional)
justification if it were supported.

Jochem
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 2014-06-12 13:58:31 +0200, Jochem van Dieten wrote:
> But the obvious way to fix the JDBC issue is not to fix it by adding a
> 'mini parser' on the JDBC side, but to make "SELECT ... RETURNING
> PRIMARY KEY" a regular select that silently ignores the RETURNING
> clause and doesn't throw an error on the server side.

Brr. Then it'd need to be added to all statements, not just SELECT. I've
my doubts that's going to happen.

Greetings,

Andres Freund
Re: [HACKERS] RETURNING PRIMARY KEY syntax extension
On 14/06/12 20:58, Jochem van Dieten wrote:
> But the obvious way to fix the JDBC issue is not to fix it by adding a
> 'mini parser' on the JDBC side, but to make "SELECT ... RETURNING
> PRIMARY KEY" a regular select that silently ignores the RETURNING
> clause and doesn't throw an error on the server side. That might still
> be outside the scope of this particular patch, but it would provide
> (additional) justification if it were supported.

That would be adding superfluous, unused and unusable syntax of no
potential value (there is no "SELECT ... RETURNING" and it wouldn't make
any sense if there was) as a workaround for a driver issue - not going to
happen.

Regards

Ian Barwick
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Robert Haas <robertmh...@gmail.com> wrote:
> Even aside from security exposures, how does a non-superuser who runs
> pg_dump know whether they've got a complete backup or a filtered dump
> that's missing some rows?

This seems to me to be a killer objection to the feature as proposed, and
points out a huge difference between column-level security and the
proposed implementation of row-level security. (In fact it is a
difference between just about any GRANTed permission and row-level
security.) If you try to SELECT * FROM sometable and you don't have
rights to all the columns, you get an error. A dump would always either
work as expected or generate an error.

test=# create user bob;
CREATE ROLE
test=# create user bill;
CREATE ROLE
test=# set role bob;
SET
test=> create table person (person_id int not null primary key, name text not null, ssn text);
CREATE TABLE
test=> grant select (person_id, name) on table person to bill;
GRANT
test=> reset role;
RESET
test=# set role bill;
SET
test=> select person_id, name from person;
 person_id | name
-----------+------
(0 rows)

test=> select * from person;
ERROR:  permission denied for relation person

The proposed approach would leave the validity of any dump which was not
run as a superuser in doubt. The last thing we need, in terms of
improving security, is another thing you can't do without connecting as a
superuser.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Shared memory changes in 9.4?
Re: Andres Freund 2014-06-12 <20140612094112.gz8...@alap3.anarazel.de>
> > * Make initdb determine the best shm type for this platform and write
> >   it into postgresql.conf as it does now.
> > * If no dynamic_shared_memory_type is found in the config, default to
> >   "none".
> > * Modify the three identical error messages concerned with shm
> >   segments to include the shm type instead of always just saying
> >   "FATAL: could not open shared memory segment".
> > * Add a HINT to the POSIX error message: "HINT: This might indicate
> >   that /dev/shm is not mounted, or its permissions do not allow the
> >   database user to create files there."
>
> Sounds like a sane plan to me.

Here are two patches, one that implements the annotated error messages,
and one that selects "none" as default.

It might also make sense to add a note that POSIX depends on /dev/shm,
and also a note that dynamic_shared_memory_type is not related to the
shared_buffers shm segment, which I didn't include here.

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
new file mode 100644
index 0819641..780e3a5
*** a/src/backend/storage/ipc/dsm_impl.c
--- b/src/backend/storage/ipc/dsm_impl.c
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 289,296 ****
  		if (errno != EEXIST)
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not open shared memory segment \"%s\": %m",
! 							name)));
  		return false;
  	}
--- 289,299 ----
  		if (errno != EEXIST)
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not open POSIX shared memory segment \"%s\": %m",
! 							name),
! 					 errhint("This error usually means that /dev/shm is not mounted, or its "
! 							 "permissions do not allow the database user to create files "
! 							 "there.")));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 313,319 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not stat shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 316,322 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not stat POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 332,338 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not resize shared memory segment %s to %zu bytes: %m",
  						name, request_size)));
  		return false;
  	}
--- 335,341 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not resize POSIX shared memory segment %s to %zu bytes: %m",
  						name, request_size)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 358,364 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 361,367 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_posix(dsm_op op, dsm_handle han
*** 382,388 ****
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not map shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 385,391 ----
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not map POSIX shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 512,518 ****
  			errno = save_errno;
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not get shared memory segment: %m")));
  		}
  		return false;
  	}
--- 515,521 ----
  			errno = save_errno;
  			ereport(elevel,
  					(errcode_for_dynamic_shared_memory(),
! 					 errmsg("could not get System V shared memory segment: %m")));
  		}
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 530,536 ****
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 533,539 ----
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not unmap System V shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
*** dsm_impl_sysv(dsm_op op, dsm_handle hand
*** 540,546 ****
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could not remove shared memory segment \"%s\": %m",
  						name)));
  		return false;
  	}
--- 543,549 ----
  	{
  		ereport(elevel,
  				(errcode_for_dynamic_shared_memory(),
! 				 errmsg("could
Re: [HACKERS] replication commands and log_statements
On Wed, Jun 11, 2014 at 7:42 AM, Magnus Hagander mag...@hagander.net wrote: Replication commands like IDENTIFY_COMMAND are not logged even when log_statements is set to all. Some users who use log_statements to audit *all* statements might dislike this current situation. So I'm thinking to change log_statements, or to add something like log_replication, so that we can log replication commands. Thoughts? +1. I think adding a separate parameter is the way to go. The other option would be to turn log_statements into a parameter where you specify multiple values I kind of like this idea, but... - so instead of 'all' today it would be 'ddl,dml,all' or something like that, and then you'd also add 'replication' as an option. But that would cause all sorts of backwards-compatibility annoyances. And do you really want to be able to say things like 'ddl,all', meaning you'd get ddl and select but not dml? ...you lost me here. I mean, I think it could be quite useful to redefine the existing GUC as a list. We could continue to have ddl, dml, and all as tokens that would be in the list, but you wouldn't write 'ddl,dml,all' because 'all' would include everything that those other ones would log. But then you could have combinations like 'dml,replication' and so on. And you could do much more fine-grained things, like allow log_statement=create,alter,drop to log all such statements but not, for example, CLUSTER. I think if we go the route of adding a separate GUC for this, we're going to get tired of adding GUCs way before we've come close to meeting the actual requirements in this area. A comma-separated list of tokens seems to offer a lot more flexibility. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
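[If the list-of-tokens idea sketched in this thread were adopted, the configuration might look roughly like this. This is purely hypothetical syntax taken from the discussion above; no released server accepts these list values:]

```
# hypothetical postgresql.conf syntax -- not implemented
log_statement = 'ddl,replication'     # log DDL plus replication commands

# or fine-grained, per-command tokens:
log_statement = 'create,alter,drop'   # but not, for example, CLUSTER
```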
Re: [HACKERS] postgresql.auto.conf read from wrong directory
On Tue, May 27, 2014 at 2:05 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Sun, May 11, 2014 at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think it's clearly *necessary* to forbid setting data_directory in postgresql.auto.conf. The file is defined to be found in the data directory, so any such setting is circular logic by definition; no good can come of not rejecting it. We already have a GUC flag bit about disallowing certain variables in the config file (though I don't remember if it's enforced or just advisory). It seems to me that we'd better invent one for disallowing in ALTER SYSTEM, as well. I introduced a new flag bit (GUC_DISALLOW_IN_AUTO_FILE) to mark parameters that ALTER SYSTEM may not set, and disallowed data_directory accordingly. Shouldn't we document what types of parameters are not allowed to be set by ALTER SYSTEM SET? Also, data_directory was displayed when I typed TAB just after ALTER SYSTEM SET; the tab-completion for ALTER SYSTEM SET probably needs to be changed. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
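[For illustration, the intended effect of the GUC_DISALLOW_IN_AUTO_FILE flag would look something like this in psql. A sketch only; the exact error wording is an assumption, not taken from the patch:]

```sql
-- Ordinary settings are written to postgresql.auto.conf:
ALTER SYSTEM SET work_mem = '64MB';

-- data_directory must be rejected: postgresql.auto.conf lives in the
-- data directory, so the setting would be circular by definition.
ALTER SYSTEM SET data_directory = '/somewhere/else';
-- ERROR:  parameter "data_directory" cannot be changed
```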
Re: [HACKERS] Few observations in replication slots related code
On Thu, Jun 12, 2014 at 5:02 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-06-12 09:15:08 +0200, Andres Freund wrote: 6. elog(ERROR, "cannot handle changeset extraction yet"); Wouldn't it be better to say logical replication instead of changeset extraction? Will change. I don't see that message anywhere in current code? Right; I was actually reading code from the Git history while also referring to the latest code, so it seems I saw that in the original commit and missed verifying it against the latest code. While checking the latest code, I did find a use of *changeset extraction* in the comment below:

/* ..
 *
 * This is useful to initialize the cutoff xid after which a new
 * *changeset extraction* replication slot can start decoding changes.
 *
 * ..
 */

With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On 2014-06-12 10:48:01 -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I prefer lo_create_bytea() or even lo_create_from_bytea(), lo_from_bytea(). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared memory changes in 9.4?
On Thu, Jun 12, 2014 at 5:41 AM, Andres Freund and...@2ndquadrant.com wrote: With regard to Andres' proposal, I'm not that keen on setting dynamic_shared_memory_type='none' by default. Note that I'm not proposing to disable the whole thing. Just that an unset dynamic_shared_memory_type doesn't configure dsm. Initdb would still configure it after probing. OK, I misunderstood your position; thanks for clarifying. I think that would be OK with me. To some degree, I think that the test setup is broken by design: while we try to maintain backward compatibility for postgresql.conf files, there's never been any guarantee that an old postgresql.conf file will work on a newer server. For example, a whole lot of pre-8.4 users probably had max_fsm_pages and max_fsm_relations configured, and with 8.4, those settings went away. Fixing that kind of thing is an essential part of the upgrade process. That having been said, in this particular case, we can probably ease the pain without much downside by doing as you suggest. The only thing I'm worried about is that people will discover much later that they don't have working dynamic shared memory, and be unhappy about that. Sometimes it's better to complain loudly at the beginning than to leave buried problems for later. But I'll defer to the majority on what to do in this instance.

To me, the following should be done:

* Make initdb determine the best shm type for this platform and write it into postgresql.conf as it does now.
* If no dynamic_shared_memory_type is found in the config, default to none.
* Modify the three identical error messages concerned about shm segments to include the shm type instead of always just saying FATAL: could not open shared memory segment
* Add a HINT to the POSIX error message: HINT: This might indicate that /dev/shm is not mounted, or its permissions do not allow the database user to create files there

Sounds like a sane plan to me.

+1 to the rest of this.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 10:56 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Tom Lane wrote: Noah Misch n...@leadboat.com writes: lo_new() or lo_make()? An earlier draft of the patch that added lo_create(oid, bytea) had a similar function named make_lo(). It appears that lo_make() has a small plurality, but before we lock that name in, there was one other idea that occurred to me: the underlying C function is currently named lo_create_bytea(), and that seems like not an awful choice for the SQL name either. Any other votes out there? I was also going to suggest lo_create_bytea(). That sounds good to me, too. Presumably we should also fix libpq to not be so dumb. I mean, it doesn't help with the immediate problem, since as you say there could be non-upgraded copies of libpq out there for the indefinite future, but it still seems like something we oughta fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Alvaro Herrera alvhe...@2ndquadrant.com writes: Tom Lane wrote: Any other votes out there? I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). Andres also mentioned lo_from_bytea(), and I kinda like it too. I don't like plain lo_from(), as it's not too apparent what it's supposed to get data from --- someone might think the second parameter was supposed to be a file name a la lo_import(), for example. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
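[For reference, the function under discussion takes an OID (0 meaning "let the server assign one") plus the data; with the name settled on below in the thread, usage would look like this. A sketch based on the signatures discussed here; lo_get() is the companion read function added in the same 9.4 work:]

```sql
-- Create a large object from bytea, letting the server pick the OID,
-- then read its contents back in one go:
SELECT lo_get(lo_from_bytea(0, '\xdeadbeef'::bytea));
-- returns \xdeadbeef
```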
Re: [HACKERS] Suppressing unused subquery output columns
On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached draft patch fixes this by deleting unused output expressions from unflattened subqueries, so that we get:

regression=# explain select f1 from (select * from t1 left join t2 on f1=f2 limit 1) ss;
                             QUERY PLAN
--------------------------------------------------------------------
 Subquery Scan on ss  (cost=0.00..0.02 rows=1 width=4)
   ->  Limit  (cost=0.00..0.01 rows=1 width=4)
         ->  Seq Scan on t1  (cost=0.00..34.00 rows=2400 width=4)
 Planning time: 0.193 ms
(4 rows)

I'm not entirely convinced that it's worth the extra planning cycles, though. Given the small number of complaints to date, it might not be worth doing this. Thoughts? I've encountered this issue before and think it would be great to fix it. I'm not sure precisely how many cycles it's worth paying, but definitely more than zero. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
lo_from_bytea sounds better to me than lo_create_bytea
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Robert Haas robertmh...@gmail.com writes: Presumably we should also fix libpq to not be so dumb. I mean, it doesn't help with the immediate problem, since as you say there could be non-upgraded copies of libpq out there for the indefinite future, but it still seems like something we oughta fix. It's been in the back of my mind for awhile that doing a dynamic query at all here is pretty pointless. It's not like the OIDs of those functions ever have or ever will move. It would probably be more robust if we just let libpq be a consumer of fmgroids.h and refer directly to the constants F_LO_CREATE etc. I think there was some previous discussion about this, possibly tied to also having a better-defined way to let clients depend on hard-wired type OIDs, but I'm too lazy to search the archives right now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shared memory changes in 9.4?
Hi, On 2014-06-12 11:08:35 -0400, Robert Haas wrote: On Thu, Jun 12, 2014 at 5:41 AM, Andres Freund and...@2ndquadrant.com wrote: With regard to Andres' proposal, I'm not that keen on setting dynamic_shared_memory_type='none' by default. Note that I'm not proposing to disable the whole thing. Just that an unset dynamic_shared_memory_type doesn't configure dsm. Initdb would still configure it after probing. OK, I misunderstood your position; thanks for clarifying. I think that would be OK with me. To some degree, I think that the test setup is broken by design: while we try to maintain backward compatibility for postgresql.conf files, there's never been any guarantee that an old postgresql.conf file will work on a newer server. For example, a whole lot of pre-8.4 users probably had max_fsm_pages and max_fsm_relations configured, and with 8.4, those settings went away. Fixing that kind of thing is an essential part of the upgrade process. While I think I see where you're coming from, I don't think these cases are comparable. In this case, commenting out the GUC can prevent the server from starting. That's pretty odd, imo. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Suppressing unused subquery output columns
Robert Haas robertmh...@gmail.com writes: On Thu, Jun 5, 2014 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: The attached draft patch fixes this by deleting unused output expressions from unflattened subqueries, so that we get: ... I'm not entirely convinced that it's worth the extra planning cycles, though. Given the small number of complaints to date, it might not be worth doing this. Thoughts? I've encountered this issue before and think it would be great to fix it. I'm not sure precisely how many cycles it's worth paying, but definitely more than zero. We have a couple votes for this patch and no one has spoken against it, so I'll go ahead and push it into HEAD. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Jun 5, 2014 at 8:37 PM, Peter Geoghegan p...@heroku.com wrote: * A configure AC_TRY_RUN tests the suitability of the system strxfrm() implementation for the purposes of this optimization. There can be no header bytes (people at pgCon reported redundant bytes in the Mac OS X implementation, to my great surprise), ...

I'm attaching a small test program demonstrating the Mac behavior. Here's some sample output:

[rhaas ~]$ ./strxfrm en_US "" a aa ab abc abcd abcde peter Geoghegan _ %
 - 
a - 001S001S
aa - 001S001S001S001S
ab - 001S001T001S001T
abc - 001S001T001U001S001T001U
abcd - 001S001T001U001V001S001T001U001V
abcde - 001S001T001U001V001W001S001T001U001V001W
peter - 001b001W001f001W001d001b001W001f001W001d
Geoghegan - 0019001W001a001Y001Z001W001Y001S001`0019001W001a001Y001Z001W001Y001S001`
_ - 001Q001Q
% - 000W000W

It appears that any string starting with the letter a will create output that begins with 001S00, and the seventh character always appears to be 0 or 1:

[rhaas ~]$ ./strxfrm en_US ab ac ad ae af a% a0 "a "
ab - 001S001T001S001T
ac - 001S001U001S001U
ad - 001S001V001S001V
ae - 001S001W001S001W
af - 001S001X001S001X
a% - 001S000W001S000W
a0 - 001S000b001S000b
a  - 001S000R001S000R

Also, the total number of bytes produced is apparently 8N+4, where N is the length of the input string. On a Linux system I tested, the output included non-printable characters, so I adapted the test program to print the results in hex. Attaching that version, too.
Here, the length was 3N+2, except for 0-length strings which produce a 0-length result:

[rhaas@hydra ~]$ ./strxfrm-binary en_US "" a aa ab abc abcd abcde peter Geoghegan _ %
 - (0 bytes)
a - 0c01020102 (5 bytes)
aa - 0c0c010202010202 (8 bytes)
ab - 0c0d010202010202 (8 bytes)
abc - 0c0d0e0102020201020202 (11 bytes)
abcd - 0c0d0e0f01020202020102020202 (14 bytes)
abcde - 0c0d0e0f10010202020202010202020202 (17 bytes)
peter - 1b101f101d010202020202010202020202 (17 bytes)
Geoghegan - 12101a121310120c190102020202020202020201040202020202020202 (29 bytes)
_ - 0101010112 (5 bytes)
% - 0101010139 (5 bytes)

[rhaas@hydra ~]$ ./strxfrm-binary en_US ab ac ad ae af a% a0 "a "
ab - 0c0d010202010202 (8 bytes)
ac - 0c0e010202010202 (8 bytes)
ad - 0c0f010202010202 (8 bytes)
ae - 0c10010202010202 (8 bytes)
af - 0c11010202010202 (8 bytes)
a% - 0c01020102010239 (8 bytes)
a0 - 0c02010202010202 (8 bytes)
a  - 0c01020102010211 (8 bytes)

Even though each input byte is generating 3 output bytes, it's not generating a group of output bytes for each input byte as appears to be happening on Mac OS X. If it were doing that, then truncating the blob to 8 bytes would only compare the first 2-3 bytes of the input string. In fact we do better. If we change even the 8th letter in the string to some other letter, the 8th output byte changes:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaab
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaab - 0c0c0c0c0c0c0c0d010202020202020202010202020202020202 (26 bytes)

If we change the capitalization of the eighth byte, then the change happens further out:

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaaA
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaaA - 0c0c0c0c0c0c0c0c010202020202020202010202020202020204 (26 bytes)

Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on Mac OS X you only get entropy from the first 2 bytes of the string.
It would be interesting to see results from other platforms people might care about also. The cost of adding all of these ameliorating measures appears to be very low. We're so bottlenecked on memory bandwidth that the fixed-size overhead of maintaining poor man's cardinality, and the extra cycles from hashing n keys for the benefit of HyperLogLog, just don't seem to matter next to the big savings for n log n comparisons. The best case performance is seemingly just as good as before (about a 200% improvement in transaction throughput for one client, but closer to a 250% improvement with many clients and/or expensive collations), while the worst case is much, much better. I haven't looked at the patch yet, but this sounds promising. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <locale.h>
#include <string.h>

int
main(int argc, char **argv)
{
	char	   *buffer = NULL;
	size_t		buflen = 0;

	if (argc < 3)
	{
		fprintf(stderr, "Usage: strxfrm {collation} {string}...\n");
		exit(1);
	}

	if (setlocale(LC_COLLATE, argv[1]) == NULL)
	{
		fprintf(stderr, "setlocale: %s\n", strerror(errno));
		exit(1);
	}

	argv += 2;
	while (*argv)
	{
		size_t		r;

		r = strxfrm(buffer, *argv, buflen);
		if (r >= buflen)
		{
			if (buffer != NULL)
				free(buffer);
Re: [HACKERS] Proposing pg_hibernate
On Thu, Jun 12, 2014 at 12:17 AM, Gurjeet Singh gurj...@singh.im wrote: On Wed, Jun 11, 2014 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jun 10, 2014 at 10:03 PM, Gurjeet Singh gurj...@singh.im wrote: And it's probably accepted by now that such behaviour is not catastrophic, merely inconvenient. I think the whole argument for having pg_hibernator is that getting the block cache properly initialized is important. If it's not important, then we don't need pg_hibernator in the first place. But if it is important, then I think not loading unrelated blocks into shared_buffers is also important. I was constructing a contrived scenario, something that would rarely happen in reality. I feel that the benefits of this feature greatly outweigh the minor performance loss caused in such an unlikely scenario. So, are you proposing this for inclusion in PostgreSQL core? If not, I don't think there's much to discuss here - people can use it or not as they see fit, and we'll see what happens and perhaps design improvements will result, or not. If so, that's different: you'll need to demonstrate the benefits via convincing proof points, and you'll also need to show that the disadvantages are in fact minor and that the scenario is in fact unlikely. So far there are zero performance numbers on this thread, a situation that doesn't meet community standards for a performance patch. Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
refactoring allpaths.c (was Re: [HACKERS] Suppressing unused subquery output columns)
I wrote: We have a couple votes for this patch and no one has spoken against it, so I'll go ahead and push it into HEAD. BTW, I forgot to mention that while working on this patch I was thinking it's past time to separate out the subquery support in allpaths.c into its own file. With this patch, allpaths.c is 2363 lines, about 690 of which are set_subquery_pathlist and subsidiary functions. While that might not be quite tail-wagging-dog level, I think it's enough to justify splitting that code out into a new file, say optimizer/path/subquerypath.c. There are also about 630 lines devoted to appendrel path generation, which might be thought enough to deserve a separate file of its own. I'm a bit less excited about that though, mainly because the appendrel logic has some not-quite-arms-length interactions with set_rel_size(); but there's certainly some case to be made for doing it. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
On Thu, Jun 12, 2014 at 1:25 PM, Robert Haas robertmh...@gmail.com wrote: It appears that any string starting with the letter a will create output that begins with 001S00, and the seventh character always appears to be 0 or 1:

[rhaas ~]$ ./strxfrm en_US ab ac ad ae af a% a0 "a "
ab - 001S001T001S001T
ac - 001S001U001S001U
ad - 001S001V001S001V
ae - 001S001W001S001W
af - 001S001X001S001X
a% - 001S000W001S000W
a0 - 001S000b001S000b
a  - 001S000R001S000R

...

[rhaas@hydra ~]$ ./strxfrm-binary en_US aaaaaaaa aaaaaaaA
aaaaaaaa - 0c0c0c0c0c0c0c0c010202020202020202010202020202020202 (26 bytes)
aaaaaaaA - 0c0c0c0c0c0c0c0c010202020202020202010202020202020204 (26 bytes)

Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on Mac OS X you only get entropy from the first 2 bytes of the string. It would be interesting to see results from other platforms people might care about also.

If you knew Mac's output character set with some certainty, you could compress it rather efficiently by applying a tabulated decode back into non-printable bytes. Say, like base64 decoding (the set appears to be a subset of base64, but it's hard to be sure). I.e.,

x = strxfrm(s)
xz = hex(tab[x[0]] + 64*tab[x[1]] + 64^2*tab[x[2]] ... etc)

This can be made rather efficient. But the first step is defining with some certainty the output character set.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
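[The tabulated-decode idea above can be sketched concretely. This is only an illustration: the alphabet below is an invented stand-in (the real Mac output set would have to be determined first, as noted), and the packing assumes exactly 64 symbols, i.e. 6 bits each:]

```python
# Sketch: map each printable strxfrm() output symbol back to a 6-bit
# value, then pack the 6-bit values into bytes (4 symbols -> 3 bytes).
# ALPHABET is an assumption for illustration; it must be listed in
# collation order so that packing preserves comparison order.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"
TAB = {c: i for i, c in enumerate(ALPHABET)}

def compress(xfrm: str) -> bytes:
    bits = 0
    nbits = 0
    out = bytearray()
    for ch in xfrm:
        bits = (bits << 6) | TAB[ch]
        nbits += 6
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
    if nbits:
        # left-justify the trailing partial byte so prefixes sort first
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)

print(compress("001S001S").hex())  # prints 00005c00005c
```

[For equal-length inputs, the packed form compares the same as the original in 25% less space, so more of the transformed key would fit into a fixed-size abbreviation.]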
[HACKERS] add line number as prompt option to psql
Hi all,

The attached WIP patch adds a prompt option for psql that shows the current line number. If the user makes a syntax error in a long SQL statement, psql reports it as follows:

ERROR:  syntax error at or near "a"
LINE 250: hoge
          ^

psql shows where the syntax error occurred, but that is not much help when the SQL statement is very long. We can write SQL with the \e (editor) command (e.g., emacs) and see line numbers there, but that clutters the terminal log and makes analyzing the log after an error more difficult. (I think we usually use copy and paste.)

With this patch applied, %l becomes available as a prompt option, e.g.:

$ cat ~/.psqlrc
\set PROMPT2 '%/[%l]%R%# '
\set PROMPT1 '%/[%l]%R%# '

$ psql -d postgres
postgres[1]=# select
postgres[2]-# *
postgres[3]-# from
postgres[4]-# hoge;

The past discussion is here: http://www.postgresql.org/message-id/cafj8prc1rupk6+cha1jpxph3us_zipabdovmceex4wpbp2k...@mail.gmail.com

Please give me feedback.

Regards,

---
Sawada Masahiko

psql-line-number_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
I wrote: Alvaro Herrera alvhe...@2ndquadrant.com writes: I was also going to suggest lo_create_bytea(). Another similar possibility would be lo_from_bytea() -- or, since we have overloading (and we can actually use it in this case without breaking libpq), we could just have lo_from(oid, bytea). Andres also mentioned lo_from_bytea(), and I kinda like it too. I don't like plain lo_from(), as it's not too apparent what it's supposed to get data from --- someone might think the second parameter was supposed to be a file name a la lo_import(), for example. Since the discussion seems to have trailed off, I'm going to run with lo_from_bytea(). The plan is:

1. Rename the function.

2. Add an opr_sanity regression test memorializing what we should get from lo_initialize()'s query.

3. Make sure that the regression tests leave behind a few large objects, so that testing of pg_dump/pg_restore using the regression database will exercise the large-object code paths.

It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? Redoing or getting rid of lo_initialize()'s query would be a good thing too; but that does not seem like material for back-patching into 9.4, while I think all the above are. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [GENERAL] Question about partial functional indexes and the query planner
On Wed, Jun 11, 2014 at 7:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jun 10, 2014 at 7:19 PM, Tom Lane t...@sss.pgh.pa.us wrote: Given the lack of previous complaints, I'm not sure this amounts to a back-patchable bug, but it does seem like something worth fixing going forward. Agreed, although I'd be willing to see us slip it into 9.4. It's doubtful that anyone will get upset if their query plans change between beta1 and beta2, but the same cannot be said for released branches. After further thought about this I realized that there's another category of proof possibilities that is pretty simple to add while we're at it. Namely, once we've found that both subexpressions of the two operator clauses are equal(), we can use btree opclass relationships to prove that, say, x < y implies x <= y, or x < y refutes x > y, independently of just what x and y are. (Well, they have to be immutable expressions, but we'd not get this far if they're not.) We already had pretty nearly all of the machinery for that, but again it was only used for proving cases involving comparisons to constants. A little bit of refactoring later, I offer the attached draft patch. I'm thinking this is probably more than we want to slip into 9.4 at this point, but I'll commit it to HEAD soon if there are not objections. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

I applied Tom's patch to the latest HEAD (e04a9ccd2ccd6e31cc4af6b08257a0a186d0fce8) and showed it to Brian; it looks to solve the problem he originally reported:

$ patch -p1 -i ../better-predicate-proofs-1.patch
(Stripping trailing CRs from patch.)
patching file src/backend/optimizer/util/predtest.c

$ /opt/pgsql_patch_review/bin/psql postgres
Timing is on.
Null display (null) is «NULL».
Expanded display (expanded) is used automatically.
psql (9.5devel)
Type "help" for help.
postgres=# CREATE OR REPLACE FUNCTION public.return_if_even(v_id integer) returns integer
postgres-# LANGUAGE sql AS
postgres-# $$
postgres$# SELECT case when v_id % 2 = 1 then 0 else v_id end;
postgres$# $$;
CREATE FUNCTION
Time: 44.669 ms

postgres=# create table public.partial_functional_index_test as
postgres-# select id from generate_series(1,100) AS s(id);
SELECT 100
Time: 1037.993 ms

postgres=# create index partial_functional_idx ON public.partial_functional_index_test
postgres-# USING btree ( public.return_if_even(id) )
postgres-# WHERE public.return_if_even(id) = id;
LOG:  sending cancel to blocking autovacuum PID 12521
DETAIL:  Process 12424 waits for ShareLock on relation 16385 of database 12217.
STATEMENT:  create index partial_functional_idx ON public.partial_functional_index_test USING btree ( public.return_if_even(id) ) WHERE public.return_if_even(id) = id;
ERROR:  canceling autovacuum task
CONTEXT:  automatic analyze of table postgres.public.partial_functional_index_test
CREATE INDEX
Time: 1658.245 ms

postgres=# explain analyze select count(1) from public.partial_functional_index_test where public.return_if_even(id) = id;
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Aggregate  (cost=4818.05..4818.06 rows=1 width=0) (actual time=2503.851..2503.854 rows=1 loops=1)
   ->  Bitmap Heap Scan on partial_functional_index_test  (cost=82.67..4805.55 rows=5000 width=0) (actual time=43.724..1309.309 rows=50 loops=1)
         Recheck Cond: (CASE WHEN ((id % 2) = 1) THEN 0 ELSE id END = id)
         Heap Blocks: exact=4425
         ->  Bitmap Index Scan on partial_functional_idx  (cost=0.00..81.42 rows=5000 width=0) (actual time=42.961..42.961 rows=50 loops=1)
 Planning time: 4.245 ms
 Execution time: 2505.281 ms
(7 rows)

Time: 2515.344 ms

postgres=# explain analyze select count(1) from public.partial_functional_index_test where id = public.return_if_even(id);
                                    QUERY PLAN
---------------------------------------------------------------------------------
 Aggregate  (cost=4818.05..4818.06 rows=1 width=0) (actual time=2483.862..2483.866 rows=1 loops=1)
   ->  Bitmap Heap Scan on partial_functional_index_test  (cost=82.67..4805.55 rows=5000 width=0) (actual time=40.704..1282.955 rows=50 loops=1)
         Recheck Cond: (CASE WHEN ((id % 2) = 1) THEN 0 ELSE id END = id)
         Heap Blocks: exact=4425
         ->  Bitmap Index Scan on partial_functional_idx  (cost=0.00..81.42 rows=5000 width=0) (actual time=39.657..39.657 rows=50 loops=1)
 Planning time: 0.127 ms
 Execution time: 2483.979 ms
(7 rows)

-- Keith Fiske Database
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: Since the discussion seems to have trailed off, I'm going to run with lo_from_bytea(). The plan is: 1. Rename the function. 2. Add an opr_sanity regression test memorializing what we should get from lo_initialize()'s query. 3. Make sure that the regression tests leave behind a few large objects, so that testing of pg_dump/pg_restore using the regression database will exercise the large-object code paths. Sounds good. It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Redoing or getting rid of lo_initialize()'s query would be a good thing too; but that does not seem like material for back-patching into 9.4, while I think all the above are. I agree all the above make sense for 9.4. I also wouldn't object to a hardening of lo_initialize() sneaking in at this stage. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] updated emacs configuration
On Tue, Jun 10, 2014 at 10:36:07AM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Mon, Jun 09, 2014 at 09:04:02PM -0400, Peter Eisentraut wrote: I'd consider just getting rid of the (c-file-style . bsd) setting in .dir-locals.el and put the actual interesting settings from the style in there. Another alternative is to change emacs.samples to use hack-local-variables-hook instead, as described here: http://www.emacswiki.org/emacs/LocalVariables Those are reasonable alternatives. The ultimate effect looks similar for all three options, to the extent that I don't see a basis for forming a strong preference. Do you have a recommendation? The more Ludd^H^H^Hcautious among us run with (setq enable-local-variables nil) and would rather not see the configuration recommendations overcomplicated due to an attempt to play nice with directory-local settings we're ignoring anyway. So I'd vote for Peter's first option, ie, kluge up .dir-locals.el not the configuration code. Seeing the two votes cast and only cosmetic differences between the options, I'll just stick with my original -v1. Thanks. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make check For Extensions
Andres said during the unconference last month that there was a way to get `make check` to work with PGXS. The idea is that it would initialize a temporary cluster, start it on an open port, install an extension, and run the extension's test suite. I think via pg_regress --temp-install, maybe? I poked through the PGXS makefiles; it looks like there *might* be something like this for in-core contrib extensions, but not for externally-distributed extensions. Is there something I could add to my extension Makefiles so that `make check` or `make test` will do a pre-install test on a temporary cluster? My 0.02€: It is expected to work, more or less, see the end of http://www.postgresql.org/docs/9.3/static/extend-pgxs.html It invokes psql which is expected to work directly. Note that there is no temporary installation, it is tested against the installed and running postgres. Maybe having the ability to create a temporary installation, as you suggest, would be a nice extension. -- Fabien.
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
Noah Misch n...@leadboat.com writes: On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Um, not really: what pg_upgrade exercises is pg_dump -s which entirely fails to cover the data-transfer code paths. It would not have found this problem. BTW, after further testing I realized that it was quite accidental that I found it either. pg_restore only uses libpq's lo_create() function when restoring an old_blob_style archive, ie one generated by 8.4 or earlier. That's what I happened to try to do last night, but it's pure luck that I did. Poking around with making the largeobject regression test leave some stuff behind, I found out that: 1. That regression test includes the text of a Robert Frost poem that AFAICT is still under copyright. I think we'd better replace it with something by someone a bit more safely dead. 2. I tried to add a COMMENT ON LARGE OBJECT to one of the not-removed blobs. While pg_upgrade didn't fail on transferring the blob, it *did* fail to transfer the comment on it, which seems like a bug. Bruce, have you got any idea how to fix that? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] make check For Extensions
On Jun 12, 2014, at 11:28 AM, Fabien COELHO coe...@cri.ensmp.fr wrote: My 0.02€: It is expected to work, more or less, see the end of http://www.postgresql.org/docs/9.3/static/extend-pgxs.html That says: “The scripts listed in the REGRESS variable are used for regression testing of your module, which can be invoked by make installcheck after doing make install. For this to work you must have a running PostgreSQL server.” That does not mean that it starts a new cluster on a port. It means it will test it against an existing cluster after you have installed into that cluster. It invokes psql which is expected to work directly. Note that there is no temporary installation, it is tested against the installed and running postgres. Maybe having the ability to create a temporary installation, as you suggest, would be a nice extension. Yes, that’s what I would like, so I could test *before* installing. Best, David signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels
On Tue, Jun 10, 2014 at 12:05 PM, Tom Lane t...@sss.pgh.pa.us wrote: If we're going to do this, I would say that we should also take the opportunity to get out from under the question of which kernel API we're dealing with. So perhaps a design like this: 1. If the environment variable PG_OOM_ADJUST_FILE is defined, it's the name of a file to write something into after forking. The typical value would be /proc/self/oom_score_adj or /proc/self/oom_adj. If not set, nothing happens. 2. If the environment variable PG_OOM_ADJUST_VALUE is defined, that's the string we write, otherwise we write 0. Please find attached the patch. It includes the doc changes as well. Best regards, -- Gurjeet Singh http://gurjeet.singh.im/ EDB www.EnterpriseDB.com

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 9fadef5..4492a1d 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1284,8 +1284,15 @@
 echo -1000 > /proc/self/oom_score_adj
 in the postmaster's startup script just before invoking the postmaster.
 Note that this action must be done as root, or it will have no effect;
 so a root-owned startup script is the easiest place to do it.  If you
-do this, you may also wish to build <productname>PostgreSQL</>
-with <literal>-DLINUX_OOM_SCORE_ADJ=0</> added to <varname>CPPFLAGS</>.
+do this, you may also wish to export these environment variables
+<programlisting>
+PG_OOM_ADJUST_FILE=oom_score_adj
+PG_OOM_ADJUST_VALUE=0
+
+export PG_OOM_ADJUST_FILE
+export PG_OOM_ADJUST_VALUE
+</programlisting>
+in the startup script, before invoking the postmaster.
 That will cause postmaster child processes to run with the normal
 <varname>oom_score_adj</> value of zero, so that the OOM killer can still
 target them at need.
@@ -1296,8 +1303,7 @@
 but may have a previous version of the same functionality called
 <filename>/proc/self/oom_adj</>.  This works the same except the disable
 value is <literal>-17</> not <literal>-1000</>.
 The corresponding
-build flag for <productname>PostgreSQL</> is
-<literal>-DLINUX_OOM_ADJ=0</>.
+value for <envar>PG_OOM_ADJUST_FILE</> should be <varname>oom_adj</>.
 </para>
 <note>
diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
index f6df2de..21559af 100644
--- a/src/backend/postmaster/fork_process.c
+++ b/src/backend/postmaster/fork_process.c
@@ -22,6 +22,13 @@
 #endif
 
 #ifndef WIN32
+
+#ifdef __linux__
+static bool oom_env_checked = false;
+static char oom_adj_file[MAXPGPATH];
+static int	oom_adj_value = 0;
+#endif   /* __linux__ */
+
 /*
  * Wrapper for fork(). Return values are the same as those for fork():
  * -1 if the fork failed, 0 in the child process, and the PID of the
@@ -78,13 +85,38 @@ fork_process(void)
 	 * LINUX_OOM_SCORE_ADJ #defined to 0, or to some other value that you
 	 * want child processes to adopt here.
 	 */
-#ifdef LINUX_OOM_SCORE_ADJ
+#ifdef __linux__
+	if (!oom_env_checked)
+	{
+		char	   *env;
+
+		oom_env_checked = true;
+
+		env = getenv("PG_OOM_ADJUST_FILE");
+
+		/* Don't allow a path separator in file name */
+		if (env && !strchr(env, '/') && !strchr(env, '\\'))
+		{
+			snprintf(oom_adj_file, MAXPGPATH, "/proc/self/%s", env);
+
+			env = getenv("PG_OOM_ADJUST_VALUE");
+			if (env)
+			{
+				oom_adj_value = atoi(env);
+			}
+			else
+				oom_adj_value = 0;
+		}
+		else
+			oom_adj_file[0] = '\0';
+	}
+
 	{
 		/*
 		 * Use open() not stdio, to ensure we control the open flags. Some
 		 * Linux security environments reject anything but O_WRONLY.
 		 */
-		int			fd = open("/proc/self/oom_score_adj", O_WRONLY, 0);
+		int			fd = open(oom_adj_file, O_WRONLY, 0);
 
 		/* We ignore all errors */
 		if (fd >= 0)
@@ -92,41 +124,14 @@ fork_process(void)
 			char		buf[16];
 			int			rc;
 
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_SCORE_ADJ);
+			snprintf(buf, sizeof(buf), "%d\n", oom_adj_value);
 			rc = write(fd, buf, strlen(buf));
 			(void) rc;
 			close(fd);
 		}
 	}
-#endif   /* LINUX_OOM_SCORE_ADJ */
 
-	/*
-	 * Older Linux kernels have oom_adj not oom_score_adj.  This works
-	 * similarly except with a different scale of adjustment values.  If
-	 * it's necessary to build Postgres to work with either API, you can
-	 * define both LINUX_OOM_SCORE_ADJ and LINUX_OOM_ADJ.
-	 */
-#ifdef LINUX_OOM_ADJ
-	{
-		/*
-		 * Use open() not stdio, to ensure we control the open flags. Some
-		 * Linux security environments reject anything but O_WRONLY.
-		 */
-		int			fd = open("/proc/self/oom_adj", O_WRONLY, 0);
-
-		/* We ignore all errors */
-		if (fd >= 0)
-		{
-			char		buf[16];
-			int			rc;
-
-			snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
-			rc = write(fd, buf, strlen(buf));
-			(void) rc;
-			close(fd);
-		}
-	}
-#endif   /* LINUX_OOM_ADJ */
+#endif   /* __linux__ */
 
 	/*
 	 * Make sure processes do not share OpenSSL randomness state.
[HACKERS] Crash on backend exit w/ OpenLDAP [2.4.24, 2.4.31]
With LDAP support enabled, we link the backend with libldap, and we link libpq with libldap_r. Modules like dblink and postgres_fdw link to libpq, so loading them results in a backend having both libldap and libldap_r loaded. Those libraries export the same symbols, and the load order rule gives priority to the libldap symbols. So far, so good. However, both libraries declare a GCC destructor, ldap_int_destroy_global_options(), to free memory reachable from a global variable, ldap_int_global_options. In OpenLDAP versions 2.4.24 through 2.4.31, that variable's structure type has an incompatible layout in libldap vs. libldap_r. When the libldap_r build of the destructor uses pointers from the ldap_int_global_options of libldap, SIGSEGV ensues. This does not arise if nothing in the process ever caused OpenLDAP to initialize itself, because the relevant bytes then happen to be all-zero in either structure layout. OpenLDAP 2.4.32 fixed this by making the libldap structure a strict prefix of the libldap_r structure. The OpenLDAP change log merely says "Fixed libldap sasl handling (ITS#7118, ITS#7133)"; here are the relevant commits: http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=270ef33acf18dc13bfd07f8a8e66b446f80e7d27 http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=commitdiff;h=7ff18967d7d2e49baa9d80f1b9408b276f3982e0 You can cause the at-exit crash by building PostgreSQL against OpenLDAP 2.4.31, connecting with LDAP authentication, and issuing LOAD 'dblink'. Alternately, connect with any authentication method and create a dblink connection that uses an LDAP-based pg_service.conf entry. I'm attaching a test suite patch to illustrate that second vector. The popularity of the affected OpenLDAP versions is not clear to me. RHEL 5, 6 and 7 all ship OpenLDAP either old enough or new enough to miss the problem. Debian wheezy ships 2.4.31, an affected version. I have not looked beyond that.
I pondered what we could do short of treating this as a pure OpenLDAP bug for packagers to fix in their OpenLDAP builds, but I did not come up with anything singularly attractive. Some possibilities:

1. Link the backend with libldap_r, so we never face the mismatch. On some platforms, this means also linking in threading libraries.

2. Link a second copy of libpq against plain libldap and use that in server modules like dblink.

3. Wipe the memory of ldap_int_global_options so the destructor will have nothing to do. A challenge here is that we don't know the structure size; it's not part of any installed header, and it varies in response to OpenLDAP configuration options. Still, this is achievable in principle. This would be easy if the destructor function weren't static, alas.

4. Detect older OpenLDAP versions at runtime, just before we would otherwise initialize OpenLDAP, and raise an error. Possibly make the same check at compile time, for packager convenience.

5. Use only _exit(), not exit().

Hopefully I'm missing a great alternative, because each of those has something substantial to dislike about it. Thoughts welcome, especially from packagers dealing with platforms that use affected OpenLDAP versions. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index f237c43..fd61985 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -103,6 +103,18 @@
 SELECT *
 FROM dblink('SELECT * FROM foo') AS t(a int, b text, c text[])
 WHERE t.a > 7;
 ERROR:  connection not available
+-- The first-level connection's backend will crash on exit given OpenLDAP
+-- [2.4.24, 2.4.31].  Give the postmaster time to act on the crash.
+SELECT pg_sleep(1)
+FROM dblink('dbname=contrib_regression', $$SELECT * FROM
+	dblink('service=test_ldap dbname=contrib_regression',
+	       'select 1') t(c int)$$)
+AS t(c int);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 -- create a persistent connection
 SELECT dblink_connect('dbname=contrib_regression');
  dblink_connect 
diff --git a/contrib/dblink/sql/dblink.sql b/contrib/dblink/sql/dblink.sql
index 2a10760..fa98285 100644
--- a/contrib/dblink/sql/dblink.sql
+++ b/contrib/dblink/sql/dblink.sql
@@ -65,6 +65,14 @@
 SELECT *
 FROM dblink('SELECT * FROM foo') AS t(a int, b text, c text[])
 WHERE t.a > 7;
+
+-- The first-level connection's backend will crash on exit given OpenLDAP
+-- [2.4.24, 2.4.31].  Give the postmaster time to act on the crash.
+SELECT pg_sleep(1)
+FROM dblink('dbname=contrib_regression', $$SELECT * FROM
+	dblink('service=test_ldap dbname=contrib_regression',
+	       'select 1') t(c int)$$)
+AS t(c int);
+
 -- create a persistent connection
 SELECT dblink_connect('dbname=contrib_regression');
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index 4b31c0a..19bfe75 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Peter Geoghegan p...@heroku.com writes: ... In any case it's pretty clear that a goal of the glibc implementation is to concentrate as much entropy as possible into the first part of the string, and that's the important point. This makes perfect sense, and is why I was so incredulous about the Mac behavior. I think this may be another facet of something we were already aware of, which is that the UTF8 locales on OS X pretty much suck. It's fairly clear that Apple has put no effort into achieving more than minimal standards compliance for those. Sorting doesn't work as expected in those locales, for example. Still, that's reality, and any proposal to rely on strxfrm is going to have to deal with it :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
On Thu, Jun 12, 2014 at 02:53:23PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Thu, Jun 12, 2014 at 01:53:19PM -0400, Tom Lane wrote: It'd be a good thing if the TAP tests for client programs included testing of pg_dump/pg_restore, but that's a bit beyond my competence with that tool ... anyone care to step up? The pg_upgrade test suite covers this well. Um, not really: what pg_upgrade exercises is pg_dump -s which entirely fails to cover the data-transfer code paths. It would not have found this problem. I see. TAP suite coverage for a data-included dump/restore would be worth its weight, agreed. BTW, after further testing I realized that it was quite accidental that I found it either. pg_restore only uses libpq's lo_create() function when restoring an old_blob_style archive, ie one generated by 8.4 or earlier. That's what I happened to try to do last night, but it's pure luck that I did. That could have easily remained undiscovered until release day. Not good. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)
Thanks for looking into this. On Thu, Jun 12, 2014 at 9:25 AM, Robert Haas robertmh...@gmail.com wrote: Still, it's fair to say that on this Linux system, the first 8 bytes capture a significant portion of the entropy of the first 8 bytes of the string, whereas on MacOS X you only get entropy from the first 2 bytes of the string. It would be interesting to see results from other platforms people might care about also. Right. It was a little bit incautious of me to say that we had the full benefit of 8 bytes of storage with en_US.UTF-8, since that is only true of lower case characters (I think that FreeBSD can play tricks with this. Sometimes, it will give you the benefit of 8 bytes of entropy for an 8 byte string, with only non-differentiating trailing bytes, so that the first 8 bytes of Aaaa are distinct from the first eight bytes of , while any trailing bytes are non-distinct for both). In any case it's pretty clear that a goal of the glibc implementation is to concentrate as much entropy as possible into the first part of the string, and that's the important point. This makes perfect sense, and is why I was so incredulous about the Mac behavior. After all, the Open Group's strcoll() documentation says: The strxfrm() and strcmp() functions should be used for sorting large lists. Sorting text is hardly an infrequent requirement -- it's almost the entire reason for having strxfrm() in the standard. You're always going to want to have each strcmp() find differences as early as possible. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
On 6/11/14, 10:26 AM, Robert Haas wrote: Now, as soon as we introduce the concept that selecting from a table might not really mean read from the table but read from the table after applying this owner-specified qual, we're opening up a whole new set of attack surfaces. Every pg_dump is an opportunity to hack somebody else's account, or at least audit their activity. I'm in full agreement we should clearly communicate the issues around pg_dump in particular, because they can't necessarily be eliminated altogether without some major work that's going to take a while to finish. And if the work-around is some sort of GUC for killing RLS altogether, that's ugly but not unacceptable to me as a short-term fix. One of the difficult design requests in my inbox right now asks how pg_dump might be changed both to reduce its overlap with superuser permissions and to allow auditing of its activity. Those requests aren't going away; their incoming frequency is actually rising quite fast right now. They're both things people expect from serious SQL oriented commercial database products, and I'd like to see PostgreSQL continue to displace those as we reach feature parity in those areas. Any way you implement finer grained user permissions and auditing features will be considered a new attack vector when you use those features. The way the proposed RLS feature inserts an arbitrary function for reads has a similar new attack vector when you use that feature. I'm kind of surprised to see this turn into a hot button all of the sudden though, because my thought on all that so far has been a giant so what? This is what PostgreSQL does. You wanna write your own C code and then link the thing right into the server, so that bugs can expose data and crash the whole server? Not only can you shoot yourself in the foot that way, we supply a sample gun and bullets in contrib. 
How about writing arbitrary code in any one of a dozen server-side languages of wildly varying quality, then hooking that code so it runs as a trigger function whenever you change a row? PostgreSQL is *on it*; we love letting people write some random thing, and then running that random thing against your data as a side-effect of doing an operation. And if you like that...just wait until you learn about this half-assed rules feature we have too! And when the database breaks because the functions people inserted were garbage, that's their fault, not a cause for a CVE. And when someone blindly installs adminpack because it sounded like a pgAdmin requirement, lets a monitoring system run as root so it can watch pg_stat_activity, and then discovers that pair of reasonable decisions suddenly means any fool with monitoring access can call pg_file_unlink...that's their fault too. These are powerful tools with serious implications, and they're expected to be used by equally serious users. We as a development community do need to put a major amount of work into refactoring all of these security mechanisms. There should be less of these embarrassing incidents where bad software design really forced the insecure thing to happen, which I'd argue is the case for that pg_stat_activity example. And luckily so far development resources are appearing for organizations I know of working in that direction recently, as fast as the requirements are rising. I think there's a good outcome at the end of that road. But let's not act like RLS is a scary bogeyman because it introduces a new way to hack the server or get surprising side-effects. That's expected and possibly unavoidable behavior in a feature like this, and there are much worse instances of arbitrary function risk throughout the core code already. 
-- Greg Smith greg.sm...@crunchydatasolutions.com Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] How to change the pgsql source code and build it??
Hello, I need to initialize the db as root and start the database server. In order to accomplish this, I modified the initdb.c source file of the pgsql package and tried to compile it. Even though the build was successful, I couldn't see the root user able to execute the initdb executable generated by the build. I wanted to know if there is any other procedure for building the postgresql package? Thanks, Shreesha. P.S. Below are the changes done in initdb.c (the disabled root check is commented out):

---

static char *
get_id(void)
{
#ifndef WIN32
	struct passwd *pw;

//	if (geteuid() == 0)			/* 0 is root's uid */
	/*
	{
		fprintf(stderr,
				_("%s: cannot be run as root\n"
				  "Please log in (using, e.g., \"su\") as the "
				  "(unprivileged) user that will\n"
				  "own the server process.\n"),
				progname);
		exit(1);
	}
	*/
	...
}
Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?
Greg, all, I will reply to the emails in detail when I get a chance but am out of town at a funeral, so it'll likely be delayed. I did want to echo my agreement for the most part with Greg and in particular... On Thursday, June 12, 2014, Gregory Smith gregsmithpg...@gmail.com wrote: On 6/11/14, 10:26 AM, Robert Haas wrote: Now, as soon as we introduce the concept that selecting from a table might not really mean read from the table but read from the table after applying this owner-specified qual, we're opening up a whole new set of attack surfaces. Every pg_dump is an opportunity to hack somebody else's account, or at least audit their activity. I'm in full agreement we should clearly communicate the issues around pg_dump in particular, because they can't necessarily be eliminated altogether without some major work that's going to take a while to finish. And if the work-around is some sort of GUC for killing RLS altogether, that's ugly but not unacceptable to me as a short-term fix. A GUC which is enable / disable / error-instead may work quite well, with error-instead as the pg_dump default if people really want it (there would have to be a way to disable that though, imv). Note that enable is the default in general, disable would be for superuser only (or on start-up) to disable everything, and error-instead anyone could use, but it would error instead of implementing RLS when querying an RLS-enabled table. This approach was suggested by an existing user testing out this RLS approach, to be fair, but it looks pretty sane to me as a way to address some of these concerns. Certainly open to other ideas and thoughts though. Thanks, Stephen
Re: [HACKERS] How to change the pgsql source code and build it??
At 2014-06-12 16:08:05 -0700, shreesha1...@gmail.com wrote: I need to initialize the db as the root and start the database server. Why? -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] How to change the pgsql source code and build it??
I need to port pgsql onto a controller which doesn't have a framework of creating multiple users for administrative purposes. The entire controller is managed by a single root user and that is the reason I am trying to change the pgsql initdb behavior. Do you think of any other better alternative? On Thu, Jun 12, 2014 at 5:40 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: At 2014-06-12 16:08:05 -0700, shreesha1...@gmail.com wrote: I need to initialize the db as the root and start the database server. Why? -- Abhijit -- ~Shreesha.
Re: [HACKERS] How to change the pgsql source code and build it??
Hi, I need to port pgsql onto a controller which doesn't have a framework of creating multiple users for administrative purposes. The entire controller is managed by a single root user and that is the reason I am trying to change the pgsql initdb behavior. Do you think of any other better alternative? The reason you didn't see initdb completed is that it execs postgres on the way. As you know, it is strongly discouraged on an ordinary environment, but that framework sounds to be a single-user environment like what MS-DOS was, where any security risk coming from that characteristic is acceptable. I could see initdb and postgres operating as root for the moment (which means any possible side-effect is not checked) by making changes at four points in the whole postgresql source tree. Perhaps only two of them are needed for your wish.

postgresql $ find . -type f -print | xargs grep -nH 'geteuid() == 0'
./src/backend/main/main.c:377:	if (geteuid() == 0)
./src/bin/pg_ctl/pg_ctl.c:2121:	if (geteuid() == 0)
./src/bin/initdb/initdb.c:778:	if (geteuid() == 0)		/* 0 is root's uid */
./src/bin/pg_resetxlog/pg_resetxlog.c:250:	if (geteuid() == 0)

Try replacing these conditions with (0 && geteuid() == 0) and you would see it run as root. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
On Thu, Jun 12, 2014 at 02:44:10AM -0400, Tom Lane wrote:
> Andres Freund <and...@2ndquadrant.com> writes:
>> On 2014-06-12 00:38:36 -0400, Noah Misch wrote:
>>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2014-06-12%2000%3A17%3A07
>> Hm. My guess is that it's just a 'harmless' concurrency issue. The test
>> currently runs concurrently with others. I think what happens is that the
>> table gets dropped in the other session after the query has acquired the
>> MVCC snapshot (used for the pg_class scan). But why is it triggering on
>> such an 'unusual' system and not on others? That's what worries me a bit.

I can reproduce a similar disturbance in the test query using gdb and a
concurrent table drop, and the table reported in the prairiedog failure is
a table dropped in a concurrent test group. That explanation may not be
the full story behind these particular failures, but it certainly could
cause similar failures in the future. Let's prevent this by only reporting
rows for relations that still exist after the query is complete.

> prairiedog is pretty damn slow by modern standards. OTOH, I think it is
> not the slowest machine in the buildfarm; hamster for instance seems to
> be at least a factor of 2 slower. So I'm not sure whether to believe
> it's just a timing issue.

That kernel's process scheduler could be a factor.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index a182176..a274d82 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -2375,14 +2375,18 @@ Check constraints:
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly. Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND
     mapped_oid IS DISTINCT FROM oid;
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
  oid | mapped_oid | reltablespace | relfilenode | relname 
 -----+------------+---------------+-------------+---------
 (0 rows)
 
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 3f641f9..19e1229 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -1582,15 +1582,20 @@ ALTER TABLE IF EXISTS tt8 SET SCHEMA alter2;
 
 DROP TABLE alter2.tt8;
 DROP SCHEMA alter2;
 
--- Check that we map relation oids to filenodes and back correctly.
--- Only display bad mappings so the test output doesn't change all the
--- time.
+-- Check that we map relation oids to filenodes and back correctly. Only
+-- display bad mappings so the test output doesn't change all the time.  A
+-- filenode function call can return NULL for a relation dropped concurrently
+-- with the call's surrounding query, so check mappings only for relations
+-- that still exist after all calls finish.
+CREATE TEMP TABLE filenode_mapping AS
 SELECT oid, mapped_oid, reltablespace, relfilenode, relname
 FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
 WHERE relkind IN ('r', 'i', 'S', 't', 'm') AND
     mapped_oid IS DISTINCT FROM oid;
+SELECT m.* FROM filenode_mapping m JOIN pg_class c ON c.oid = m.oid;
+
 -- Checks on creating and manipulation of user defined relations in
 -- pg_catalog.
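The race described above can be pinned down without gdb by fixing the
scanning session's MVCC snapshot while another session drops a table. A
minimal sketch (the table name race_demo and the exact NULL behavior of the
filenode functions under a pinned snapshot are assumptions, not taken from
the thread):

```sql
-- Session 1: pin an MVCC snapshot that still sees the victim table.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM pg_class;   -- first query fixes the snapshot

-- Session 2: drop the table while session 1's snapshot is pinned.
DROP TABLE race_demo;

-- Session 1: the pg_class scan still returns the row under the old
-- snapshot, but the filenode lookup consults current catalog state and
-- can yield NULL, which IS DISTINCT FROM oid then reports as a bad mapping.
SELECT oid, pg_relation_filenode(oid) AS filenode
FROM pg_class
WHERE relname = 'race_demo';
COMMIT;
```

In the regression suite the same window opens without any explicit
transaction, because the dropping session runs in a concurrent test group.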
Re: [HACKERS] Something flaky in the relfilenode mapping infrastructure
Noah Misch <n...@leadboat.com> writes:
> I can reproduce a similar disturbance in the test query using gdb and a
> concurrent table drop, and the table reported in the prairiedog failure
> is a table dropped in a concurrent test group. That explanation may not
> be the full story behind these particular failures, but it certainly
> could cause similar failures in the future.

Yeah, that seems like a plausible explanation, since the table shown in the
failure report is one that would be getting dropped concurrently, and the
discrepancy is that we get NULL rather than the expected value for the
pg_filenode_relation result, which is expected if the table is already
dropped when the mapping function is called.

> Let's prevent this by only reporting rows for relations that still exist
> after the query is complete.

I think this is a bad solution though; it risks masking actual problems.
What seems like a better fix to me is to change the test

	mapped_oid IS DISTINCT FROM oid

to

	mapped_oid <> oid

pg_class.oid will certainly never read as NULL, so what this will do is
allow the single case where the function returns NULL. AFAIK there is no
reason to suppose that a NULL result would mean anything except that the
table's been dropped, so changing it this way will allow only that case and
not any others.
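With that change, the test query from the patch upthread would read as
follows (a sketch; only the final predicate differs from the existing
regression test):

```sql
SELECT oid, mapped_oid, reltablespace, relfilenode, relname
FROM pg_class,
     pg_filenode_relation(reltablespace, pg_relation_filenode(oid)) AS mapped_oid
WHERE relkind IN ('r', 'i', 'S', 't', 'm')
  AND mapped_oid <> oid;   -- a NULL mapping no longer counts as a failure
```

Since `x <> y` evaluates to NULL (not true) when either side is NULL, a
concurrently dropped table silently falls out of the result instead of
showing up as a bogus bad mapping.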
Alternatively, we could do something like you suggest, but adjust the
second join so that it suppresses only rows in which mapped_oid is null
*and* there's no longer a matching OID in pg_class. That would provide
additional confidence that the null result is a valid indicator of a
just-dropped table.

			regards, tom lane
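The alternative sketched above would keep the temp-table approach from the
patch but filter its second query, suppressing a row only when the NULL
mapping coincides with the relation having vanished. A sketch, reusing the
filenode_mapping temp table from the patch upthread:

```sql
SELECT m.*
FROM filenode_mapping m
LEFT JOIN pg_class c ON c.oid = m.oid
WHERE m.mapped_oid IS NOT NULL   -- a genuine bad mapping is still reported
   OR c.oid IS NOT NULL;         -- NULL mapping but relation still exists:
                                 -- suspicious, so report it too
```

Only the combination "mapping came back NULL and the relation is gone from
pg_class" is treated as the benign concurrent-drop case.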
Re: [HACKERS] lo_create(oid, bytea) breaks every extant release of libpq
I wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> Presumably we should also fix libpq to not be so dumb. I mean, it
>> doesn't help with the immediate problem, since as you say there could be
>> non-upgraded copies of libpq out there for the indefinite future, but it
>> still seems like something we oughta fix.
> It's been in the back of my mind for awhile that doing a dynamic query at
> all here is pretty pointless. It's not like the OIDs of those functions
> ever have or ever will move. It would probably be more robust if we just
> let libpq be a consumer of fmgroids.h and refer directly to the constants
> F_LO_CREATE etc.

I thought a bit more about this. Ignore for the moment the larger question
of whether we want to consider fmgroids.h as something we'd export to
clients outside the immediate core infrastructure; that will definitely
take more thought than we can expend if we want to slip this into 9.4. It
still seems reasonable for libpq to use it.

The actual code changes in fe-lobj.c are trivial enough (and would consist
mostly of code removal). We would need to deal with the fact that some of
the support functions are not present in older backends, but I think
testing PQserverVersion is adequate for that purpose.

The hard part seems to be making sure that fmgroids.h is available to
reference, since it's a generated file and not guaranteed to be there a
priori. In a gmake-driven build I have the skillz to deal with that, but I
am not sure what to do in the various Windows build systems, especially
for the client-code-only build methods. The path of least resistance would
be to just assume fmgroids.h is available, which would work fine when
building from a tarball, or probably when doing a full build including the
backend (MSVC builds aren't parallel, are they?). But what about a
client-only build?

			regards, tom lane
[HACKERS] unable to attach client process to postgresql server using gdb
Hi,

I am working with the PostgreSQL 9.3.4 source using the Eclipse IDE on
Ubuntu 14.04, and I am facing a problem attaching a client process to the
PostgreSQL server under gdb in order to debug it.

When I start the postmaster and then connect to it from a client in a
terminal, it works fine and queries get responses. But when I run the
debug configuration from Eclipse and select the postgres process id from
the list, I get an error saying:

*Can't find a source file at
/build/buildd/eglibc-2.19/socket/../sysdeps/unix/sysv/linux/x86_64/recv.c*
*Locate the file or edit the source lookup path to include its location.*

After this, when I send any query from the client, it just gets stuck and
no response comes. After attaching gdb to the client process, the client
does not get any response from the postgres server.

One thing to note is that I was able to debug properly until yesterday,
but now it is not working. I tried reinstalling, but that did not help.
How can I fix this issue? Kindly help.

Rajmohan