Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Heikki Linnakangas

On 05/04/2014 03:46 PM, Bruce Momjian wrote:

I have completed the initial version of the 9.4 release notes.


Thanks!


You can view them here:

http://www.postgresql.org/docs/devel/static/release-9-4.html

I will be adding additional markup in the next few days.

Feedback expected and welcomed.  I expect to be modifying this until we
release 9.4 final.  I have marked items where I need help with question
marks.


Two rather large changes to the B-tree algorithms are missing:

* Fix race condition in B-tree page deletion (commit efada2b8)

* Make the handling of interrupted B-tree page splits more robust 
(commit 40dae7ec)


These changes are not visible to users, but they are good to know for 
beta testers.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Heikki Linnakangas

On 05/04/2014 03:46 PM, Bruce Momjian wrote:

Auto-resize the catalog cache (Heikki Linnakangas)

This reduces memory consumption for backends accessing only a few
tables, and improves performance for backends accessing many tables.


Move this to the "General performance" section.


Improve spinlock speed on x86_64 CPUs (test on i386?) (Heikki Linnakangas)


I believe this refers to commit b03d196b. The "test on i386" note can 
just be removed. Someone ought to test it on 32-bit i386 to see whether the 
same change would be beneficial there, but that's a TODO item more than 
a release-notes item. I doubt anyone is interested in spending time 
performance-testing spinlocks on 32-bit i386, though, so I think we're 
going to just retain the current situation for the next decade.


- Heikki




Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange

2014-05-05 Thread Mark Kirkwood

On 05/05/14 15:22, Amit Kapila wrote:


Here what I could understand is that sum of cost_limit for all
autovacuum workers should never exceed the value of
autovacuum_vacuum_cost_limit which seems to be always the
case in current code but same is not true for proposed patch.



Right, but have a look at the 1st message in this thread - the current 
behavior (and to a large extent the above condition) means that setting 
cost limits per table does not work in any remotely intuitive way.

ISTM that the whole purpose of a per-table setting in this context is to 
override the behavior of autovacuum throttling - and currently this 
does not happen unless you get real brutal (i.e. setting the cost delay 
to zero in addition to setting the cost limit... making the whole cost 
limit a bit pointless).


regards

Mark




Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-05 Thread Heikki Linnakangas

On 05/04/2014 11:13 PM, Magnus Hagander wrote:

On Sun, May 4, 2014 at 9:35 PM, Josh Berkus j...@agliodbs.com wrote:


Architecture-wise there's no coverage for:
* some ARM architecture variants


I could run a buildfarm animal on a Raspberry Pi if the Postgres
community will replace my flash cards as they burn out.


Heikki already does that - it's chipmunk.


Michael Paquier's hamster is also a Raspberry Pi.

- Heikki




Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-05 Thread Michael Paquier
On Mon, May 5, 2014 at 3:40 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 05/04/2014 11:13 PM, Magnus Hagander wrote:

 On Sun, May 4, 2014 at 9:35 PM, Josh Berkus j...@agliodbs.com wrote:

 Architecture-wise there's no coverage for:
 * some ARM architecture variants


 I could run a buildfarm animal on a Raspberry Pi if the Postgres
 community will replace my flash cards as they burn out.


 Heikki already does that - it's chipmunk.


 Michael Paquier's hamster is also a Raspberry Pi.
Yep, using Archlinux for the PI, Heikki's stuff plays with Raspbian.
-- 
Michael




Re: [HACKERS] using array of char pointers gives wrong results

2014-05-05 Thread Ashutosh Bapat
Let me bring the bug fix again to the surface. Is anybody looking at this
fix?


On Tue, Apr 29, 2014 at 2:25 PM, Ashutosh Bapat 
ashutosh.ba...@enterprisedb.com wrote:

 Hi,
 When an array of char * is used as the target for a FETCH statement returning
 more than one row, ecpg tries to store all the results in the first element.
 PFA test_char_select.pgc, which fetches the first 3 relnames from pg_class
 ordered by relname. The program prints the following result.

 steps to compile and build the program
 ecpg -c -I<ecpg include dir> test_char_select.pgc
 cc -I<pg installation include dir> -g -c -o test_char_select.o test_char_select.c
 cc -g test_char_select.o -L<pg installation lib dir> -lecpg -lpq -lpgtypes -o test_char_select

 output
 ./test_char_select
 relname=___pg_foreign_table_columns
 relname=
 relname=

 The first three relnames should have been
 postgres=# select relname from pg_class order by relname limit 3;
   relname
 ---
  _pg_foreign_data_wrappers
  _pg_foreign_servers
  _pg_foreign_table_columns

 It's obvious that the first element of the array is being overwritten with
 an offset of 1.

 This happens because the array of char pointers is dumped as
  /* Fetch multiple columns into one structure. */
 { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "fetch 3 from cur1",
 ECPGt_EOIT,
 ECPGt_char,&(strings),(long)0,(long)3,(1)*sizeof(char),
 ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EORT);

 Since the offset is 1, the next result overwrites the previous result
 except for the first byte.

 PFA patch ecpg_char_ptr_arr.patch to fix this issue. It has the following
 changes:
 1. Dump the array of char pointers with the right offset, i.e. sizeof(char *)
 2. While reading the array of char pointers in ecpg_do_prologue(), use the
 address instead of the value at that address
 3. Treat such a variable as char ** instead of char * in the pointer
 arithmetic

 ECPG regression tests do not show any failures with this patch.
 --
 Best Wishes,
 Ashutosh Bapat
 EnterpriseDB Corporation
 The Postgres Database Company




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-05 Thread Andres Freund
Hi,

On 2014-05-04 12:35:44 -0700, Josh Berkus wrote:
  There's pretty little coverage of non-mainstream platforms/compilers in
  the buildfarm atm. Maybe we should send an email on -announce asking for
  new ones?
  OS-wise, there's no coverage for:
  * AIX (at all)
  * HP-UX (for master at least)
  (* Tru64)
  (* UnixWare)
 
 Do we want a SmartOS (opensolaris/Joyent) animal?  Do we already have one?

I don't think so. We only have solaris 10 afaics.

  * mips
  * s390/x
  * sparc 32bit
 
 Do we really care about sparc 32bit at this point?  You're talking a
 10-year-old machine, there.

I personally don't care much, but the last time it came up, significant
parts of the community opined the other way. And I'd rather have it
tested and actually supported than supposedly supported.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro
Hi

When running more than one cluster I often find myself looking at
the output of 'iotop' or other tools wondering which
cluster's wal receiver process or checkpointer process etc
I'm seeing.  Obviously it's easy enough to find out (for example
by looking at a tree view in htop/ps that shows the -D parameter
of the parent postmaster), but I thought it could be useful to be
able to give a name to the cluster and include it in the ps
output.  Here is a strawman patch that does that.

If cluster_name is not set, it defaults to the empty string and
the ps output is unchanged.  If it's set to 'foox' the ps output
includes that string in square brackets:

  postgres: [foox] checkpointer process
  postgres: [foox] writer process
  postgres: [foox] wal writer process
  postgres: [foox] autovacuum launcher process
  postgres: [foox] stats collector process
  postgres: [foox] munro foodb [local] idle

I would be grateful for any feedback.  Thanks,

Thomas
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..7f7fd52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+char	   *cluster_name;
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			NULL,
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 6294ca3..ead7ea4 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -29,6 +29,7 @@
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "utils/ps_status.h"
+#include "utils/guc.h"
 
 extern char **environ;
 bool		update_process_title = true;
@@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
 	 * apparently setproctitle() already adds a `progname:' prefix to the ps
 	 * line
 	 */
-	snprintf(ps_buffer, ps_buffer_size,
-			 "%s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX ""
 #else
-	snprintf(ps_buffer, ps_buffer_size,
-			 "postgres: %s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX "postgres: "
 #endif
 
+	if (*cluster_name == '\0')
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "%s %s %s ",
+				 username, dbname, host_info);
+	}
+	else
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
+				 cluster_name, username, dbname, host_info);
+	}
+
 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);
 
 	set_ps_display(initial_str, true);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index be68f35..639288a 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -223,6 +223,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
+extern char *cluster_name;
 extern char *data_directory;
 extern char *ConfigFileName;
 extern char *HbaFileName;



[HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Andres Freund
Hi,

While investigating an issue pointed out by valgrind around undefined
bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
bug in ReceiveSharedInvalidMessages(). It tries to be safe against
recursion but it's not:
When it recurses into ReceiveSharedInvalidMessages() from its main loop
from inside the inval callback while nextmsg <= nummsgs, it'll overwrite
the 'messages' array with new contents. But at this point the old
content of one entry in the messages array is still passed to
the LocalExecuteInvalidationMessage() call that caused the recursion.

It looks to me like SHAREDINVALRELCACHE_ID messages are the only ones
actually affected by this in reality because all the other ones either
operate on stack memory or don't recurse. What could happen there is
that a different relcache entry is invalidated than the relcache
invalidation callback is called for.
This actually could explain the relfilenodemap error we've seen a couple
weeks back.

It looks to me like this is broken since at least fad153ec. I think the 
fix is just to make the current 'SharedInvalidationMessage *msg' not a 
pointer but a local copy of the to-be-processed entry.

Comments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
Hi,

On 2014-05-05 10:00:34 +, Thomas Munro wrote:
 When running more than one cluster I often find myself looking at
 the output of 'iotop' or other tools wondering which
 cluster's wal receiver process or checkpointer process etc
 I'm seeing.

I wonder about that pretty regularly. To the point that I've a hacky
version of this locally. So +1 for me for the idea in general.

 If cluster_name is not set, it defaults to the empty string and
 the ps output is unchanged.  If it's set to 'foox' the ps output
 includes that string in square brackets:
 
   postgres: [foox] checkpointer process
   postgres: [foox] writer process
   postgres: [foox] wal writer process
   postgres: [foox] autovacuum launcher process
   postgres: [foox] stats collector process
   postgres: [foox] munro foodb [local] idle

"postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)

I guess the question is where this should be available as well. At the
very least I'd want to reference it in log_line_prefix as well?

 diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
 index 15020c4..7f7fd52 100644
 --- a/src/backend/utils/misc/guc.c
 +++ b/src/backend/utils/misc/guc.c
 @@ -449,6 +449,7 @@ int   temp_file_limit = -1;
  
  int  num_temp_buffers = 1024;
  
 +char*cluster_name;
  char*data_directory;
  char*ConfigFileName;
  char*HbaFileName;
 @@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
   },
  
   {
  + {"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
  + gettext_noop("Sets the name of the cluster that appears
  in 'ps' output."),
  + NULL,
  + GUC_IS_NAME
  + },
  + &cluster_name,
  + "",
  + NULL, NULL, NULL
  + },
 +
 + {
   {"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
   gettext_noop("Sets the server's data directory."),
   NULL,
 diff --git a/src/backend/utils/misc/ps_status.c 
 b/src/backend/utils/misc/ps_status.c
 index 6294ca3..ead7ea4 100644
 --- a/src/backend/utils/misc/ps_status.c
 +++ b/src/backend/utils/misc/ps_status.c
 @@ -29,6 +29,7 @@
  #include "libpq/libpq.h"
  #include "miscadmin.h"
  #include "utils/ps_status.h"
 +#include "utils/guc.h"
  
  extern char **environ;
  bool update_process_title = true;
 @@ -264,15 +265,24 @@ init_ps_display(const char *username, const char 
 *dbname,
* apparently setproctitle() already adds a `progname:' prefix to the ps
* line
*/
 - snprintf(ps_buffer, ps_buffer_size,
 -  "%s %s %s ",
 -  username, dbname, host_info);
 +#define PROGRAM_NAME_PREFIX ""
  #else
 - snprintf(ps_buffer, ps_buffer_size,
 -  "postgres: %s %s %s ",
 -  username, dbname, host_info);
 +#define PROGRAM_NAME_PREFIX "postgres: "
  #endif
  
 + if (*cluster_name == '\0')
 + {
 + snprintf(ps_buffer, ps_buffer_size,
 +  PROGRAM_NAME_PREFIX "%s %s %s ",
 +  username, dbname, host_info);
 + }
 + else
 + {
 + snprintf(ps_buffer, ps_buffer_size,
 +  PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
 +  cluster_name, username, dbname, host_info);
 + }
 +
 +
   ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);

Aren't you potentially dereferencing a NULL pointer here?

Greetings,

Andres Freund




Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)

2014-05-05 Thread Heikki Linnakangas

On 04/14/2014 09:48 PM, Heikki Linnakangas wrote:

On 04/14/2014 07:51 PM, Tom Lane wrote:

I'd prefer to leave the prepare sequence alone and instead find a way
to reject COMMIT PREPARED until after the source transaction is safely
clear of the race conditions.  The upthread idea of looking at vxid
instead of xid might help, except that I see we clear both of them
in ProcArrayClearTransaction.  We'd need some state in PGPROC that
isn't cleared till later than that.


Hmm. What if one of the post-cleanup actions fails? We can't bail out of
the prepare sequence until we have transferred the locks to the new
PGPROC. Otherwise the locks are lost. In essence, there should be a
critical section from the EndPrepare call until all the critical cleanup
actions like PostPrepare_Locks have been done, and I don't think we want
that. We might be able to guarantee that the built-in post-cleanup
operations are safe enough for that, but there's also CallXactCallbacks
in there.

Given the lack of reports of that happening, though, perhaps that's not
an issue.


I came up with the attached fix for this. Currently, the entry is 
implicitly considered dead or unlocked if the locking_xid transaction is 
no longer active, but this patch essentially turns locking_xid into a 
simple boolean, and makes it the backend's responsibility to clear it on 
abort. (it's not actually a boolean, it's a BackendId, but that's just 
for debugging purposes to track who's keeping the entry locked). This 
requires a process exit hook, and an abort hook, to make sure the entry 
is always released, but that's not too difficult. It allows the backend 
to release the entry at exactly the right time, instead of having it 
implicitly released by ProcArrayClearTransaction.


If we error out during prepare, after having written the prepare WAL record 
but before the locks have been transferred to the dummy PGPROC, the locks 
are simply released. This is wrong, but it's always been like that and 
we haven't heard any complaints about it from the field, so I'm inclined 
to leave it as it is. We could use a critical section to force a panic, 
but that cure could be worse than the disease.


I considered Andres' idea of using a new heavy-weight lock, but didn't 
like it much. It would be a larger patch, which is not nice for 
back-patching. One issue would be that if you run out of lock memory, 
you could not roll back any prepared transactions, which is not nice 
because it could be a prepared transaction that's hoarding the lock memory.


- Heikki

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 66dbf58..995f51f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -58,6 +58,7 @@
 #include replication/walsender.h
 #include replication/syncrep.h
 #include storage/fd.h
+#include storage/ipc.h
 #include storage/predicate.h
 #include storage/proc.h
 #include storage/procarray.h
@@ -82,25 +83,25 @@ int			max_prepared_xacts = 0;
  *
  * The lifecycle of a global transaction is:
  *
- * 1. After checking that the requested GID is not in use, set up an
- * entry in the TwoPhaseState->prepXacts array with the correct XID and GID,
- * with locking_xid = my own XID and valid = false.
+ * 1. After checking that the requested GID is not in use, set up an entry in
+ * the TwoPhaseState->prepXacts array with the correct GID and valid = false,
+ * and mark it as locked by my backend.
  *
  * 2. After successfully completing prepare, set valid = true and enter the
  * referenced PGPROC into the global ProcArray.
  *
- * 3. To begin COMMIT PREPARED or ROLLBACK PREPARED, check that the entry
- * is valid and its locking_xid is no longer active, then store my current
- * XID into locking_xid.  This prevents concurrent attempts to commit or
- * rollback the same prepared xact.
+ * 3. To begin COMMIT PREPARED or ROLLBACK PREPARED, check that the entry is
+ * valid and not locked, then mark the entry as locked by storing my current
+ * backend ID into locking_backend.  This prevents concurrent attempts to
+ * commit or rollback the same prepared xact.
  *
  * 4. On completion of COMMIT PREPARED or ROLLBACK PREPARED, remove the entry
 * from the ProcArray and the TwoPhaseState->prepXacts array and return it to
  * the freelist.
  *
  * Note that if the preparing transaction fails between steps 1 and 2, the
- * entry will remain in prepXacts until recycled.  We can detect recyclable
- * entries by checking for valid = false and locking_xid no longer active.
+ * entry must be removed so that the GID and the GlobalTransaction struct
+ * can be reused.  See AtAbort_Twophase().
  *
  * typedef struct GlobalTransactionData *GlobalTransaction appears in
  * twophase.h
@@ -115,8 +116,8 @@ typedef struct GlobalTransactionData
 	TimestampTz prepared_at;	/* time of preparation */
 	XLogRecPtr	prepare_lsn;	/* XLOG offset of prepare record */
 	Oid			owner;			/* ID of user that executed the xact */
-	

Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro
On 5 May 2014 10:10, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 On 2014-05-05 10:00:34 +, Thomas Munro wrote:
  When running more than one cluster I often find myself looking at
  the output of 'iotop' or other tools wondering which
  cluster's wal receiver process or checkpointer process etc
  I'm seeing.

 I wonder about that pretty regularly. To the point that I've a hacky
 version of this locally. So +1 for me for the idea in general.


Thanks!  (Do you have a write up/diff somewhere showing the local
modifications you're running?)


  If cluster_name is not set, it defaults to the empty string and
  the ps output is unchanged.  If it's set to 'foox' the ps output
  includes that string in square brackets:
 
postgres: [foox] checkpointer process
postgres: [foox] writer process
postgres: [foox] wal writer process
postgres: [foox] autovacuum launcher process
postgres: [foox] stats collector process
postgres: [foox] munro foodb [local] idle

 "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)


Hah -- I agree, but on systems using setproctitle, the program name and ":"
are provided already, so the end result would have to be different on
those systems, and I figured it should be the same everywhere if possible.
(BTW I also tried to tidy up the way that is handled, so that instead of a
different snprintf statement being selected by the preprocessor, a macro
PROGRAM_NAME_PREFIX is defined to be empty on those systems.)


 I guess the question is where this should be available as well. At the
 very least I'd want to reference it in log_line_prefix as well?


Good idea, I will look into that.


  + if (*cluster_name == '\0')
  + {
  + snprintf(ps_buffer, ps_buffer_size,
  +  PROGRAM_NAME_PREFIX "%s %s %s ",
  +  username, dbname, host_info);
  + }
  + else
  + {
  + snprintf(ps_buffer, ps_buffer_size,
  +  PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
  +  cluster_name, username, dbname,
  host_info);
  + }
  +
ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);

 Aren't you potentially dereferencing a NULL pointer here?


Hmm -- I thought the GUC machinery would make sure cluster_name either
pointed to the default I provided, an empty string, or a string read from
the configuration file.  Perhaps I need to go and read up on how GUCs work.

Thomas Munro


Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Craig Ringer
On 05/05/2014 06:00 PM, Thomas Munro wrote:
 Hi
 
 When running more than one cluster I often find myself looking at
 the output of 'iotop' or other tools wondering which
 cluster's wal receiver process or checkpointer process etc
 I'm seeing.  Obviously it's easy enough to find out (for example
 by looking at a tree view in htop/ps that shows the -D parameter
 of the parent postmaster), but I thought it could be useful to be
 able to give a name to the cluster and include it in the ps
 output.  Here is a strawman patch that does that.
 
 If cluster_name is not set, it defaults to the empty string and
 the ps output is unchanged.  If it's set to 'foox' the ps output
 includes that string in square brackets:
 
   postgres: [foox] checkpointer process
   postgres: [foox] writer process
   postgres: [foox] wal writer process
   postgres: [foox] autovacuum launcher process
   postgres: [foox] stats collector process
   postgres: [foox] munro foodb [local] idle

I'd find something like that very useful.

I'd even be tempted to tack on the port number, though that might be
overkill.

Perhaps just use the port number instead of a new cluster name? Most
people won't use/set something like a cluster name, though I guess
distros that use pg_wrapper would probably auto-set it via pg_wrapper's
configuration.

Personally I'd find:

  postgres[5433]: checkpointer process

at least as useful. The only time that's not unique is in a BSD jail or
lxc container, and in those cases IIRC ps can show you the
jail/container anyway.

Showing the port would help new-ish users a lot; many seem to be very
confused by which PostgreSQL instance(s) they're connecting to and which
are running. Especially on Mac OS X, where people often land up with
Apple's PostgreSQL, Homebrew, Postgres.app, and who knows what else
running at the same time.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 10:49:19 +, Thomas Munro wrote:
 On 5 May 2014 10:10, Andres Freund and...@2ndquadrant.com wrote:
 
  Hi,
 
  On 2014-05-05 10:00:34 +, Thomas Munro wrote:
   When running more than one cluster I often find myself looking at
   the output of 'iotop' or other tools wondering which
   cluster's wal receiver process or checkpointer process etc
   I'm seeing.
 
  I wonder about that pretty regularly. To the point that I've a hacky
  version of this locally. So +1 for me for the idea in general.
 
 
 Thanks!  (Do you have a write up/diff somewhere showing the local
 modifications you're running?)

I've just hacked the port into the ps display since that was
sufficient for my case.

   If cluster_name is not set, it defaults to the empty string and
   the ps output is unchanged.  If it's set to 'foox' the ps output
   includes that string in square brackets:
  
 postgres: [foox] checkpointer process
 postgres: [foox] writer process
 postgres: [foox] wal writer process
 postgres: [foox] autovacuum launcher process
 postgres: [foox] stats collector process
 postgres: [foox] munro foodb [local] idle
 
  "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)
 
 
 Hah -- I agree, but on systems using setproctitle, the program name and ":"
  are provided already, so the end result would have to be different on
 those systems and I figured it should be the same everywhere if possible.

Fair point.

  Aren't you potentially dereferencing a NULL pointer here?
 
 
 Hmm -- I thought the GUC machinery would make sure cluster_name either
 pointed to the default I provided, an empty string, or a string read from
 the configuration file.  Perhaps I need to go and read up on how GUCs work.

That's true - but I am not sure you can guarantee it's only called after
the GUC machinery has started up, particularly on Windows. I guess just
initializing the global variable to "" should do the trick.

Greetings,

Andres Freund




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-05-05 10:00:34 +, Thomas Munro wrote:
  When running more than one cluster I often find myself looking at
  the output of 'iotop' or other tools wondering which
  cluster's wal receiver process or checkpointer process etc
  I'm seeing.
 
 I wonder about that pretty regularly. To the point that I've a hacky
 version of this locally. So +1 for me for the idea in general.

Ditto.

  If cluster_name is not set, it defaults to the empty string and
  the ps output is unchanged.  If it's set to 'foox' the ps output
  includes that string in square brackets:
  
postgres: [foox] checkpointer process
postgres: [foox] writer process
postgres: [foox] wal writer process
postgres: [foox] autovacuum launcher process
postgres: [foox] stats collector process
postgres: [foox] munro foodb [local] idle
 
 "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)
 
 I guess the question is where this should be available as well. At the
 very least I'd want to reference it in log_line_prefix as well?

I'm not entirely sure that I see the point of having it in
log_line_prefix - each cluster logs to its own log file, which includes
the cluster name (at least on Debian/Ubuntu and friends).  The only use
case I can imagine here would be for syslog, but you could just *set*
the cluster name in the log_line_prefix, as it'd be (by definition...)
configurable per cluster.

I'd much rather see other things added as log_line_prefix options.  An
interesting thought that just occurred to me would be to allow any GUC to
be added to log_line_prefix using some kind of extended % support (eg:
'%{my_guc_here}' or something...).  That would also be useful for
extensions which add GUCs.  Not sure about specifics, but it does seem
like an interesting idea.

Oh, and I know people will shoot me for bringing it up again, but I'd
still like to see the CSV format be configurable a la log_line_prefix,
and the same for any kind of logging (or auditing) to a table which we
might eventually support.  Yes, we need to work out how to handle file
changes when it's updated and stick a header on each new file with the
columns included.

Thanks,

Stephen




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
   postgres[5433]: checkpointer process
 
 at least as useful. The only time that's not unique is in a BSD jail or
 lxc container, and in those cases IIRC ps can show you the
 jail/container anyway.

Uhh, that's not at all true.  You can trivially have multiple IPs on a
box w/o jails or containers (aliased interfaces) and then run PG on the
default port - which I find to be *far* more convenient than having the
same IP and a bunch of different ports.

What you *can't* have is two clusters with the same name and same major
version, at least on the Debian/Ubuntu distributions, and as such, I
would argue to also include the major version rather than include the
port, which you could get from pg_lsclusters.

 Showing the port would help new-ish users a lot; many seem to be very
 confused by which PostgreSQL instance(s) they're connecting to and which
 are running. Especially on Mac OS X, where people often land up with
 Apple's PostgreSQL, Homebrew, Postgres.app, and who knows what else
 running at the same time.

I'm far from convinced that showing the port in the ps output will
really help these users..  Not to mention that you can get that from
'netstat -anp' anyway.

Thanks,

Stephen




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
  I guess the question is where this should be available as well. At the
  very least I'd want to reference it in log_line_prefix as well?
 
 I'm not entirely sure that I see the point of having it in
 log_line_prefix- each cluster logs to its own log file which includes
 the cluster name (at least on Debian/Ubuntu and friends).  The only use
 case I can imagine here would be for syslog, but you could just *set*
 the cluster name in the log_line_prefix, as it'd be (by definition..)
 configurable per cluster.

So I have to configure it in multiple locations? I don't see the point. I
usually try to configure as much in common/template config files that
are included. Everything that doesn't have to be overwritten is good.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
   I guess the question is where this should be available as well. At the
   very least I'd want to reference it in log_line_prefix as well?
  
  I'm not entirely sure that I see the point of having it in
  log_line_prefix- each cluster logs to its own log file which includes
  the cluster name (at least on Debian/Ubuntu and friends).  The only use
  case I can imagine here would be for syslog, but you could just *set*
  the cluster name in the log_line_prefix, as it'd be (by definition..)
  configurable per cluster.
 
 So I have to configure it in multiple locations? I don't see the point. I
 usually try to configure as much in common/template config files that
 are included. Everything that doesn't have to be overwritten is good.

I see the point- we've already got quite a few %whatever's and adding
mostly useless ones may just add confusion and unnecessary complexity.
If you've already got your postgresql.conf templated through
puppet/chef/salt/whatever then having to add another '%{ CLUSTER_NAME }%'
or whatever shouldn't be a terribly difficult thing.
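
A minimal sketch of the template-plus-override arrangement being described (file paths and prefix strings are invented for illustration, not taken from the thread):

```
# shared template, included by every cluster (hypothetical path)
#   /etc/postgresql/common.conf
log_line_prefix = '%m [%p] '

# per-cluster postgresql.conf overrides the prefix with a literal name
#   /etc/postgresql/9.3/main/postgresql.conf
include '/etc/postgresql/common.conf'
log_line_prefix = '%m [%p] 9.3/main: '
```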

Thanks,

Stephen




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro
On 5 May 2014 10:49, Thomas Munro mu...@ip9.org wrote:

 On 5 May 2014 10:10, Andres Freund and...@2ndquadrant.com wrote:

 I guess the question is where this should be available as well. At the
 very least I'd want to reference it in log_line_prefix as well?


 Good idea, I will look into that.


See v2 patch attached which lets you use %C for cluster name in the log
prefix.

Maybe it would be overkill, but seeing the various responses about which
information belongs in the ps string, I guess we could also use a format
string with %blah fields for that.  Then the Debian/Ubuntu package users
who tend to think of the major version + name as the complete cluster
identifier could use "[%V/%C] ..." to get "postgres: [9.3/main] ...", and
others could throw in a %P to see a port number.
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 977fc66..0adfee7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
 				break;
+			case 'C':
+				if (padding != 0)
+					appendStringInfo(buf, "%*s", padding, cluster_name);
+				else
+					appendStringInfoString(buf, cluster_name);
+				break;
 			case 'p':
 				if (padding != 0)
 					appendStringInfo(buf, "%*d", padding, MyProcPid);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..7f7fd52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+char	   *cluster_name;
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			NULL,
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 6294ca3..ead7ea4 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -29,6 +29,7 @@
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "utils/ps_status.h"
+#include "utils/guc.h"
 
 extern char **environ;
 bool		update_process_title = true;
@@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
 	 * apparently setproctitle() already adds a `progname:' prefix to the ps
 	 * line
 	 */
-	snprintf(ps_buffer, ps_buffer_size,
-			 "%s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX ""
 #else
-	snprintf(ps_buffer, ps_buffer_size,
-			 "postgres: %s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX "postgres: "
 #endif
 
+	if (*cluster_name == '\0')
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "%s %s %s ",
+				 username, dbname, host_info);
+	}
+	else
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
+				 cluster_name, username, dbname, host_info);
+	}
+
 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);
 
 	set_ps_display(initial_str, true);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index be68f35..639288a 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -223,6 +223,7 @@ extern int	temp_file_limit;
 
 extern int	num_temp_buffers;
 
+extern char *cluster_name;
 extern char *data_directory;
 extern char *ConfigFileName;
 extern char *HbaFileName;



Re: [HACKERS] Fix initdb for path with whitespace and at char

2014-05-05 Thread Heikki Linnakangas

On 05/01/2014 07:55 AM, Amit Kapila wrote:

On Wed, Apr 30, 2014 at 4:01 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I committed the non-invasive fixes to backbranches (and master too, just to
keep it in sync), but the attached is what I came up with for master.

There are a couple of places in the code where we have #ifdef WIN32 code
that uses CreateProcess with "CMD /C ..." directly.


1. Do we need similar handling for CreatePipe case where it directly uses
executable path such as in function pipe_read_line()?
Currently the caller of pipe_read_line() calls canonicalize_path() to change
win32 specific path, is that sufficient or do we need SYSTEMQUOTE type
of handling for it.


No, SYSTEMQUOTE style handling is not required with CreateProcess. 
find_other_exec, which is the only caller of pipe_read_line, adds one 
pair of double-quotes around the executable's path, which is enough for 
CreateProcess.



2.
system.c
#include <assert.h>
Do we really need inclusion of assert.h or this is for future use?


You're right, that's not needed.


3. One more observation is that currently install.bat doesn't work
for such paths:
install.bat e:\PostgreSQL\master\install 1\ins@1
1\ins@1== was unexpected at this time.


Yeah, I noticed that. I haven't looked into what it would take to fix 
that, but for now you can just install to a path that doesn't contain 
whitespace, and move it from there. install.bat is only used by 
developers / packagers, so that's not a big deal.



4. Similar to Andrew, I also could not reproduce this problem on my
Windows system (Windows 7 64 bit)
e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\initdb.exe" -D "e:\PostgreSQL\master\Data 1"
e:\>"e:\PostgreSQL\master\install 1\ins@1\bin\pg_ctl.exe" start -D "e:\PostgreSQL\master\Data 1"

Above commands work fine.


Hmm, I'm also testing on 64-bit Windows 7, and it failed for me. Note 
that I already committed the narrow fix for initdb - which I also 
backpatched - to master, so to reproduce you'll need to revert that 
locally (commit 503de546).


I fixed the issues with malloc that Tom pointed out and committed the 
wrapper functions to git master. But please let me know if there's still 
a problem with it.


- Heikki




[HACKERS] Cascading replication and archive_command

2014-05-05 Thread Michael Renner
Hi,

apparently a few users were puzzled that archive_command is ignored on slave 
servers [1]. This comes as a surprise, since streaming replication works fine 
from slaves, and as far as I’ve checked, the documentation also doesn’t point out 
the fact that archive_command gets a different treatment.

Is this intentional or an oversight? Should this be fixed in the code (feature 
parity to SR) or in the documentation making this more explicit?

best,
Michael

[1] 
http://serverfault.com/questions/514136/postgresql-archive-command-on-a-cascading-standby-server
 as well as personal contacts


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:
 Hi,
 
 On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote:
  Feedback expected and welcomed.  I expect to be modifying this until we
  release 9.4 final.  I have marked items where I need help with question
  marks.
 
 Thanks for doing that work. Some comments inline:

Oh, I forgot to thank committers for making this job easier every year. 
The commit titles are helpful, the descriptions good, and many items are
marked as backward incompatible where appropriate.  Git hashes make
looking up commit details easy.

 +listitem
 + para
 +  Remove system column pg_class.reltoastidxid (Michael Paquier)
 + /para
 +
 + para
 +  Instead use normal index access methods.
 + /para
 +/listitem
 
 The explanation doesn't seem helpful to me. I'd just remove it as the
 full explanation seems to be to complex and not actually interesting.

OK, description removed.  I know pg_upgrade has to be modified to handle
this.  Hopefully others will understand how to adjust things.  You are
right that the full description is complex.

 
 +   sect3
 +titleServer/title
 +
 + itemizedlist
 +
 +  listitem
 +   para
 +Have VACUUM properly report dead but not removable rows to the 
 statistics collector (Hari Babu)
 +   /para
 +
 +   para
 +Previously these were reported as live rows.
 +   /para
 +  /listitem
 +
 +  listitem
 +   para
 +Improve SSL renegotiation handling (Álvaro Herrera)
 +   /para
 +  /listitem
 +
 +  listitem
 +   para
 +During immediate shutdown, send uncatchable termination signals to 
 child processes that have not already shutdown (MauMau,
 +Álvaro Herrera)
 +   /para
 +
 +   para
 +This reduces the likelihood of orphaned child processes after 
 postmaster shutdown.
 +   /para
 +  /listitem
 +
 +  listitem
 +   para
 +Improve randomness of the database system identifier (Tom Lane)
 +   /para
 +  /listitem
 +
 +  listitem
 +   para
 +Allow background workers to be dynamically started and terminated 
 (Robert Haas)
 +   /para
 
 This is much more important than the previous features imo. Also, I'd
 add the word registered in the above list.
 
 +  listitem
 +   para
 +Allow dynamic allocation of shared memory segments (Robert Haas, 
 Amit Kapila)
 +   /para
 +  /listitem
 
 Should also be earlier in the list.

OK, added "registered" and I moved this item and the one above to #2 and
#3.  Let me know if that isn't good.  I was thinking the dynamic stuff
wasn't useful until we had parallelism working, but I think I was wrong.

 
 +  listitem
 +   para
 +Freeze tuples when tables are written with CLUSTER or VACUUM FULL 
 (Robert Haas, Andres Freund)
 +   /para
 +
 +   para
 +This avoids later freezing overhead.
 +   /para
 +  /listitem
 
 This sentence seems a bit awkward to me. Maybe "This avoids the need to
 freeze tuples at some later point in time."

OK, I wrote, "This avoids the need to freeze the tuples in the future."
 
 +  listitem
 +   para
 +Improve speed of accessing many sequence values (David Rowley)
 +   /para
 +  /listitem
 
 I think this description isn't accurate. Isn't it something like
 "Improve speed of accessing many different sequences in the same backend"?

Yes, better, thanks.

 +  listitem
 +   para
 +Use memory barriers to avoid some spinlock use (Heikki Linnakangas)
 +   /para
 +  /listitem
 
 This is about 1a3d104475ce01326fc00601ed66ac4d658e37e5? If so I'd put it
 as: "Reduce spinlock contention during WAL replay." or similar.

Uh, yes.  I am not sure why I fixed on the memory barrier/spinlock part.
I used your text and moved it to the replication section.

 This imo deserves to be further up this list as this is one of the
 biggest bottlenecks in HS node performance.

I didn't move it too high up as it isn't a user-visible change, but I
put it at the top of the non-visible part.  Let me know if it needs
further promotion.

 +  listitem
 +   para
 +Add huge_pages configuration parameter to attempt to use huge 
 translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
 +Richard Poole, Abhijit Menon-Sen)
 +   /para
 
 I think this is too detailed - how about: ... to use huge pages ...?

I am not sure on this as the default is 'try', but I updated the wording
like you suggested:

Add huge_pages configuration parameter to enable huge
translation look-aside buffer (TLB) pages on Linux

 +   para
 +This can improve performance on large memory systems.
 +   /para
 +  /listitem
 
 Should this be in the performance section?

Well, performance is listed as General Performance. We also have index
changes that help performance too but they are under Index.  I think

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Sun, May  4, 2014 at 06:06:52PM +0400, Oleg Bartunov wrote:
 Bruce,
 
 you forgot Alexander Korotkov, who contributed jsonb_hash_ops opclass
 for GIN.  Something like
 Alexander Korotkov introduced an elegant jsonb_hash ops for GIN,
 which competes with MongoDB performance in contains operator.
 
 Here is a link to discussion -
 http://www.postgresql.org/message-id/53304439.8000...@dunslane.net

OK, added.  As I remember Alexander's name was not in the initial commit
but mentioned in a later one, and I somehow missed that.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] folder:lk/lk date:yesterday..

2014-05-05 Thread Heikki Linnakangas

On 04/30/2014 06:39 PM, Andres Freund wrote:

Hi,

Coverity flagged a couple of issues that seem to worth addressing by
changing the code instead of ignoring them:

01) heap_xlog_update() looks to coverity as if it could trigger a NULL
 pointer dereference. That's because it thinks that oldtup.t_data is
 NULL if XLR_BKP_BLOCK(0) while reconstructing incremental
 tuples. That fortunately can't happen as incremental updates are
 only performed if the tuples are on the same page.
 Add an Assert().
02) Be a bit more consistent in what type to use for a size
 variable. Inconsequential, but more consistent.
03) Don't leak memory after connection aborts in pg_recvlogical.
04) Use a sensible parameter for memset() when randomizing memory in
 reorderbuffer. Inconsequential.

Could somebody please pick these up?


Committed, thanks.


I have to say, I am not particularly happy with the complexity of the
control flow in heap_xlog_update() :(.


Agreed :-(. It might be good to split it into two functions, one for 
same-page updates and another for others. And have two different WAL 
record structs for the cases, too.


- Heikki




Re: [HACKERS] Cascading replication and archive_command

2014-05-05 Thread Heikki Linnakangas

On 05/05/2014 04:19 PM, Michael Renner wrote:

Hi,

apparently a few users were puzzled that archive_command is ignored
on slave servers, which comes as a surprise since streaming
replication will work fine from slaves and as far as I’ve checked the
documentation also doesn’t point out the fact that archive_command
gets a different treatment.

Is this intentional or an oversight? Should this be fixed in the code
(feature parity to SR) or in the documentation making this more
explicit?


It was intentional, although I can certainly understand the viewpoint 
that archive_command should also archive in the standby. IIRC people 
argued it both ways when cascading replication was discussed.


The current assumption is that the archive is shared by the master and 
standby (or standbys), so that there is no point in archiving the same 
file again in the standby. On the contrary, re-archiving the same file 
would fail, so you would need to disable archiving in the standby, and 
re-enable it when promoting, which would be more complicated.
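
The shared-archive assumption is also why the documentation's example archive_command refuses to overwrite an existing file — re-archiving the same segment fails. A toy sketch of that idempotence check (paths and contents invented for the demo):

```shell
ARCHIVE=/tmp/demo_archive
SEG=/tmp/000000010000000000000001
mkdir -p "$ARCHIVE"
printf 'wal segment contents' > "$SEG"

# mimics: archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
archive() {
    test ! -f "$ARCHIVE/$(basename "$1")" && cp "$1" "$ARCHIVE/$(basename "$1")"
}

archive "$SEG" && echo "archived"        # first copy succeeds
archive "$SEG" || echo "refused"         # same file a second time fails
```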


- Heikki




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Sun, May  4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote:
 
 On 05/04/2014 10:12 AM, Petr Jelinek wrote:
 On 04/05/14 14:46, Bruce Momjian wrote:
 I have completed the initial version of the 9.4 release notes.  You can
 view them here:
 
 http://www.postgresql.org/docs/devel/static/release-9-4.html
 
 I will be adding additional markup in the next few days.
 
 Feedback expected and welcomed.  I expect to be modifying this until we
 release 9.4 final.  I have marked items where I need help with question
 marks.
 
 
 Nice work,
 
 one comment from me would be about jsonb:
 
 +   listitem
 +para
 + Add structured (non-text) data type (jsonb) for storing
 JSON data (Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and
 Andrew Dunstan)
 +/para
 +
 +para
 + This data type allows faster indexing and access to json
 keys/value pairs.
 +/para
 +   /listitem
 
 I think the explanation should be expanded to say that this data
 type also has generic indexing not just faster indexing.
 
 
 
 It's also slightly ambiguous - it's not immediately clear that the
 faster also applies to the access.

OK, how is this:

 This data type allows for faster indexing and access to json
 key/value pairs, as well as efficient indexing of all key/value
 pairs in a JSON document.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 09:10:11AM +0300, Heikki Linnakangas wrote:
 On 05/04/2014 03:46 PM, Bruce Momjian wrote:
 I have completed the initial version of the 9.4 release notes.
 
 Thanks!
 
 You can view them here:
 
  http://www.postgresql.org/docs/devel/static/release-9-4.html
 
 I will be adding additional markup in the next few days.
 
 Feedback expected and welcomed.  I expect to be modifying this until we
 release 9.4 final.  I have marked items where I need help with question
 marks.
 
 Two rather large changes to the B-tree algorithms are missing:
 
 * Fix race condition in B-tree page deletion (commit efada2b8)
 
 * Make the handling of interrupted B-tree page splits more robust
 (commit 40dae7ec)
 
 These changes are not visible to users, but they are good to know for
 beta testers.

Thanks, added.  I thought these were fixes for 9.4-only bugs, not fixes
for earlier bugs.  Thanks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 09:23:15AM +0300, Heikki Linnakangas wrote:
 On 05/04/2014 03:46 PM, Bruce Momjian wrote:
 Auto-resize the catalog cache (Heikki Linnakangas)
 
 This reduces memory consumption for backends accessing only a few
 tables, and improves performance for backend accessing many tables.
 
 Move this to General performance section.

OK, moved.

 Improve spinlock speed on x86_64 CPUs (test on i386?) (Heikki Linnakangas)
 
 I believe this refers to commit b03d196b. The test on i386 note
 can just be removed. Someone ought to test it on 32-bit i386 to see
 if the same change would be beneficial there, but that's a TODO item
 more than a release notes item. I doubt anyone's interested to spend
 time performance testing spinlocks on 32-bit i386, though, so I
 think we're going to just retain the current situation for the next
 decade.

Agreed.  Text removed.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] tab completion for setting search_path

2014-05-05 Thread Bernd Helmle



--On 3. Mai 2014 10:11:33 +0200 Andres Freund and...@2ndquadrant.com 
wrote:



diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 6d26ffc..dec3d4a
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** psql_completion(const char *text, int st
*** 3230,3235 
--- 3230,3242 

COMPLETE_WITH_LIST(my_list);
}
+   else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
+   {
+       COMPLETE_WITH_QUERY(Query_for_list_of_schemas
+                           " AND nspname not like 'pg\\_%%' "
+                           " AND nspname not like 'information_schema' "
+                           " UNION SELECT 'DEFAULT' ");
+   }


Why should we exclude system schemata? That seems more likely to be
confusing than helpful? I can see a point in excluding another backend's
temp tables, but otherwise?


I put my hands on this a while ago, too, but had a different notion in 
mind about which schemas the completion should select. I came up with the 
following:


http://git.postgresql.org/gitweb/?p=users/bernd/postgres.git;a=commitdiff;h=03fd00cd190e8b529efeec1a1bb038454fb0b05f

Just complete to schemas the user has CREATE or USAGE privileges on. However, the 
reason I stopped working on it was that I really want to have a completion 
to a list of schemas as well, and I couldn't figure out a good and easy way to 
do this atm.
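
A rough sketch of what such a privilege-based completion query might look like, using the standard has_schema_privilege() function (hypothetical — not the query from the linked commit):

```sql
-- schemas the current user could sensibly set in search_path
SELECT n.nspname
FROM pg_catalog.pg_namespace n
WHERE has_schema_privilege(n.nspname, 'CREATE')
   OR has_schema_privilege(n.nspname, 'USAGE');
```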


--
Thanks

Bernd




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-05 10:49:19 +, Thomas Munro wrote:
  Hah -- I agree, but on systems using setproctitle, the program name and ": "
  are provided already, so the end result would have to be different on
  those systems and I figured it should be the same everywhere if possible.

 Fair point.

How about dropping the brackets, and the cluster-name concept, and
just doing

 postgres: 5432 checkpointer process

 Aren't you potentially dereferencing a NULL pointer here?

 Hmm -- I thought the GUC machinery would make sure cluster_name either
 pointed to the default I provided, an empty string, or a string read from
 the configuration file.  Perhaps I need to go and read up on how GUCs work.

 That's true - but I am not sure you can guarantee it's only called after
 the GUC machinery has started up.

The elog code MUST be able to work before GUC initialization is done.
What do you think will happen if we fail to open postgresql.conf,
for example?

regards, tom lane




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 How about dropping the brackets, and the cluster-name concept, and
 just doing
 
  postgres: 5432 checkpointer process

-1 for my part, as I'd just end up with a bunch of those and no
distinction between the various processes.  In other words, without a
cluster distinction, it's useless.

Including the value of listen_addresses along w/ the port would make it
useful.  If we really don't want the cluster-name concept (which,
personally, I like quite a bit), how about including the listen_address
value if it isn't '*'?  I could see that also helping users who
installed from a distro and got '127.0.0.1' and don't understand why
they can't connect...

Of course, these are users who can use 'ps' but not 'netstat'.  Not sure
how big that set really is.

Thanks,

Stephen




Re: [HACKERS] Minor improvement to fdwhandler.sgml

2014-05-05 Thread Robert Haas
On Wed, Apr 30, 2014 at 4:10 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 (2014/04/28 23:31), Robert Haas wrote:
 On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:

 The patch attached improves docs in fdwhandler.sgml a little bit.


 When you submit a patch, it's helpful to describe what the patch
 actually does, rather than just saying it makes things better.  For
 example, I think that this patch could be described as in
 fdwhandler.sgml, mark references to scan_clauses with structfield
 tags.


 I thought so.  Sorry, my explanation wasn't enough.


 A problem with that idea is that scan_clauses is not a field in any
 struct.


 I was mistaken.  I think those should be marked with literal tags. Patch
 attached.

OK, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 * Tom Lane (t...@sss.pgh.pa.us) wrote:
 How about dropping the brackets, and the cluster-name concept, and
 just doing
 
 postgres: 5432 checkpointer process

 -1 for my part, as I'd just end up with a bunch of those and no
 distinction between the various processes.  In other words, without a
 cluster distinction, it's useless.

Yeah, after I sent that I got to the bit about running multiple
postmasters with different IP-address bindings.  I agree the port number
alone wouldn't be enough in that scenario.

 Including the value of listen_addresses along w/ the port would make it
 useful.  If we really don't want the cluster-name concept (which,
 personally, I like quite a bit), how about including the listen_address
 value if it isn't '*'?

Nah, let's do cluster name.  That way, somebody who's only got one
postmaster isn't suddenly going to see a lot of useless clutter,
ie the user gets to decide what to add to ps output.  SHOW cluster_name
might be useful at the application level as well, I suspect.
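
That application-level use might look like this (hypothetical session; assumes the proposed cluster_name GUC exists and is set):

```sql
SHOW cluster_name;
SELECT current_setting('cluster_name');  -- same value, usable inside queries
```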

I still think the brackets are unnecessary though.

Also, -1 for adding another log_line_prefix escape.  If you're routing
multiple clusters logging to the same place (which is already a bit
unlikely IMO), you can put distinguishing strings in log_line_prefix
already.  And it's not like we've got an infinite supply of letters
for those escapes.

regards, tom lane




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Sun, May  4, 2014 at 08:46:07AM -0400, Bruce Momjian wrote:
 I have completed the initial version of the 9.4 release notes.  You can
 view them here:
 
   http://www.postgresql.org/docs/devel/static/release-9-4.html
 
 I will be adding additional markup in the next few days.
 
 Feedback expected and welcomed.  I expect to be modifying this until we
 release 9.4 final.  I have marked items where I need help with question
 marks.

I have updated output reflecting all hackers list feedback received so
far:

http://momjian.us/pgsql_docs/release-9-4.html

(The developer copy of the docs takes a while to update.)  Now on to the
markup additions.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Alvaro Herrera
Bruce Momjian wrote:
 On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:

  +  listitem
  +   para
  +Add huge_pages configuration parameter to attempt to use huge 
  translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
  +Richard Poole, Abhijit Menon-Sen)
  +   /para
  
  I think this is too detailed - how about: ... to use huge pages ...?
 
 I am not sure on this as the default is try, but I updated the wording
 like your suggested:
 
   Add huge_pages configuration parameter to enable huge
   translation look-aside buffer (TLB) pages on Linux

Please see
http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org
about the TLB huge pages terminology.  Maybe "..enable huge MMU
pages..." as suggested by Peter G. in that sub-thread.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Stephen Frost sfr...@snowman.net writes:
  Including the value of listen_addresses along w/ the port would make it
  useful.  If we really don't want the cluster-name concept (which,
  personally, I like quite a bit), how about including the listen_address
  value if it isn't '*'?
 
 Nah, let's do cluster name.  That way, somebody who's only got one
 postmaster isn't suddenly going to see a lot of useless clutter,
 ie the user gets to decide what to add to ps output.  SHOW cluster_name
 might be useful at the application level as well, I suspect.

Ah, yes, agreed, that could be quite useful.

 I still think the brackets are unnecessary though.

Either way is fine for me on this.

 Also, -1 for adding another log_line_prefix escape.  If you're routing
 multiple clusters logging to the same place (which is already a bit
 unlikely IMO), you can put distinguishing strings in log_line_prefix
 already.  And it's not like we've got an infinite supply of letters
 for those escapes.

Agreed.

Thanks,

Stephen




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 10:18:44AM -0400, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Sun, May  4, 2014 at 03:49:27PM +0200, Andres Freund wrote:
 
   +  listitem
   +   para
   +Add huge_pages configuration parameter to attempt to use huge 
   translation look-aside buffer (TLB) pages on Linux (Christian Kruse,
   +Richard Poole, Abhijit Menon-Sen)
   +   /para
   
   I think this is too detailed - how about: ... to use huge pages ...?
  
  I am not sure on this as the default is try, but I updated the wording
  like your suggested:
  
  Add huge_pages configuration parameter to enable huge
  translation look-aside buffer (TLB) pages on Linux
 
 Please see
 http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org
 about the TLB huge pages terminology.  Maybe "..enable huge MMU
 pages..." as suggested by Peter G. in that sub-thread.

You are totally correct.  I was referencing tlb from the _old_ name for
the variable, but didn't update the description when the variable was
renamed.

New text is:

Add huge_pages configuration parameter to use huge memory pages on 
Linux (Christian Kruse,
Richard Poole, Abhijit Menon-Sen)
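For beta testers who want to try it, a minimal postgresql.conf sketch (the parameter accepts try, on, and off, defaulting to try, and changing it requires a server restart):

    # postgresql.conf
    huge_pages = try        # try | on | off; "try" silently falls back to
                            # regular pages if huge pages can't be mapped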

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Merlin Moncure
On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja ma...@joh.to wrote:
 On 5/2/14, 10:10 PM, Merlin Moncure wrote:

 On Fri, May 2, 2014 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Meh.  Then you could have a query that works fine until you add a column
 to the table, and it stops working.  If nobody ever used column names
 identical to table names it'd be all right, but unfortunately people
 seem to do that a lot...


 That's already the case with select statements

 I don't think that's true if you table-qualify your column references and
 don't use SELECT *.


 and, if a user were
 concerned about that, always have the option of aliasing the table as
 nearly 100% of professional developers do:

 SELECT f FROM foo f;
 etc.


 So e.g.:

   UPDATE foo f SET f = ..;

 would resolve to the table, despite there being a column called f? That
 would break backwards compatibility.

 How about:

   UPDATE foo SET ROW(foo) = (1,2,3);

 ISTM that this could be parsed unambiguously, though it's perhaps a bit
 ugly.

Hm, that's a bit too ugly: row(foo) in this case means 'do special
behavior X' whereas in all other cases it means make an anonymous
rowtype with one attribute of type 'foo'.

How about:
UPDATE foo SET (foo).* = (1,2,3);

merlin




Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Pavel Stehule
2014-05-05 17:02 GMT+02:00 Merlin Moncure mmonc...@gmail.com:

 On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja ma...@joh.to wrote:
  On 5/2/14, 10:10 PM, Merlin Moncure wrote:
 
  On Fri, May 2, 2014 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Meh.  Then you could have a query that works fine until you add a
 column
  to the table, and it stops working.  If nobody ever used column names
  identical to table names it'd be all right, but unfortunately people
  seem to do that a lot...
 
 
  That's already the case with select statements
 
  I don't think that's true if you table-qualify your column references and
  don't use SELECT *.
 
 
  and, if a user were
  concerned about that, always have the option of aliasing the table as
  nearly 100% of professional developers do:
 
  SELECT f FROM foo f;
  etc.
 
 
  So e.g.:
 
UPDATE foo f SET f = ..;
 
  would resolve to the table, despite there being a column called f? That
  would break backwards compatibility.
 
  How about:
 
UPDATE foo SET ROW(foo) = (1,2,3);
 
  ISTM that this could be parsed unambiguously, though it's perhaps a bit
  ugly.

 Hm, that's a bit too ugly: row(foo) in this case means 'do special
 behavior X' whereas in all other cases it means make an anonymous
 rowtype with one attribute of type 'foo'.

 How about:
 UPDATE foo SET (foo).* = (1,2,3);


It looks a little bit strange

I like previous proposal UPDATE foo SET foo = (1,2,3);

Pavel



 merlin





Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan


On 05/05/2014 09:38 AM, Bruce Momjian wrote:

On Sun, May  4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote:

On 05/04/2014 10:12 AM, Petr Jelinek wrote:

On 04/05/14 14:46, Bruce Momjian wrote:

I have completed the initial version of the 9.4 release notes.  You can
view them here:

http://www.postgresql.org/docs/devel/static/release-9-4.html

I will be adding additional markup in the next few days.

Feedback expected and welcomed.  I expect to be modifying this until we
release 9.4 final.  I have marked items where I need help with question
marks.


Nice work,

one comment from me would be about jsonb:

+   listitem
+para
+ Add structured (non-text) data type (jsonb) for storing
JSON data (Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and
Andrew Dunstan)
+/para
+
+para
+ This data type allows faster indexing and access to json
keys/value pairs.
+/para
+   /listitem

I think the explanation should be expanded to say that this data
type also has generic indexing not just faster indexing.



It's also slightly ambiguous - it's not immediately clear that the
"faster" also applies to the access.

OK, how is this:

  This data type allows for faster indexing and access to json
  key/value pairs, as well as efficient indexing of all key/value
  pairs in a JSON document.



No, I think you're missing the point of what I'm saying, and I think the 
addition is at best misleading. I would avoid talk of key value pairs, 
anyway, that's not all that's in a json document (it might be an array, 
for example).


How about:

   This data type allows for faster access to values in the json document and 
faster and more useful indexing of json.
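As a concrete illustration of that wording (a sketch against 9.4; the table and key names are invented for the example):

```sql
-- A single GIN index on a jsonb column supports containment queries
-- directly, instead of one expression index per extracted key.
CREATE TABLE docs (doc jsonb);
CREATE INDEX docs_doc_idx ON docs USING gin (doc);

SELECT doc FROM docs WHERE doc @> '{"tag": "postgres"}';
```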

cheers

andrew





Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Andrew Dunstan


On 05/05/2014 11:20 AM, Pavel Stehule wrote:




How about:
UPDATE foo SET (foo).* = (1,2,3);


It looks a little bit strange

I like previous proposal UPDATE foo SET foo = (1,2,3);



What if the table has a field called foo? Won't it then be ambiguous?

cheers

andrew




Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread Merlin Moncure
On Mon, May 5, 2014 at 10:32 AM, Andrew Dunstan and...@dunslane.net wrote:

 On 05/05/2014 11:20 AM, Pavel Stehule wrote:




 How about:
 UPDATE foo SET (foo).* = (1,2,3);


 It looks a little bit strange

 I like previous proposal UPDATE foo SET foo = (1,2,3);


 What if the table has a field called foo? Won't it then be ambiguous?

See upthread: it prefers the field to the table if both are there
(exactly as SELECT does).

merlin




Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)

2014-05-05 Thread David G Johnston
Merlin Moncure-2 wrote
 On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja <marko@> wrote:
 On 5/2/14, 10:10 PM, Merlin Moncure wrote:

 On Fri, May 2, 2014 at 3:03 PM, Tom Lane <tgl@.pa> wrote:

 Meh.  Then you could have a query that works fine until you add a
 column
 to the table, and it stops working.  If nobody ever used column names
 identical to table names it'd be all right, but unfortunately people
 seem to do that a lot...


 That's already the case with select statements

 I don't think that's true if you table-qualify your column references and
 don't use SELECT *.


 and, if a user were
 concerned about that, always have the option of aliasing the table as
 nearly 100% of professional developers do:

 SELECT f FROM foo f;
 etc.


 So e.g.:

   UPDATE foo f SET f = ..;

 would resolve to the table, despite there being a column called f? That
 would break backwards compatibility.

 How about:

   UPDATE foo SET ROW(foo) = (1,2,3);

 ISTM that this could be parsed unambiguously, though it's perhaps a bit
 ugly.
 
 Hm, that's a bit too ugly: row(foo) in this case means 'do special
 behavior X' whereas in all other cases it means make an anonymous
 rowtype with one attribute of type 'foo'.
 
 How about:
 UPDATE foo SET (foo).* = (1,2,3);

Wouldn't

UPDATE foo SET (foo.*) = (1,2,3)

be better since it would cleanly support non-complete types like

UPDATE foo SET (foo.col1, foo.col3) = (1,3)

Though I am not that concerned about overloading the use of ROW in context
of an UPDATE.

As with normal usage of ROW, why not make its presence optional and support
both syntaxes?

Keywords like USING and SET have different meanings when used in
DELETE/UPDATE so having ROW behave similarly wouldn't be that confusing -
and it does seem to have an ambiguity if you restrict this interpretation of
ROW to only the SET of the update statement.

Is there any need or requirement for (or against) interleaving normal and
row-valued, or even multiple row-valued, SET expressions?

UPDATE foo SET (foo.col1, foo.col3) = (1,3), foo.col2 = 2
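For readers following the thread, the spellings proposed so far can be lined up side by side (all hypothetical at this point; none of these forms exists in released PostgreSQL):

```sql
UPDATE foo SET foo = (1,2,3);         -- bare composite; ambiguous with a column named foo
UPDATE foo SET ROW(foo) = (1,2,3);    -- explicit ROW() marker
UPDATE foo SET (foo).* = (1,2,3);     -- parenthesized star expansion
UPDATE foo SET (foo.*) = (1,2,3);     -- variant that extends to column subsets
```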

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Supporting-multiple-column-assignment-in-UPDATE-9-5-project-tp5802240p5802471.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 09:52:33 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Aren't you potentially dereferencing a NULL pointer here?
 
  Hmm -- I thought the GUC machinery would make sure cluster_name either
  pointed to the default I provided, an empty string, or a string read from
  the configuration file.  Perhaps I need to go and read up on how GUCs work.
 
  That's true - but I am not sure you can guarantee it's only called after
  the GUC machinery has started up.
 
 The elog code MUST be able to work before GUC initialization is done.
 What do you think will happen if we fail to open postgresql.conf,
 for example?

But elog shouldn't call init_ps_display(), right? Anyway, I am all for
initializing it sensibly; after all, it was I who suggested doing so above...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 08:03:04 -0400, Stephen Frost wrote:
 Craig,
 
 * Craig Ringer (cr...@2ndquadrant.com) wrote:
postgres[5433]: checkpointer process
  
  at least as useful. The only time that's not unique is in a BSD jail or
  lxc container, and in those cases IIRC ps can show you the
  jail/container anyway.
 
 Uhh, that's not at all true.  You can trivially have multiple IPs on a
 box w/o jails or containers (aliased interfaces) and then run PG on the
 default port- which I find to be *far* more convenient than having the
 same IP and a bunch of different ports.

Only that you then need different socket directories. Do you really do
that regularly?

Anyway, I am happy having the cluster_name thingy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
 Stephen Frost sfr...@snowman.net writes:
  Including the value of listen_addresses along w/ the port would make it
  useful.  If we really don't want the cluster-name concept (which,
  personally, I like quite a bit), how about including the listen_address
  value if it isn't '*'?
 
 Nah, let's do cluster name.  That way, somebody who's only got one
 postmaster isn't suddenly going to see a lot of useless clutter,
 ie the user gets to decide what to add to ps output.  SHOW cluster_name
 might be useful at the application level as well, I suspect.

Hm. What about unifying this with event_source? Not sure if it's a good
idea, but it's a pretty similar thing.

 Also, -1 for adding another log_line_prefix escape.  If you're routing
 multiple clusters logging to the same place (which is already a bit
 unlikely IMO), you can put distinguishing strings in log_line_prefix
 already.  And it's not like we've got an infinite supply of letters
 for those escapes.

Using syslog and including the same config file from multiple clusters
isn't that uncommon. But I can live without it.
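What Tom is describing amounts to something like this in each cluster's own config file (a sketch; 'clusterA' is an arbitrary distinguishing string, and %m/%p are the existing timestamp and PID escapes):

    log_line_prefix = 'clusterA %m [%p] '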

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Stephen Frost
* Andres Freund (and...@2ndquadrant.com) wrote:
 On 2014-05-05 08:03:04 -0400, Stephen Frost wrote:
  Uhh, that's not at all true.  You can trivially have multiple IPs on a
  box w/o jails or containers (aliased interfaces) and then run PG on the
  default port- which I find to be *far* more convenient than having the
  same IP and a bunch of different ports.
 
 Only that you then need different socket directories. Do you really do
 that regularly?

Yup.  I've wished for that to be the default in Debian quite a few
times, actually...  Much easier to deal with firewall rules and users
who are coming from pgAdmin and friends if they only have to deal with
understanding what hosts they need to connect to, and not worry about
the port also.

 Anyway, I am happy having the cluster_name thingy.

Agreed.

Thanks,

Stephen




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 11:28:21AM -0400, Andrew Dunstan wrote:
 No, I think you're missing the point of what I'm saying, and I think
 the addition is at best misleading. I would avoid talk of key value
 pairs, anyway, that's not all that's in a json document (it might be
 an array, for example).
 
 How about:
 
This data type allows for faster access to values in the json document and 
 faster and more useful indexing of json.

OK, I used your wording.  You are right, I didn't understand it.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] tab completion for setting search_path

2014-05-05 Thread Jeff Janes
On Sat, May 3, 2014 at 1:11 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-05-03 00:13:45 -0700, Jeff Janes wrote:
  On Friday, May 2, 2014, Jeff Janes jeff.ja...@gmail.com wrote:
 
   I've been working with an app that uses a schema name whose spelling is
   hard to type, and the lack of tab completion for SET search_path TO
 was
   bugging me.  So see attached.
  
   I filter out the system schemata, but not public.

 That'd be nice.

  diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
  new file mode 100644
  index 6d26ffc..dec3d4a
  *** a/src/bin/psql/tab-complete.c
  --- b/src/bin/psql/tab-complete.c
  *** psql_completion(const char *text, int st
  *** 3230,3235 
  --- 3230,3242 
 
COMPLETE_WITH_LIST(my_list);
}
  + else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
  + {
  + COMPLETE_WITH_QUERY(Query_for_list_of_schemas
  +  " AND nspname not like 'pg\\_%%' "
  +  " AND nspname not like 'information_schema' "
  +  " UNION SELECT 'DEFAULT' ");
  + }

 Why should we exclude system schemata? That seems more likely to be
 confusing than helpful? I can see a point in excluding another backend's
 temp tables, but otherwise?


I've personally never had a need to set the search_path to a system schema,
and I guess I was implicitly modelling this on what is returned by \dn, not
by \dnS.   I wouldn't object much to including them; that would be better
than not having any completion.  I just don't see much point.

And now playing a bit with the system ones, I think it would be more
confusing to offer them.  pg_catalog and the appropriate pg_temp_N schema
always get searched, whether you put them in the search_path or not.
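That implicit behavior is easy to confirm from psql (a sketch; it needs a live session, so the outputs are shown only indicatively):

```sql
SET search_path TO public;
SELECT current_schemas(false);  -- just the explicit path, e.g. {public}
SELECT current_schemas(true);   -- implicit schemas included, e.g. {pg_catalog,public}
```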

Cheers,

Jeff


[HACKERS] [PATCH] Fix use of free in walsender error handling after a sysid mismatch.

2014-05-05 Thread Andres Freund
Hi,

Walsender does a PQclear(res) but then accesses data acquired with
PQgetvalue(). That's clearly not ok.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 1b7df22ff6da40ad4fcfa4eeacc1822373baa4e6 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 5 May 2014 18:03:44 +0200
Subject: [PATCH] Fix use of free in walsender error handling after a sysid
 mismatch.

Found via valgrind.

The bug exists since the introduction of the walsender, so backpatch
to 9.0.
---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 96f31c4..88d27c7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -152,6 +152,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 			 GetSystemIdentifier());
 	if (strcmp(primary_sysid, standby_sysid) != 0)
 	{
+		primary_sysid = pstrdup(primary_sysid);
 		PQclear(res);
 		ereport(ERROR,
(errmsg("database system identifier differs between the primary and standby"),
-- 
1.8.5.rc2.dirty




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
 Also, -1 for adding another log_line_prefix escape.  If you're routing
 multiple clusters logging to the same place (which is already a bit
 unlikely IMO), you can put distinguishing strings in log_line_prefix
 already.  And it's not like we've got an infinite supply of letters
 for those escapes.

 Using syslog and including the same config file from multiple clusters
 isn't that uncommon. But I can live without it.

So, if you are sharing a config file, how is it that you can set a
per-cluster cluster_name but not a per-cluster log_line_prefix?

regards, tom lane




Re: [HACKERS] regexp_replace( , , , NULL ) returns null?

2014-05-05 Thread Jim Nasby

On 5/2/14, 8:57 PM, Tom Lane wrote:

Jim Nasby jna...@enova.com writes:

ISTM it’d be a lot better if it treated NULL flags the same as ‘’...


In Oracle's universe that probably makes sense, but to me it's not
sensible.  Why should unknown flags produce a non-unknown result?


Only because they're more options than data.


I find it hard to envision many use-cases where you wouldn't actually
have the flags as a constant, anyway; they're too fundamental to the
behavior of the function.


Unless you're wrapping this function; handling the case of the flags being 
optional becomes easier then.

(FWIW, I'm creating a version that accepts an array of search/replace 
arguments.)
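A wrapper of the kind being discussed could absorb the NULL itself (a sketch, not Jim's actual code; the function name is invented):

```sql
CREATE FUNCTION regexp_replace_nullsafe(subj text, pat text, repl text, flags text)
RETURNS text LANGUAGE sql IMMUTABLE AS $$
    SELECT regexp_replace(subj, pat, repl, coalesce(flags, ''))
$$;

SELECT regexp_replace_nullsafe('abc', 'b', 'X', NULL);  -- returns 'aXc'
```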
--
Jim Nasby, Lead Data Architect   (512) 569-9461




Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus
On 05/03/2014 01:07 AM, Andres Freund wrote:
 On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
 Just got a report of a replication issue with 9.2.8 from a community member:

 Here's the sequence:

 1) A --> B (sync rep)

 2) Shut down B

 3) Shut down A

 4) Start up B as a master

 5) Start up A as sync replica of B

 6) A successfully joins B as a sync replica, even though its transaction
 log is 1016 bytes *ahead* of B.

 7) Transactions written to B all hang

 8) Xlog on A is now corrupt, although the database itself is OK
 
 This is fundamentally borked practice.
 
 Now, the above sequence happened because of the user misunderstanding
 what sync rep really means.  However, A should not have been able to
 connect with B in replication mode, especially in sync rep mode; that
 should have failed.  Any thoughts on why it didn't?
 
 I'd guess that B, while starting up, has written further WAL records
 bringing it further ahead of A.

Apparently not; from what I've seen pg_stat_replication even *shows*
that the replica is ahead of the master.  Further, Postgres should have
recognized that there was a timeline branch point before A's last
record, no?

I'm working on getting permission to access the DB files.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Andres Freund
On 2014-05-05 13:07:48 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
  Also, -1 for adding another log_line_prefix escape.  If you're routing
  multiple clusters logging to the same place (which is already a bit
  unlikely IMO), you can put distinguishing strings in log_line_prefix
  already.  And it's not like we've got an infinite supply of letters
  for those escapes.
 
  Using syslog and including the same config file from multiple clusters
  isn't that uncommon. But I can live without it.
 
 So, if you are sharing a config file, how is it that you can set a
 per-cluster cluster_name but not a per-cluster log_line_prefix?

Well, it's a included file. With log_line_prefix support just
cluster_name has to be set per cluster. Without you need string
interpolation in the config management to include cluster_name in
log_line_prefix.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus
All,

I'm working with the cstore_fdw project, which has an interesting
property for an FDW: the FDW itself creates the files which make up the
database.   This raises a couple of questions:

1) Do we want to establish a standard directory for FDWs which create
files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
leave it up to each FDW to decide?

2) Do we want to support the TABLESPACE directive for FDWs?

While cstore is the first FDW to create its own files, it won't
necessarily be the last; I could imagine CSV_FDW doing so as well, or a
future SQLite_FDW which does the same.  So I think the above questions
are worth answering in general.  And we're planning to implement
automated file management for cstore_fdw fairly soon, so we want to make
it consistent with whatever we're doing in Postgres 9.5.
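For context, cstore_fdw currently points at its files through ordinary per-table FDW options, roughly like this (a sketch from memory of cstore_fdw's README, so treat the option names as assumptions):

```sql
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE events (id int, payload text)
    SERVER cstore_server
    OPTIONS (filename '/var/lib/postgresql/cstore/events.cstore');
```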

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus
On 05/05/2014 10:25 AM, Andres Freund wrote:
 On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
 On 05/03/2014 01:07 AM, Andres Freund wrote:
 On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
 Just got a report of a replication issue with 9.2.8 from a community 
 member:

 Here's the sequence:

 1) A --> B (sync rep)

 2) Shut down B

 3) Shut down A

 4) Start up B as a master

 5) Start up A as sync replica of B

 6) A successfully joins B as a sync replica, even though its transaction
 log is 1016 bytes *ahead* of B.

 7) Transactions written to B all hang

 8) Xlog on A is now corrupt, although the database itself is OK

 This is fundamentally borked practice.

 Now, the above sequence happened because of the user misunderstanding
 what sync rep really means.  However, A should not have been able to
 connect with B in replication mode, especially in sync rep mode; that
 should have failed.  Any thoughts on why it didn't?

 I'd guess that B, while starting up, has written further WAL records
 bringing it further ahead of A.

 Apparently not; from what I've seen pg_stat_replication even *shows*
 that the replica is ahead of the master.  Further, Postgres should have
 recognized that there was a timeline branch point before A's last
 record, no?
 
 There wasn't any timeline increase because - as far as I understand the
 above - there wasn't any promotion. The cluster was shut down and
 recovery.conf was created/removed respectively.

Ah, oops, left out a step.  B was promoted.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




[HACKERS] Docs for 9.4's worker_spi?

2014-05-05 Thread Bruce Momjian
Seems there is no documentation for the 9.4 worker_spi contrib module.  Is
this OK?  The comment at the top of the C file says:

 *  Sample background worker code that demonstrates various coding
 *  patterns: establishing a database connection; starting and committing
 *  transactions; using GUC variables, and heeding SIGHUP to reread
 *  the configuration file; reporting to pg_stat_activity; using the
 *  process latch to sleep and exit in case of postmaster death.


-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Andres Freund
On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
 On 05/03/2014 01:07 AM, Andres Freund wrote:
  On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
  Just got a report of a replication issue with 9.2.8 from a community 
  member:
 
  Here's the sequence:
 
  1) A --> B (sync rep)
 
  2) Shut down B
 
  3) Shut down A
 
  4) Start up B as a master
 
  5) Start up A as sync replica of B
 
  6) A successfully joins B as a sync replica, even though its transaction
  log is 1016 bytes *ahead* of B.
 
  7) Transactions written to B all hang
 
  8) Xlog on A is now corrupt, although the database itself is OK
  
  This is fundamentally borked practice.
  
  Now, the above sequence happened because of the user misunderstanding
  what sync rep really means.  However, A should not have been able to
  connect with B in replication mode, especially in sync rep mode; that
  should have failed.  Any thoughts on why it didn't?
  
  I'd guess that B, while starting up, has written further WAL records
  bringing it further ahead of A.
 
 Apparently not; from what I've seen pg_stat_replication even *shows*
 that the replica is ahead of the master.  Further, Postgres should have
 recognized that there was a timeline branch point before A's last
 record, no?

There wasn't any timeline increase because - as far as I understand the
above - there wasn't any promotion. The cluster was shut down and
recovery.conf was created/removed respectively.

To me this is an operator error. We could try to defend against it more
vigorously, but that's hard to do without breaking actual use cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Josh Berkus
On 05/04/2014 05:46 AM, Bruce Momjian wrote:
 I have completed the initial version of the 9.4 release notes.  You can
 view them here:
 
   http://www.postgresql.org/docs/devel/static/release-9-4.html
 
 I will be adding additional markup in the next few days.
 
 Feedback expected and welcomed.  I expect to be modifying this until we
 release 9.4 final.  I have marked items where I need help with question
 marks.
 

Major enhancement list:

Per discussion on the -advocacy list, the features we've picked out as
major enhancements for advocacy reasons this round are:

* Materialized Views (because with refresh concurrently, MatViews are
now useful, which they weren't, very, in 9.3)

* Logical Decoding/Changeset Extraction which we're calling Data Change
Streaming API in the advocacy stuff.  Not sure if you want to use that
name in the technical release notes.

* Dynamic Background Workers (this one is a major feature, but won't be
front-listed in advocacy doc because it's hard to explain and nobody's
built anything cool with it yet.  But it should go under major
enhancements in the release notes)

* JSONB and related features (such as indexing)

* ALTER SYSTEM SET

Lemme know if you need description text for any of the above.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 10:40:29AM -0700, Josh Berkus wrote:
 On 05/04/2014 05:46 AM, Bruce Momjian wrote:
  I have completed the initial version of the 9.4 release notes.  You can
  view them here:
  
  http://www.postgresql.org/docs/devel/static/release-9-4.html
  
  I will be adding additional markup in the next few days.
  
  Feedback expected and welcomed.  I expect to be modifying this until we
  release 9.4 final.  I have marked items where I need help with question
  marks.
  
 
 Major enhancement list:
 
 Per discussion on the -advocacy list, the features we've picked out as
 major enhancements for advocacy reasons this round are:
 
 * Materialized Views (because with refresh concurrently, MatViews are
 now useful, which they weren't, very, in 9.3)
 
 * Logical Decoding/Changeset Extraction which we're calling Data Change
 Streaming API in the advocacy stuff.  Not sure if you want to use that
 name in the technical release notes.
 
 * Dynamic Background Workers (this one is a major feature, but won't be
 front-listed in advocacy doc because it's hard to explain and nobody's
 built anything cool with it yet.  But it should go under major
 enhancements in the release notes)
 
 * JSONB and related features (such as indexing)
 
 * ALTER SYSTEM SET
 
 Lemme know if you need description text for any of the above.

OK, great!  Once I have the markup done, I will beef up the descriptions
if needed and copy the text up to the major items section so we have
that all ready for beta.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Fabrízio de Royes Mello
On Mon, May 5, 2014 at 2:26 PM, Josh Berkus j...@agliodbs.com wrote:

 All,

 I'm working with the cstore_fdw project, which has an interesting
 property for an FDW: the FDW itself creates the files which make up the
 database.   This raises a couple of questions:

 1) Do we want to establish a standard directory for FDWs which create
 files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
 leave it up to each FDW to decide?


-1. Each FDW must decide.


 2) Do we want to support the TABLESPACE directive for FDWs?

 While cstore is the first FDW to create its own files, it won't
 necessarily be the last; I could imagine CSV_FDW doing so as well, or a
 future SQLite_FDW which does the same.  So I think the above questions
 are worth answering in general.  And we're planning to implement
 automated file management for cstore_fdw fairly soon, so we want to make
 it consistent with whatever we're doing in Postgres 9.5.


-1. We cannot guarantee the consistency of files using an ALTER FOREIGN
TABLE ... SET TABLESPACE ... as a normal ALTER TABLE ... does.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


[HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Robert Haas
Hi,

Today, I discovered that when building a btree index, the btree code
uses index_form_tuple() to create an index tuple from the heap tuple,
calls tuplesort_putindextuple() to copy that tuple into the sort's
memory context, and then frees the original one it built.  This seemed
inefficient, so I wrote a patch to eliminate the tuple copying.  It
works by adding a function tuplesort_putindextuplevalues(), which
builds the tuple in the sort's memory context and thus avoids the need
for a separate copy.  I'm not sure if that's the best approach, but
the optimization seems worthwhile.

I tested it by repeatedly executing REINDEX INDEX
pgbench_accounts_pkey on a PPC64 machine.  pgbench_accounts contains
10 million records.  With unpatched master as of
b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
amount of work involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 542ed43..0743474 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -171,28 +171,21 @@ btbuildCallback(Relation index,
 void *state)
 {
 	BTBuildState *buildstate = (BTBuildState *) state;
-	IndexTuple	itup;
-
-	/* form an index tuple and point it at the heap tuple */
-	itup = index_form_tuple(RelationGetDescr(index), values, isnull);
-	itup->t_tid = htup->t_self;
 
 	/*
 	 * insert the index tuple into the appropriate spool file for subsequent
 	 * processing
 	 */
 	if (tupleIsAlive || buildstate->spool2 == NULL)
-		_bt_spool(itup, buildstate->spool);
+		_bt_spool(buildstate->spool, htup->t_self, values, isnull);
 	else
 	{
 		/* dead tuples are put into spool2 */
 		buildstate->haveDead = true;
-		_bt_spool(itup, buildstate->spool2);
+		_bt_spool(buildstate->spool2, htup->t_self, values, isnull);
 	}
 
 	buildstate->indtuples += 1;
-
-	pfree(itup);
 }
 
 /*
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9ddc275..8cd3a91 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -185,9 +185,10 @@ _bt_spooldestroy(BTSpool *btspool)
  * spool an index entry into the sort file.
  */
 void
-_bt_spool(IndexTuple itup, BTSpool *btspool)
+_bt_spool(BTSpool *btspool, ItemPointerData self, Datum *values, bool *isnull)
 {
-	tuplesort_putindextuple(btspool->sortstate, itup);
+	tuplesort_putindextuplevalues(btspool->sortstate, btspool->index,
+								  self, values, isnull);
 }
 
 /*
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8b520c1..3b0a06c 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1150,6 +1150,29 @@ tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple)
 	 */
 	COPYTUP(state, &stup, (void *) tuple);
 
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Collect one index tuple while collecting input data for sort, building
+ * it from caller-supplied values.
+ */
+void
+tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
+							  ItemPointerData self, Datum *values,
+							  bool *isnull)
+{
+	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+	SortTuple	stup;
+
+	stup.tuple = index_form_tuple(RelationGetDescr(rel), values, isnull);
+	((IndexTuple) stup.tuple)->t_tid = self;
+	USEMEM(state, GetMemoryChunkSpace(stup.tuple));
+	/* set up first-column key value */
+	stup.datum1 = index_getattr((IndexTuple) stup.tuple,
+								1,
+								RelationGetDescr(state->indexRel),
+								&stup.isnull1);
 	puttuple_common(state, &stup);
 
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 1a8b16d..5ef1f9b 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -717,7 +717,8 @@ typedef struct BTSpool BTSpool; /* opaque type known only within nbtsort.c */
 extern BTSpool *_bt_spoolinit(Relation heap, Relation index,
 			  bool isunique, bool isdead);
 extern void _bt_spooldestroy(BTSpool *btspool);
-extern void _bt_spool(IndexTuple itup, BTSpool *btspool);
+extern void _bt_spool(BTSpool *btspool, ItemPointerData self,
+		  Datum *values, bool *isnull);
 extern void _bt_leafbuild(BTSpool *btspool, BTSpool *spool2);
 
 /*
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 05445f0..2b62a98 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -85,6 +85,9 @@ extern void tuplesort_puttupleslot(Tuplesortstate *state,
 	   TupleTableSlot *slot);
 extern void tuplesort_putheaptuple(Tuplesortstate *state, HeapTuple tup);
 extern void tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple);
+extern void tuplesort_putindextuplevalues(Tuplesortstate *state,
+			  

Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 I'm working with the cstore_fdw project, which has an interesting
 property for an FDW: the FDW itself creates the files which make up the
 database.   This raises a couple of questions:

 1) Do we want to establish a standard directory for FDWs which create
 files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
 leave it up to each FDW to decide?

I think we ought to vigorously discourage FDWs from storing any files
inside $PGDATA.  This cannot lead to anything except grief.  Just for
starters, what will operations such as pg_basebackup do with them?

A larger and more philosophical point is that such a direction of
development could hardly be called a foreign data wrapper.  People
would expect Postgres to take full responsibility for such files,
including data integrity considerations such as fsync-at-checkpoints
and WAL support.  Even if we wanted the FDW abstractions to allow
for that, we're very far away from it.  And frankly I'd maintain
that FDW is the wrong abstraction.

 2) Do we want to support the TABLESPACE directive for FDWs?

A fortiori, no.

regards, tom lane




Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Andres Freund
On 2014-05-05 10:30:17 -0700, Josh Berkus wrote:
 On 05/05/2014 10:25 AM, Andres Freund wrote:
  On 2014-05-05 10:16:27 -0700, Josh Berkus wrote:
  On 05/03/2014 01:07 AM, Andres Freund wrote:
  On 2014-05-02 18:57:08 -0700, Josh Berkus wrote:
  Just got a report of a replication issue with 9.2.8 from a community 
  member:
 
  Here's the sequence:
 
  1) A -- B (sync rep)
 
  2) Shut down B
 
  3) Shut down A
 
  4) Start up B as a master
 
  5) Start up A as sync replica of B
 
  6) A successfully joins B as a sync replica, even though its transaction
  log is 1016 bytes *ahead* of B.
 
  7) Transactions written to B all hang
 
  8) Xlog on A is now corrupt, although the database itself is OK
 
  This is fundamentally borked practice.
 
  Now, the above sequence happened because of the user misunderstanding
  what sync rep really means.  However, A should not have been able to
  connect with B in replication mode, especially in sync rep mode; that
  should have failed.  Any thoughts on why it didn't?
 
  I'd guess that B, while starting up, has written further WAL records
  bringing it further ahead of A.
 
  Apparently not; from what I've seen pg_stat_replication even *shows*
  that the replica is ahead of the master.

That's the shutdown record from A that I've talked about.

  Futher, Postgres should have
  recognized that there was a timeline branch point before A's last
  record, no?
  
  There wasn't any timeline increase because - as far as I understand the
  above - there wasn't any promotion. The cluster was shut down and
  recovery.conf was created/removed respectively.
 
 Ah, oops, left out a step.  B was promoted.

Still a user error. You need to reclone.

Depending on how archiving and the target timeline was configured the
timeline increase won't be treated as an error...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote:
 How about:

This data type allows for faster access to values in the json document
 and faster and more useful indexing of json.

We should refer to the fact that jsonb is internally typed. This isn't
all that obvious now, but it is evident for example when you sort a
set of raw scalar numeric jsonb values, which has a sane ordering (the
implementation invokes numeric_cmp()). You also get an internal,
per-number-scalar display scale, just like the numeric type proper.
I'm not all that sure about how to go about succinctly expressing
this, but clearly it's important.

-- 
Peter Geoghegan




Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Andres Freund
Hi,

On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
 Today, I discovered that when building a btree index, the btree code
 uses index_form_tuple() to create an index tuple from the heap tuple,
 calls tuplesort_putindextuple() to copy that tuple into the sort's
 memory context, and then frees the original one it built.  This seemed
 inefficient, so I wrote a patch to eliminate the tuple copying.  It
 works by adding a function tuplesort_putindextuplevalues(), which
 builds the tuple in the sort's memory context and thus avoids the need
 for a separate copy.  I'm not sure if that's the best approach, but
 the optimization seems worthwhile.

Hm. It looks like we could quite easily just get rid of
tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I tested it by repeatedly executing REINDEX INDEX
 pgbench_accounts_pkey on a PPC64 machine.  pgbench_accounts contains
 10 million records.  With unpatched master as of
 b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
 amount of work involved.

Yes, that's certainly worthwhile. Nice.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 While investigating an issue pointed out by valgrind around undefined
 bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
 bug in ReceiveSharedInvalidMessages(). It tries to be safe against
 recursion but it's not:
 When it recurses into ReceiveSharedInvalidMessages() from its main loop
 from inside the inval callback while nextmsg >= nummsgs it'll overwrite
 the 'messages' array with new contents. But at this point the old
 content of one entry in the messages array is still passed to
 the LocalExecuteInvalidationMessage() that caused the recursion.

Hm, yeah, so if the called inval function continues to use the message
contents after doing something that could result in a recursive call,
it might be looking at trashed data.

 It looks to me like this is broken since at least fad153ec. I think the
 fix is just to make the current 'SharedInvalidationMessage *msg' not be
 a pointer but a local copy of the to-be-processed entry.

Yeah, that should do it.  I think I'd been trying to avoid copying
messages more times than necessary, but evidently I optimized away
one copy step too many :-(

regards, tom lane




Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus
On 05/05/2014 10:53 AM, Tom Lane wrote:
 Josh Berkus j...@agliodbs.com writes:
 I'm working with the cstore_fdw project, which has an interesting
 property for an FDW: the FDW itself creates the files which make up the
 database.   This raises a couple of questions:
 
 1) Do we want to establish a standard directory for FDWs which create
 files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
 leave it up to each FDW to decide?
 
 I think we ought to vigorously discourage FDWs from storing any files
 inside $PGDATA.  This cannot lead to anything except grief.  Just for
 starters, what will operations such as pg_basebackup do with them?

That was one advantage to putting them in PGDATA; you get a copy of the
files with pg_basebackup.  Of course, they don't replicate after that,
but they potentially could, in the future, with Logical Streaming
Replication.

 A larger and more philosophical point is that such a direction of
 development could hardly be called a foreign data wrapper.  People
 would expect Postgres to take full responsibility for such files,
 including data integrity considerations such as fsync-at-checkpoints
 and WAL support.  Even if we wanted the FDW abstractions to allow
 for that, we're very far away from it.  And frankly I'd maintain
 that FDW is the wrong abstraction.

Certainly pluggable storage would be a better abstraction; but we don't
have that yet.  In the meantime, we have one FDW which creates files
*right now*, and we might have more in the future, so I'm trying to
establish some guidelines as to how such FDWs should behave.  Regardless
of whether or not you think FDWs should be managing files, it's better
for users if all FDWs which manage files manage them in the same way.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Magnus Hagander
On Mon, May 5, 2014 at 8:17 PM, Josh Berkus j...@agliodbs.com wrote:

 On 05/05/2014 10:53 AM, Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
  I'm working with the cstore_fdw project, which has an interesting
  property for an FDW: the FDW itself creates the files which make up the
  database.   This raises a couple of questions:
 
  1) Do we want to establish a standard directory for FDWs which create
  files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
  leave it up to each FDW to decide?
 
  I think we ought to vigorously discourage FDWs from storing any files
  inside $PGDATA.  This cannot lead to anything except grief.  Just for
  starters, what will operations such as pg_basebackup do with them?

 That was one advantage to putting them in PGDATA; you get a copy of the
 files with pg_basebackup.  Of course, they don't replicate after that,
 but they potentially could, in the future, with Logical Streaming
 Replication.


Presumably they'd also be inconsistent? And as such not really useful
unless you actually shut the database down before you back it up (e.g.
don't use pg_basebackup)?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Andres Freund
On 2014-05-05 11:17:18 -0700, Josh Berkus wrote:
 On 05/05/2014 10:53 AM, Tom Lane wrote:
  Josh Berkus j...@agliodbs.com writes:
  I'm working with the cstore_fdw project, which has an interesting
  property for an FDW: the FDW itself creates the files which make up the
  database.   This raises a couple of questions:
  
  1) Do we want to establish a standard directory for FDWs which create
  files, such as $PGDATA/base/{database-oid}/fdw/ ?  Or would we want to
  leave it up to each FDW to decide?
  
  I think we ought to vigorously discourage FDWs from storing any files
  inside $PGDATA.  This cannot lead to anything except grief.  Just for
  starters, what will operations such as pg_basebackup do with them?
 
 That was one advantage to putting them in PGDATA; you get a copy of the
 files with pg_basebackup.

A corrupted copy. There's no WAL replay to correct skew due to write
activity while copying.

 Of course, they don't replicate after that,
 but they potentially could, in the future, with Logical Streaming
 Replication.

Nope. They're not in the WAL, so they won't be streamed out.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote:
 On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote:
  How about:
 
 This data type allows for faster access to values in the json document
  and faster and more useful indexing of json.
 
 We should refer to the fact that jsonb is internally typed. This isn't
 all that obvious now, but it is evident for example when you sort a
 set of raw scalar numeric jsonb values, which has a sane ordering (the
 implementation invokes numeric_cmp()). You also get an internal,
 per-number-scalar display scale, just like the numeric type proper.
 I'm not all that sure about how to go about succinctly expressing
 this, but clearly it's important.

How about:

JSONB values are also mapped to SQL scalar data types, rather
than being treated always as strings.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Josh Berkus
On 05/05/2014 11:31 AM, Bruce Momjian wrote:
   JSONB values are also mapped to SQL scalar data types, rather
 than being treated always as strings.

+ ... allowing for correct sorting of JSON according to internal datums.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] avoiding tuple copying in btree index builds

2014-05-05 Thread Robert Haas
On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
 Today, I discovered that when building a btree index, the btree code
 uses index_form_tuple() to create an index tuple from the heap tuple,
 calls tuplesort_putindextuple() to copy that tuple into the sort's
 memory context, and then frees the original one it built.  This seemed
 inefficient, so I wrote a patch to eliminate the tuple copying.  It
 works by adding a function tuplesort_putindextuplevalues(), which
 builds the tuple in the sort's memory context and thus avoids the need
 for a separate copy.  I'm not sure if that's the best approach, but
 the optimization seems worthwhile.

 Hm. It looks like we could quite easily just get rid of
 tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

I glanced at that, but it wasn't obvious to me how to convert the hash
usage.  If you have an idea, I'm all ears.

 I tested it by repeatedly executing REINDEX INDEX
 pgbench_accounts_pkey on a PPC64 machine.  pgbench_accounts contains
 10 million records.  With unpatched master as of
 b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s,
 6.177s, and 6.201s.  With the attached patch, I got times of 5.787s,
 5.972s, and 5.913s, a savings of almost 5%.  Not bad considering the
 amount of work involved.

 Yes, that's certainly worthwile. Nice.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep

2014-05-05 Thread Josh Berkus
On 05/05/2014 10:53 AM, Andres Freund wrote:
 Still a user error. You need to reclone.
 
 Depending on how archiving and the target timeline was configured the
 timeline increase won't be treated as an error...

Andres and I hashed this out on IRC.  The basic problem was that I was
relying on pg_stat_replication to point out when a successful
replication connection was established.  However, he pointed out cases
where pg_stat_replication will report sync or streaming even though
replication has failed due to differences in WAL position.  That appears
to be what happened here.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 05/05/2014 10:53 AM, Tom Lane wrote:
 A larger and more philosophical point is that such a direction of
 development could hardly be called a foreign data wrapper.  People
 would expect Postgres to take full responsibility for such files,
 including data integrity considerations such as fsync-at-checkpoints
 and WAL support.  Even if we wanted the FDW abstractions to allow
 for that, we're very far away from it.  And frankly I'd maintain
 that FDW is the wrong abstraction.

 Certainly pluggable storage would be a better abstraction; but we don't
 have that yet.  In the meantime, we have one FDW which creates files
 *right now*, and we might have more in the future, so I'm trying to
 establish some guidelines as to how such FDWs should behave.

The guideline is simple: don't do that.  We should absolutely not
encourage this until/unless we have infrastructure to support it.
Just because one FDW author thought this would be a cool thing to do
does not make it a cool thing to do, and definitely not a cool thing
to encourage others to emulate.

 Regardless
 of whether or not you think FDWs should be managing files, it's better
 for users if all FDWs which manage files manage them in the same way.

Sure.  They should all keep them outside $PGDATA, making it not-our-
problem.  When and if we're prepared to consider it our problem, we
will be sure to advise people.

regards, tom lane




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Mon, May 5, 2014 at 11:31 AM, Bruce Momjian br...@momjian.us wrote:
 JSONB values are also mapped to SQL scalar data types, rather
 than being treated always as strings.

Something like that. Perhaps you should just go with what the
documentation says: Primitive JSON types described by RFC 7159 are
effectively internally mapped onto native PostgreSQL types.

The fact that the default B-Tree operator class defines a type-wise
ordering is beside the point. That's just currently the most obvious
way in which this shadow typing is evident. At some point we're
going to have to figure out ways to manipulate jsonb using shadow type
specific operators or functions, so you can for example extract
actual numeric values easily. I don't want users to assume that those
don't exist right now due to a fundamental limitation of our
implementation.

-- 
Peter Geoghegan




Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Robert Haas
On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-05-04 13:44:17 +0200, Andres Freund wrote:
 postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
  key | off |size | allocated
 -+-+-+---
  Buffer Blocks   |   286242528 | 17179869184 | t
  Buffer Descriptors  |   152024800 |   134217728 | t
 ...
  OldSerXidControlData| 17584357344 |  16 | t
 (44 rows)

 Thinking about this, I think it was a mistake to not add a 'name' field
 to dynamic shared memory's dsm_control_item. Right now it's very hard to
 figure out which extension allocated a dsm segment. Imo we should change
 that before 9.4 is out. I am not suggesting to use it to identify
 segments, but just as an identifier, passed in into dsm_create().

 Imo there should be a corresponding pg_dynshmem_allocations to
 pg_shmem_allocations.

Well, right now a dsm_control_item is 8 bytes.  If we add a name field
of our usual 64 bytes, they'll each be 9 times bigger.  We're not
talking about a lot of bytes in absolute terms, but I guess I'm not in
favor of an 800% size increase without somewhat more justification
than you've provided here.  Who is using dynamic shared memory for
enough different things at this point to get confused?

I'm quite in favor of having something like this for the main shared
memory segment, but I think that's 9.5 material at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 Thinking about this, I think it was a mistake to not add a 'name' field
 to dynamic shared memory's dsm_control_item.

 Well, right now a dsm_control_item is 8 bytes.  If we add a name field
 of our usual 64 bytes, they'll each be 9 times bigger.

And the controlled shared segment is likely to be how big exactly?  It's
probably not even possible for it to be smaller than a page size, 4K or
so depending on the OS.  I agree with Andres that a name would be a good
idea; complaining about the space needed to hold it is penny-wise and
pound-foolish.

 I'm quite in favor of having something like this for the main shared
 memory segment, but I think that's 9.5 material at this point.

If you're prepared to break the current APIs later to add a name parameter
(which would have to be required, if it's to be useful at all), then sure,
put the question off till 9.5.

regards, tom lane




Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Andres Freund
On 2014-05-05 15:04:07 -0400, Robert Haas wrote:
 On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-05-04 13:44:17 +0200, Andres Freund wrote:
  postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
   key | off |size | allocated
  -+-+-+---
   Buffer Blocks   |   286242528 | 17179869184 | t
   Buffer Descriptors  |   152024800 |   134217728 | t
  ...
   OldSerXidControlData| 17584357344 |  16 | t
  (44 rows)
 
  Thinking about this, I think it was a mistake to not add a 'name' field
  to dynamic shared memory's dsm_control_item. Right now it's very hard to
  figure out which extension allocated a dsm segment. Imo we should change
  that before 9.4 is out. I am not suggesting to use it to identify
  segments, but just as an identifier, passed in into dsm_create().
 
  Imo there should be a corresponding pg_dynshmem_allocations to
  pg_shmem_allocations.
 
 Well, right now a dsm_control_item is 8 bytes.  If we add a name field
 of our usual 64 bytes, they'll each be 9 times bigger.  We're not
 talking about a lot of bytes in absolute terms, but I guess I'm not in
 favor of an 800% size increase without somewhat more justification
 than you've provided here.  Who is using dynamic shared memory for
 enough different things at this point to get confused?

The kernel side overhead of creating a shared memory segment are so much
higher that this really isn't a meaningful saving. Also, are you really
considering a couple hundred bytes to be a problem?
I think it's quite a sensible thing for an administrator to ask where
all the memory has gone. The more users for dsm there the more important
that'll get. Right now pretty much the only thing a admin could do is to
poke around in /proc to see which backend has mapped the segment and try
to figure out via the logs what caused it to do so. Not nice.

 I'm quite in favor of having something like this for the main shared
 memory segment, but I think that's 9.5 material at this point.

Clearly. For one the version I posted here missed allocations which
aren't done via ShmemInitStruct (lwlock main array and hash table
allocations primarily). For another it's too late ;)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Andres Freund
On 2014-05-05 15:09:02 -0400, Tom Lane wrote:
  I'm quite in favor of having something like this for the main shared
  memory segment, but I think that's 9.5 material at this point.
 
 If you're prepared to break the current APIs later to add a name parameter
 (which would have to be required, if it's to be useful at all), then sure,
 put the question off till 9.5.

I understood Robert to mean that it's too late for my proposed view for
9.4 - and I agree - but I wholeheartedly agree with you that we should
add a name parameter to the dsm API *now*. We can just Assert() that it's
nonzero if we don't think it's useful for now.

Greetings,

Andres Freund



Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Andres Freund
On 2014-05-05 14:15:58 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  While investigating an issue pointed out by valgrind around undefined
  bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a
  bug in ReceiveSharedInvalidMessages(). It tries to be safe against
  recursion but it's not:
  When it recurses into ReceiveSharedInvalidMessages() from its main loop
  from inside the inval callback while nextmsg < nummsgs, it'll overwrite
  the 'messages' array with new contents. But at this point the old
  content of one entry in the messages array is still passed to
  the LocalExecuteInvalidationMessage() that caused the recursion.
 
 Hm, yeah, so if the called inval function continues to use the message
 contents after doing something that could result in a recursive call,
 it might be looking at trashed data.

FWIW, with a bit of changes around it to make it more visible
(malloc/freeing the array) this sometimes triggers during make check. So
it's quite plausible that this is what caused the relfilenodemap
regression error we were a bit confused about.

I started to look at this code because valgrind was bleating at me
inside inval.c and for the heck of it I couldn't understand why. Since
then this led me on quite a wild goose chase, leading me to discover a
couple of things I am not perfectly happy about:

a) SICleanupQueue() sometimes releases and reacquires the write lock
   held on the outside. That's pretty damn fragile, not to mention
   ugly. Even slight reformulations of the code in SIInsertDataEntries()
   can break this... Can we please extend the comment in the latter to
   mention the lock dropping explicitly?
b) we right/left shift -1 in a signed int by 16 in
   CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
   implementation-defined behaviour.
c) The ProcessMessageList() calls access msg->rc.id to test for the type
   of the existing message. That's not nice. The compiler is entirely
   free to e.g. copy the entire struct to local memory causing
   uninitialized memory to be accessed. Entirely cosmetic, but ...

d)
The valgrind splats I was very occasionally getting were:
==21013== Conditional jump or move depends on uninitialised value(s)
==21013==at 0x8DD755: hash_search_with_hash_value (dynahash.c:885)
==21013==by 0x8DD60F: hash_search (dynahash.c:811)
==21013==by 0x799A58: smgrclosenode (smgr.c:357)
==21013==by 0x8B6ABF: LocalExecuteInvalidationMessage (inval.c:566)
==21013==by 0x77BBCF: ReceiveSharedInvalidMessages (sinval.c:127)
==21013==by 0x8B6C3F: AcceptInvalidationMessages (inval.c:640)
==21013==by 0x77FDCC: LockRelationOid (lmgr.c:107)
==21013==by 0x4A6F33: relation_open (heapam.c:1029)
==21013==by 0x4A71EB: heap_open (heapam.c:1195)
==21013==by 0x8B4228: SearchCatCache (catcache.c:1223)
==21013==by 0x8C61B1: SearchSysCache (syscache.c:909)
==21013==by 0x8C62CD: GetSysCacheOid (syscache.c:987)
==21013==  Uninitialised value was created by a stack allocation
==21013==at 0x8B64C3: AddCatcacheInvalidationMessage (inval.c:328)

After far, far too much confused head scratching, code reading, random
elog()s et al I found out that this is just because of a deficiency in
valgrind's undefinedness tracking. The problem is that it doesn't really
understand shared memory sufficiently. When a backend stores a message
in the SI ringbuffer it sets a bitmask about initialized memory for a
specific position in the ringbuffer. If the *same* backend reads from
the same position in the ringbuffer (4096 messages later) into which
another backend has stored a *different* type of message, the bitmap of
initialized bytes will possibly be wrong if the padding is distributed
differently.

Unfortunately this cannot precisely be caught by valgrind's
suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
0) in AddCatcacheInvalidationMessage() et al. to suppress these
warnings. Imo we can just add them unconditionally, but if somebody else
prefers we can add #ifdef USE_VALGRIND around them.

I can do a), c), d), if others agree but I am not sure about b)?

Greetings,

Andres Freund



Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 a) SICleanupQueue() sometimes releases and reacquires the write lock
    held on the outside. That's pretty damn fragile, not to mention
    ugly. Even slight reformulations of the code in SIInsertDataEntries()
    can break this... Can we please extend the comment in the latter to
    mention the lock dropping explicitly?

Want to write a better comment?

 b) we right/left shift -1 in a signed int by 16 in
    CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
    implementation-defined behaviour.

Looks all right to me.  Yeah, the right shift might have undefined
high-order bits, but we don't care because we're storing the result
into an int16.

 c) The ProcessMessageList() calls access msg->rc.id to test for the type
    of the existing message. That's not nice.

Huh?

 After far, far too much confused head scratching, code reading, random
 elog()s et al I found out that this is just because of a deficiency in
 valgrind's undefinedness tracking. [...]
 Unfortunately this cannot precisely be caught by valgrind's
 suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
 0) in AddCatcacheInvalidationMessage() et al. to suppress these
 warnings. Imo we can just add them unconditionally, but if somebody else
 prefers we can add #ifdef USE_VALGRIND around them.

I'd be okay with USE_VALGRIND.  I'm not particularly hot on adding a
memset for everybody just to make valgrind less confused.  Especially
since that's really going to hide any problems, not fix them.

regards, tom lane




Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?

2014-05-05 Thread Andres Freund
Hi,

We really should fix this one of these days.

On 2014-03-26 18:45:54 -0700, Peter Geoghegan wrote:
 Attached patch silences the Invalid read of size n complaints of
 Valgrind. I agree with your general thoughts around backpatching. Note
 that the patch addresses a distinct complaint from Kevin's, as
 Valgrind doesn't take issue with the invalid reads past the end of
 spgxlogPickSplit variables on the stack.

I don't think that's entirely sufficient. The local "spgxlogPickSplit
xlrec;"/"spgxlogMoveLeafs xlrec;" variables are also inserted while
MAXALIGNing their size. That's slightly harder to fix :(. I don't have a
better idea than also allocating them dynamically :(

Greetings,

Andres Freund



[HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default

2014-05-05 Thread Jim Nasby

Prior to default parameters on functions, GRANT and COMMENT accepted full 
parameter syntax. IE:

GRANT EXECUTE ON test(t text) TO public

as opposed to regprocedure, which only accepts the data types ( test(text), not 
test(t text) ).

They do not accept DEFAULT though:

GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
ERROR:  syntax error at or near "DEFAULT"
LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;

Presumably this is just an oversight? Related to that, is it intentional that 
the regprocedure cast disallows *any* decorators to the function, other than 
type? If regprocedure at least accepted the full function parameter definition 
you could use it to get a definitive reference to a function.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Mon, May 5, 2014 at 1:13 PM, Andrew Dunstan and...@dunslane.net wrote:
 The fact that jsonb maps scalar values to internal postgres types is an
 implementation artefact.

I wouldn't go that far, but I take your point. The fact that the
primitive/scalar JSON types are in a very real sense strongly typed is
a very important detail, though. The fact that what became jsonb is
like a nested, typed hstore has always been prominently emphasized,
for good reasons.


-- 
Peter Geoghegan




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Andrew Dunstan


On 05/05/2014 02:33 PM, Josh Berkus wrote:

On 05/05/2014 11:31 AM, Bruce Momjian wrote:

JSONB values are also mapped to SQL scalar data types, rather
than being treated always as strings.

+ ... allowing for correct sorting of JSON according to internal datums.




The problem is that at least in one sense that's not what we're doing. 
The canonical ordering of object keys is not at all standard lexical 
ordering.


I really don't think this is Release Notes material. The fact that jsonb 
maps scalar values to internal postgres types is an implementation artefact.


cheers

andrew




Re: [HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default

2014-05-05 Thread Alvaro Herrera
Jim Nasby wrote:
 Prior to default parameters on functions, GRANT and COMMENT accepted full 
 parameter syntax. IE:
 
 GRANT EXECUTE ON test(t text) TO public
 
 as opposed to regprocedure, which only accepts the data types ( test(text), 
 not test(t text) ).
 
 They do not accept DEFAULT though:
 
 GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
 ERROR:  syntax error at or near "DEFAULT"
 LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public;
 
 Presumably this is just an oversight?

I have to say that accepting DEFAULT there seems pretty odd to me.  What
if you specify the wrong default?  Do you get a "no such function"
error?  That would be pretty unhelpful.  But then accepting it while
ignoring the fact that the default is wrong would be rather strange.

 Related to that, is it intentional that the regprocedure cast
 disallows *any* decorators to the function, other than type? If
 regprocedure at least accepted the full function parameter definition
 you could use it to get a definitive reference to a function.

Does pg_identify_object() give you what you want?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Andres Freund
On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  a) SICleanupQueue() sometimes releases and reacquires the write lock
     held on the outside. That's pretty damn fragile, not to mention
     ugly. Even slight reformulations of the code in SIInsertDataEntries()
     can break this... Can we please extend the comment in the latter to
     mention the lock dropping explicitly?
 
 Want to write a better comment?

Will do.

  b) we right/left shift -1 in a signed int by 16 in
     CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's
     implementation-defined behaviour.
 
 Looks all right to me.  Yeah, the right shift might have undefined
 high-order bits, but we don't care because we're storing the result
 into an int16.

Doesn't at the very least
	rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
have to be
	rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;

  c) The ProcessMessageList() calls access msg->rc.id to test for the type
     of the existing message. That's not nice.
 
 Huh?

The checks should access msg->id, not msg->rc.id...

  After far, far too much confused head scratching, code reading, random
  elog()s et al I found out that this is just because of a deficiency in
  valgrind's undefinedness tracking. [...]
  Unfortunately this cannot precisely be caught by valgrind's
  suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
  0) in AddCatcacheInvalidationMessage() et al. to suppress these
  warnings. Imo we can just add them unconditionally, but if somebody else
  prefers we can add #ifdef USE_VALGRIND around them.
 
 I'd be okay with USE_VALGRIND.  I'm not particularly hot on adding a
 memset for everybody just to make valgrind less confused.  Especially
 since that's really going to hide any problems, not fix them.

I can't see it having any measurable overhead. Even old compilers will
optimize the memset() away and just initialize the padding... But
anyway, USE_VALGRIND is fine with me.

Greetings,

Andres Freund



Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 10:40:29AM -0700, Josh Berkus wrote:
 Major enhancement list:
 
 Per discussion on the -advocacy list, the features we've picked out as
 major enhancements for advocacy reasons this round are:
 
 * Materialized Views (because with refresh concurrently, MatViews are
 now useful, which they weren't, very, in 9.3)
 
 * Logical Decoding/Changeset Extraction which we're calling "Data Change
 Streaming API" in the advocacy stuff.  Not sure if you want to use that
 name in the technical release notes.
 
 * Dynamic Background Workers (this one is a major feature, but won't be
 front-listed in advocacy doc because it's hard to explain and nobody's
 built anything cool with it yet.  But it should go under major
 enhancements in the release notes)
 
 * JSONB and related features (such as indexing)
 
 * ALTER SYSTEM SET
 
 Lemme know if you need description text for any of the above.

OK, I added doc links and this list of major features.  You can see the
results here:

http://momjian.us/pgsql_docs/release-9-4.html

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] TABLESPACE and directory for Foreign tables?

2014-05-05 Thread Josh Berkus
On 05/05/2014 11:53 AM, Tom Lane wrote:
 Sure.  They should all keep them outside $PGDATA, making it not-our-
 problem.  When and if we're prepared to consider it our problem, we
 will be sure to advise people.

OK.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Robert Haas
On Mon, May 5, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote:
 Thinking about this, I think it was a mistake to not add a 'name' field
 to dynamic shared memory's dsm_control_item.

 Well, right now a dsm_control_item is 8 bytes.  If we add a name field
 of our usual 64 bytes, they'll each be 9 times bigger.

 And the controlled shared segment is likely to be how big exactly?  It's
 probably not even possible for it to be smaller than a page size, 4K or
 so depending on the OS.  I agree with Andres that a name would be a good
 idea; complaining about the space needed to hold it is penny-wise and
 pound-foolish.

The control segment is sized to support a number of dynamic shared
memory segments not exceeding 64 + 2 * MaxBackends.  With default
settings, that currently works out to 288 segments, or 2306 bytes.
So, adding a 64-byte name to each of those structures would increase
the size from 2k to about 20k.

So, sure, that's not a lot of memory.  But I'm still not convinced
that it's very useful.  What I think is going to happen is that (1)
most people won't be using dynamic shared memory at all, so they won't
have any use for this; (2) those people who do run an extension that
uses dynamic shared memory will most likely only be running one such
extension, so they won't need a name to know what the segments are
being used for; and (3) if and when we eventually get parallel query,
dynamic shared memory segments will be widely used, but a bunch of
segments that are all named parallel_query or parallel_query.$PID
isn't going to be too informative.

Now, all that having been said, I recognize that human-readable names
are a generally useful thing, so I'm not going to hold my breath until
I turn blue if other people really want this, and it may turn out to
be useful someday.  But if anyone is curious whether I'm *confident*
that it will be useful someday: at this point, no.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Sun, May 4, 2014 at 6:49 AM, Andres Freund and...@2ndquadrant.com wrote:
 +  listitem
 +   para
 +Have pg_stat_statements use a flat file for query text storage, 
 allowing higher limits (Peter Geoghegan)
 +   /para
 +
 +   para
 +Also add the ability to retrieve all pg_stat_statements information 
 except the query text.  This allows programs to reuse the query
 +text already retrieved by referencing queryid.
 +   /para
 +  /listitem

 This isn't an optional thing, is it?

This is intended to be used by time-series monitoring tools that
aggregate and graph pg_stat_statements data temporally. They usually
won't need query texts, and so can only retrieve them lazily. The
pg_stat_statements view presents exactly the same interface for ad-hoc
querying, though.

The point of the first item is that there is no longer *any*
limitation on the size of stored query texts. They are no longer
truncated to track_activity_query_size bytes. The shared memory
overhead is also decreased substantially, allowing us to increase the
default pg_stat_statements.max setting from 1,000 to 5,000, while
still reducing the overall shared memory overhead (assuming a default
track_activity_query_size). I think that the removal of the
limitation, and the substantial lowering of the per-entry footprint
should both be explicitly noted.



[HACKERS] 9.4 wrap around issues

2014-05-05 Thread Jeff Janes
While stress testing the crash-recovery system, I keep running into
wraparound shutdowns that I think should not be occurring.  I go out of my
way to give autovac a chance to complete every now and then, more often than
it should need to in order to keep up with the xid usage.

I think the problem traces down to the fact that updating
pg_database.datfrozenxid is WAL-logged, but
updating ShmemVariableCache->oldestXid is not.

So a crash at just the right time means that the databases no longer think
they need to be vacuumed for wrap around, but the system as a whole thinks
that they do.

Should crash recovery end with the system reading the recovered pg_database
and recomputing ShmemVariableCache->oldestXid?

I don't know that any non-pathological case could trigger this in
production.  But 2^32 is not getting any larger, while maximum common
database throughput and size are.

I bisected this down to f9b50b7c18c8ce7de1fee59409fe2 "Fix removal of files
in pgstats directories".  I think that before that commit, the leftover
stats file was somehow tricking the system into vacuuming the databases
more aggressively following a crash.

Cheers,

Jeff


Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
 Looks all right to me.  Yeah, the right shift might have undefined
 high-order bits, but we don't care because we're storing the result
 into an int16.

 Doesn't at the very least
   rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
 have to be
   rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;

A promotion to int (or wider) is implicit in any arithmetic operation,
last I checked the C standard.  The (int) on the other side isn't
necessary either.

We might actually be better off casting both sides to unsigned int,
just to enforce that the left shifting is done with unsigned semantics.

 c) The ProcessMessageList() calls access msg->rc.id to test for the type
    of the existing message. That's not nice.

 Huh?

 The checks should access msg->id, not msg->rc.id...

Meh.  If they're not the same thing, we've got big trouble anyway.
But if you want to change it as a cosmetic thing, no objection.

regards, tom lane




Re: [HACKERS] pg_shmem_allocations view

2014-05-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 5, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 And the controlled shared segment is likely to be how big exactly?  It's
 probably not even possible for it to be smaller than a page size, 4K or
 so depending on the OS.  I agree with Andres that a name would be a good
 idea; complaining about the space needed to hold it is penny-wise and
 pound-foolish.
 ...
 Now, all that having been said, I recognize that human-readable names
 are a generally useful thing, so I'm not going to hold my breath until
 I turn blue if other people really want this, and it may turn out to
 be useful someday.  But if anyone is curious whether I'm *confident*
 that it will be useful someday: at this point, no.

I'm not confident that it'll be useful either.  But I am confident that
if we don't put it in now, and decide we want it later, there will be
complaints when we change the API.  Better to have an ignored parameter
than no parameter.

regards, tom lane




Re: [HACKERS] Cluster name in ps output

2014-05-05 Thread Thomas Munro
On 5 May 2014 17:22, Andres Freund and...@2ndquadrant.com wrote:

 On 2014-05-05 13:07:48 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2014-05-05 10:07:46 -0400, Tom Lane wrote:
   Also, -1 for adding another log_line_prefix escape.  If you're routing
   multiple clusters logging to the same place (which is already a bit
   unlikely IMO), you can put distinguishing strings in log_line_prefix
   already.  And it's not like we've got an infinite supply of letters
   for those escapes.
 
   Using syslog and including the same config file from multiple clusters
   isn't that uncommon. But I can live without it.
 
  So, if you are sharing a config file, how is it that you can set a
  per-cluster cluster_name but not a per-cluster log_line_prefix?

 Well, it's a included file. With log_line_prefix support just
 cluster_name has to be set per cluster. Without you need string
 interpolation in the config management to include cluster_name in
 log_line_prefix.


Attached as cluster-name-in-ps-v3-a.patch is a new version with
the following changes:

* the brackets removed, as suggested by Tom Lane

* static initialization of cluster_name to avoid any possibility
  of an uninitialized/null pointer being used before GUC
  initialization, as suggested by Andres Freund

* cluster_name moved to config group CONN_AUTH_SETTINGS, on the
  basis that it's similar to bonjour_name, but it isn't
  really... open to suggestions for a better config_group!

* a small amount of documentation in config.sgml (with
  cluster_name under Connection Settings, which probably
  isn't right either)

* sample conf file updated to show cluster_name and %C in
  log_line_prefix

A shorter version without the log_line_prefix support that Tom didn't
like is attached as cluster-name-in-ps-v3-b.patch.  I will try to add these
to the open commitfest, and see if there is something I can usefully
review in return.

I verified that SHOW cluster_name works as expected and you can't
change it with SET.

Thanks,

Thomas Munro
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a33f87..0fd414e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -681,6 +681,24 @@ include 'filename'
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-cluster-name" xreflabel="cluster_name">
+      <term><varname>cluster_name</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>cluster_name</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Sets the cluster name that appears in the process title for all
+        processes in this cluster.  No name is shown if this parameter is set
+        to the empty string <literal>''</> (which is the default).  The
+        process title is typically viewed by the <command>ps</> command, or in
+        Windows by using the <application>Process Explorer</>.  The cluster
+        name can also be included in the <varname>log_line_prefix</varname>.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-tcp-keepalives-idle" xreflabel="tcp_keepalives_idle">
       <term><varname>tcp_keepalives_idle</varname> (<type>integer</type>)</term>
       <indexterm>
@@ -4212,6 +4230,11 @@ local0.*    /var/log/postgresql
        </thead>
        <tbody>
        <row>
+        <entry><literal>%C</literal></entry>
+        <entry>Cluster name</entry>
+        <entry>no</entry>
+       </row>
+       <row>
         <entry><literal>%a</literal></entry>
         <entry>Application name</entry>
         <entry>yes</entry>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 977fc66..0adfee7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
 				break;
+			case 'C':
+				if (padding != 0)
+					appendStringInfo(buf, "%*s", padding, cluster_name);
+				else
+					appendStringInfoString(buf, cluster_name);
+				break;
 			case 'p':
 				if (padding != 0)
 					appendStringInfo(buf, "%*d", padding, MyProcPid);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..71897b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@ int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+const char *cluster_name = "";
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			

Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 02:23:13PM -0700, Peter Geoghegan wrote:
 On Sun, May 4, 2014 at 6:49 AM, Andres Freund and...@2ndquadrant.com wrote:
  +  listitem
  +   para
  +Have pg_stat_statements use a flat file for query text storage, 
  allowing higher limits (Peter Geoghegan)
  +   /para
  +
  +   para
  +Also add the ability to retrieve all pg_stat_statements 
  information except the query text.  This allows programs to reuse the query
  +text already retrieved by referencing queryid.
  +   /para
  +  /listitem
 
  This isn't an optional thing, is it?
 
 This is intended to be used by time-series monitoring tools that
 aggregate and graph pg_stat_statements data temporally. They usually
 won't need query texts, and so can only retrieve them lazily. The
 pg_stat_statements view presents exactly the same interface for ad-hoc
 querying, though.
 
 The point of the first item is that there is no longer *any*
 limitation on the size of stored query texts. They are no longer
 truncated to track_activity_query_size bytes. The shared memory
 overhead is also decreased substantially, allowing us to increase the
 default pg_stat_statements.max setting from 1,000 to 5,000, while
 still reducing the overall shared memory overhead (assuming a default
 track_activity_query_size). I think that the removal of the
 limitation, and the substantial lowering of the per-entry footprint
 should both be explicitly noted.

We rarely get into specific numbers like this.  It says higher limit(s) and
hopefully that is enough.  If you want to create a documentation 'id' I
can link to that for the higher limits text.



Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Bruce Momjian
On Mon, May  5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote:
 On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote:
  How about:
 
 This data type allows for faster access to values in the json document
  and faster and more useful indexing of json.
 
 We should refer to the fact that jsonb is internally typed. This isn't
 all that obvious now, but it is evident for example when you sort a
 set of raw scalar numeric jsonb values, which has a sane ordering (the
 implementation invokes numeric_cmp()). You also get an internal,
 per-number-scalar display scale, just like the numeric type proper.
 I'm not all that sure about how to go about succinctly expressing
 this, but clearly it's important.

Current text is:

Add structured (non-text) data type (JSONB) for storing JSON data (Oleg
Bartunov, Teodor Sigaev, Alexander Korotkov, Peter Geoghegan, and Andrew
Dunstan)

This allows for faster access to values in the JSON document and faster
and more useful indexing of JSON.  JSONB values are also typed as
appropriate scalar SQL types.

Is that OK?
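A quick sketch of what "typed as appropriate scalar SQL types" means in practice, assuming a 9.4 server: numeric jsonb scalars compare via numeric semantics rather than as text, so scientific notation and equivalent representations sort and match sanely.

```sql
-- jsonb keeps scalar values internally typed; numbers compare as numerics.
-- A lexical (text) sort would order these '10', '1e1', '2'; jsonb orders
-- them numerically, and normalizes '1e1' to its numeric form on output.
SELECT x
FROM (VALUES ('2'::jsonb), ('10'::jsonb), ('1e1'::jsonb)) v(x)
ORDER BY x;
```

The plain json type, being stored as text, preserves the original notation and has no such comparison behavior.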

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Mon, May 5, 2014 at 4:14 PM, Bruce Momjian br...@momjian.us wrote:
 We rarely get into specific numbers like this.  It says "higher limits" and
 hopefully that is enough.  If you want to create a documentation 'id' I
 can link to that for the "higher limits" text.

You don't have to mention a number. Just the fact that there is now
*no* limit on stored query text length, which is the point. It's also
nice that the per-entry shared memory overhead is much lower, which as
I said I think warrants a mention in passing.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 release notes

2014-05-05 Thread Peter Geoghegan
On Mon, May 5, 2014 at 4:16 PM, Bruce Momjian br...@momjian.us wrote:
 This allows for faster access to values in the JSON document and 
 faster
 and more useful indexing of JSON.  JSONB values are also typed as
 appropriate scalar SQL types.

 Is that OK?

That seems reasonable.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

