Re: [HACKERS] 9.4 release notes
On 05/04/2014 03:46 PM, Bruce Momjian wrote:
> I have completed the initial version of the 9.4 release notes.

Thanks!

> You can view them here:
> http://www.postgresql.org/docs/devel/static/release-9-4.html
> I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks.

Two rather large changes to the B-tree algorithms are missing:

* Fix race condition in B-tree page deletion (commit efada2b8)
* Make the handling of interrupted B-tree page splits more robust (commit 40dae7ec)

These changes are not visible to users, but they are good to know for beta testers.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 05/04/2014 03:46 PM, Bruce Momjian wrote:
> Auto-resize the catalog cache (Heikki Linnakangas)
> This reduces memory consumption for backends accessing only a few tables, and improves performance for backends accessing many tables.

Move this to the General performance section.

> Improve spinlock speed on x86_64 CPUs (test on i386?) (Heikki Linnakangas)

I believe this refers to commit b03d196b. The "test on i386" note can just be removed. Someone ought to test it on 32-bit i386 to see if the same change would be beneficial there, but that's a TODO item more than a release notes item. I doubt anyone's interested in spending time performance testing spinlocks on 32-bit i386, though, so I think we're going to just retain the current situation for the next decade.

- Heikki
Re: [HACKERS] Per table autovacuum vacuum cost limit behaviour strange
On 05/05/14 15:22, Amit Kapila wrote:
> Here what I could understand is that the sum of cost_limit for all autovacuum workers should never exceed the value of autovacuum_vacuum_cost_limit, which seems to always be the case in the current code, but the same is not true for the proposed patch.

Right, but have a look at the 1st message in this thread - the current behavior (and to a large extent the above condition) means that setting cost limits per table does not work in any remotely intuitive way. ISTM that the whole purpose of a per-table setting in this context is to override the behavior of autovacuum throttling - and currently this does not happen unless you get real brutal (i.e. setting the cost delay to zero in addition to setting the cost limit... making the whole cost limit a bit pointless).

regards

Mark
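The balancing condition under discussion - the workers' cost limits summing to no more than autovacuum_vacuum_cost_limit - can be sketched roughly like this (a hedged toy, not the actual autovacuum balancing code; `balance_cost_limits` is a hypothetical name):

```c
#include <assert.h>

/* Toy sketch of the balancing rule discussed above (not PostgreSQL's
 * actual code): scale each worker's base cost limit so that the total
 * never exceeds the global autovacuum_vacuum_cost_limit budget. */
static void balance_cost_limits(int *limits, int nworkers, int global_limit)
{
    long long total = 0;

    for (int i = 0; i < nworkers; i++)
        total += limits[i];
    if (total == 0 || total <= global_limit)
        return;                 /* already within budget; leave as-is */
    for (int i = 0; i < nworkers; i++)
    {
        limits[i] = (int) ((long long) limits[i] * global_limit / total);
        if (limits[i] < 1)
            limits[i] = 1;      /* never throttle a worker to zero */
    }
}
```

Under a rule like this, a per-table cost limit gets scaled down as soon as several workers are active, which is exactly why the per-table setting feels unintuitive: the override is diluted rather than honored.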
Re: [HACKERS] Sending out a request for more buildfarm animals?
On 05/04/2014 11:13 PM, Magnus Hagander wrote:
> On Sun, May 4, 2014 at 9:35 PM, Josh Berkus j...@agliodbs.com wrote:
>> Architecture wise there's no coverage for:
>> * some ARM architecture variants
>
> I could run a buildfarm animal on a Raspberry Pi if the Postgres community will replace my flash cards as they burn out.

Heikki already does that - it's chipmunk. Michael Paquier's hamster is also a Raspberry Pi.

- Heikki
Re: [HACKERS] Sending out a request for more buildfarm animals?
On Mon, May 5, 2014 at 3:40 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> On 05/04/2014 11:13 PM, Magnus Hagander wrote:
>> On Sun, May 4, 2014 at 9:35 PM, Josh Berkus j...@agliodbs.com wrote:
>>> Architecture wise there's no coverage for:
>>> * some ARM architecture variants
>>
>> I could run a buildfarm animal on a Raspberry Pi if the Postgres community will replace my flash cards as they burn out.
>
> Heikki already does that - it's chipmunk. Michael Paquier's hamster is also a Raspberry Pi.

Yep, using Archlinux for the Pi; Heikki's stuff plays with Raspbian.
--
Michael
Re: [HACKERS] using array of char pointers gives wrong results
Let me bring the bug fix to the surface again. Is anybody looking at this fix?

On Tue, Apr 29, 2014 at 2:25 PM, Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote:
> Hi,
> When an array of char * is used as the target for a FETCH statement returning more than one row, it tries to store all the results in the first element. PFA test_char_select.pgc, which fetches the first 3 relnames from pg_class ordered by relname.
>
> Steps to compile and build the program:
>   ecpg -c -I<ecpg include dir> test_char_select.pgc
>   cc -I<pg installation include dir> -g -c -o test_char_select.o test_char_select.c
>   cc -g test_char_select.o -L<pg installation lib dir> -lecpg -lpq -lpgtypes -o test_char_select
>
> The program prints the following output:
>   ./test_char_select
>   relname=___pg_foreign_table_columns
>   relname=
>   relname=
>
> The first three relnames should have been:
>   postgres=# select relname from pg_class order by relname limit 3;
>             relname
>   ---------------------------
>    _pg_foreign_data_wrappers
>    _pg_foreign_servers
>    _pg_foreign_table_columns
>
> It's obvious that the first element of the array is being overwritten with an offset of 1. This happens because the array of char pointers is dumped as:
>
>   /* Fetch multiple columns into one structure. */
>   { ECPGdo(__LINE__, 0, 1, NULL, 0, ECPGst_normal, "fetch 3 from cur1", ECPGt_EOIT,
>     ECPGt_char,(strings),(long)0,(long)3,(1)*sizeof(char),
>     ECPGt_NO_INDICATOR, NULL , 0L, 0L, 0L, ECPGt_EORT);
>
> Since the offset is 1, each result overwrites the previous result except for the first byte.
>
> PFA patch ecpg_char_ptr_arr.patch to fix this issue. It has the following changes:
> 1. Dump the array of char pointers with the right offset, i.e. sizeof(char *).
> 2. While reading the array of char pointers in ecpg_do_prologue(), use the address instead of the value at that address.
> 3. The pointer arithmetic should treat such a variable as char **, instead of char *.
>
> ECPG regression tests do not show any failures with this patch.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
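The offset problem Ashutosh describes can be illustrated in miniature (a hypothetical toy, not the ecpg code itself): when results are stored at base + i * offset, an array of char pointers needs an offset of sizeof(char *); with an offset of 1 each store begins one byte after the previous one, clobbering all but its first byte.

```c
#include <assert.h>
#include <string.h>

/* Store n pointer-sized results at base + i * offset, the way a fetch
 * loop walks its target array.  (Toy illustration; the name
 * store_results is hypothetical.) */
static void store_results(char *base, size_t offset, const char **vals, int n)
{
    for (int i = 0; i < n; i++)
        memcpy(base + (size_t) i * offset, &vals[i], sizeof(const char *));
}
```

With offset = sizeof(char *), each relname pointer lands in its own slot; with offset = 1 (what the buggy ecpg output effectively used) the stores overlap, matching the observed behavior of each row overwriting the previous one except for its first byte.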
Re: [HACKERS] Sending out a request for more buildfarm animals?
Hi,

On 2014-05-04 12:35:44 -0700, Josh Berkus wrote:
>> There's pretty little coverage of non-mainstream platforms/compilers in the buildfarm atm. Maybe we should send an email on -announce asking for new ones? OS-wise, there's no coverage for:
>> * AIX (at all)
>> * HP-UX (for master at least)
>> (* Tru64)
>> (* UnixWare)
>
> Do we want a SmartOS (opensolaris/Joyent) animal? Do we already have one?

I don't think so. We only have Solaris 10 afaics.

>> * mips
>> * s390/x
>> * sparc 32bit
>
> Do we really care about sparc 32bit at this point? You're talking a 10-year-old machine, there.

I personally don't really, but the last time it came up significant parts of the community opined the other way. And I'd rather have it tested and actually supported than supposedly supported.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Cluster name in ps output
Hi

When running more than one cluster I often find myself looking at the output of 'iotop' or other tools wondering which cluster's wal receiver process or checkpointer process etc I'm seeing. Obviously it's easy enough to find out (for example by looking at a tree view in htop/ps that shows the -D parameter of the parent postmaster), but I thought it could be useful to be able to give a name to the cluster and include it in the ps output.

Here is a strawman patch that does that. If cluster_name is not set, it defaults to the empty string and the ps output is unchanged. If it's set to 'foox' the ps output includes that string in square brackets:

postgres: [foox] checkpointer process
postgres: [foox] writer process
postgres: [foox] wal writer process
postgres: [foox] autovacuum launcher process
postgres: [foox] stats collector process
postgres: [foox] munro foodb [local] idle

I would be grateful for any feedback.

Thanks,
Thomas

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..7f7fd52 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@
 int temp_file_limit = -1;
 int num_temp_buffers = 1024;
+char *cluster_name;
 char *data_directory;
 char *ConfigFileName;
 char *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},

 	{
+		{"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
 			NULL,
diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
index 6294ca3..ead7ea4 100644
--- a/src/backend/utils/misc/ps_status.c
+++ b/src/backend/utils/misc/ps_status.c
@@ -29,6 +29,7 @@
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "utils/ps_status.h"
+#include "utils/guc.h"

 extern char **environ;
 bool update_process_title = true;
@@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
 	 * apparently setproctitle() already adds a `progname:' prefix to the ps
 	 * line
 	 */
-	snprintf(ps_buffer, ps_buffer_size,
-			 "%s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX ""
 #else
-	snprintf(ps_buffer, ps_buffer_size,
-			 "postgres: %s %s %s ",
-			 username, dbname, host_info);
+#define PROGRAM_NAME_PREFIX "postgres: "
 #endif
+	if (*cluster_name == '\0')
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "%s %s %s ",
+				 username, dbname, host_info);
+	}
+	else
+	{
+		snprintf(ps_buffer, ps_buffer_size,
+				 PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
+				 cluster_name, username, dbname, host_info);
+	}
+
 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);

 	set_ps_display(initial_str, true);
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index be68f35..639288a 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -223,6 +223,7 @@
 extern int temp_file_limit;
 extern int num_temp_buffers;
+extern char *cluster_name;
 extern char *data_directory;
 extern char *ConfigFileName;
 extern char *HbaFileName;
[HACKERS] Recursive ReceiveSharedInvalidMessages not safe
Hi,

While investigating an issue pointed out by valgrind around undefined bytes in inval.c SHAREDINVALSMGR_ID processing, I noticed that there's a bug in ReceiveSharedInvalidMessages(). It tries to be safe against recursion, but it's not: when it recurses into ReceiveSharedInvalidMessages() from its main loop from inside the inval callback while nextmsg < nummsgs, it'll overwrite the 'messages' array with new contents. But at this point the old content of one entry in the messages array is still passed to the LocalExecuteInvalidationMessage() that caused the recursion.

It looks to me like SHAREDINVALRELCACHE_ID messages are the only ones actually affected by this in reality, because all the other ones either operate on stack memory or don't recurse. What could happen there is that a different relcache entry is invalidated than the one the relcache invalidation callback is called for. This actually could explain the relfilenodemap error we've seen a couple weeks back.

It looks to me like this is broken since at least fad153ec.

I think the fix is just to make the current 'SharedInvalidationMessage *msg' not be a pointer but a local copy of the to-be-processed entry.

Comments?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
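The fix Andres proposes - processing a local copy rather than a pointer into the shared array - can be sketched in miniature (hypothetical toy types, not the inval.c code):

```c
#include <assert.h>

/* Toy stand-in for SharedInvalidationMessage; refill() plays the role of
 * a recursive ReceiveSharedInvalidMessages() call overwriting the array. */
typedef struct { int relid; } ToyMsg;

static ToyMsg messages[4];

static void refill(int base)
{
    for (int i = 0; i < 4; i++)
        messages[i].relid = base + i;
}

/* Unsafe: if the callback recurses and the array is refilled, the pointer
 * now refers to a different message than the one we started processing. */
static int process_by_pointer(const ToyMsg *msg)
{
    refill(100);            /* simulates recursion clobbering 'messages' */
    return msg->relid;
}

/* Safe: the local copy is immune to the array being overwritten. */
static int process_by_value(ToyMsg msg)
{
    refill(100);
    return msg.relid;
}
```

The by-pointer variant ends up processing whatever the recursion left behind, while the by-value variant still sees the message it was handed - the same effect as copying the SharedInvalidationMessage before invoking the callback.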
Re: [HACKERS] Cluster name in ps output
Hi,

On 2014-05-05 10:00:34 +0000, Thomas Munro wrote:
> When running more than one cluster I often find myself looking at the output of 'iotop' or other tools wondering which cluster's wal receiver process or checkpointer process etc I'm seeing.

I wonder about that pretty regularly. To the point that I've a hacky version of this locally. So +1 from me for the idea in general.

> If cluster_name is not set, it defaults to the empty string and the ps output is unchanged. If it's set to 'foox' the ps output includes that string in square brackets:
>
> postgres: [foox] checkpointer process
> postgres: [foox] writer process
> postgres: [foox] wal writer process
> postgres: [foox] autovacuum launcher process
> postgres: [foox] stats collector process
> postgres: [foox] munro foodb [local] idle

"postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)

I guess the question is where this should be available as well. At the very least I'd want to reference it in log_line_prefix as well?

> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index 15020c4..7f7fd52 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -449,6 +449,7 @@
>  int temp_file_limit = -1;
>  int num_temp_buffers = 1024;
> +char *cluster_name;
>  char *data_directory;
>  char *ConfigFileName;
>  char *HbaFileName;
> @@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
>  	},
>
>  	{
> +		{"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS,
> +			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
> +			NULL,
> +			GUC_IS_NAME
> +		},
> +		&cluster_name,
> +		"",
> +		NULL, NULL, NULL
> +	},
> +
> +	{
>  		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
>  			gettext_noop("Sets the server's data directory."),
>  			NULL,
> diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c
> index 6294ca3..ead7ea4 100644
> --- a/src/backend/utils/misc/ps_status.c
> +++ b/src/backend/utils/misc/ps_status.c
> @@ -29,6 +29,7 @@
>  #include "libpq/libpq.h"
>  #include "miscadmin.h"
>  #include "utils/ps_status.h"
> +#include "utils/guc.h"
>
>  extern char **environ;
>  bool update_process_title = true;
> @@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname,
>  	 * apparently setproctitle() already adds a `progname:' prefix to the ps
>  	 * line
>  	 */
> -	snprintf(ps_buffer, ps_buffer_size,
> -			 "%s %s %s ",
> -			 username, dbname, host_info);
> +#define PROGRAM_NAME_PREFIX ""
>  #else
> -	snprintf(ps_buffer, ps_buffer_size,
> -			 "postgres: %s %s %s ",
> -			 username, dbname, host_info);
> +#define PROGRAM_NAME_PREFIX "postgres: "
>  #endif
> +	if (*cluster_name == '\0')
> +	{
> +		snprintf(ps_buffer, ps_buffer_size,
> +				 PROGRAM_NAME_PREFIX "%s %s %s ",
> +				 username, dbname, host_info);
> +	}
> +	else
> +	{
> +		snprintf(ps_buffer, ps_buffer_size,
> +				 PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
> +				 cluster_name, username, dbname, host_info);
> +	}
> +
>  	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);

Aren't you potentially dereferencing a NULL pointer here?

Greetings,

Andres Freund
Re: Race condition between PREPARE TRANSACTION and COMMIT PREPARED (was Re: [HACKERS] Problem with txid_snapshot_in/out() functionality)
On 04/14/2014 09:48 PM, Heikki Linnakangas wrote:
> On 04/14/2014 07:51 PM, Tom Lane wrote:
>> I'd prefer to leave the prepare sequence alone and instead find a way to reject COMMIT PREPARED until after the source transaction is safely clear of the race conditions. The upthread idea of looking at vxid instead of xid might help, except that I see we clear both of them in ProcArrayClearTransaction. We'd need some state in PGPROC that isn't cleared till later than that.
>
> Hmm. What if one of the post-cleanup actions fails? We can't bail out of the prepare sequence until we have transferred the locks to the new PGPROC. Otherwise the locks are lost. In essence, there should be a critical section from the EndPrepare call until all the critical cleanup actions like PostPrepare_Locks have been done, and I don't think we want that. We might be able to guarantee that the built-in post-cleanup operations are safe enough for that, but there's also CallXactCallbacks in there. Given the lack of reports of that happening, though, perhaps that's not an issue.

I came up with the attached fix for this. Currently, the entry is implicitly considered dead or unlocked if the locking_xid transaction is no longer active, but this patch essentially turns locking_xid into a simple boolean, and makes it the backend's responsibility to clear it on abort. (It's not actually a boolean, it's a BackendId, but that's just for debugging purposes, to track who's keeping the entry locked.) This requires a process exit hook, and an abort hook, to make sure the entry is always released, but that's not too difficult. It allows the backend to release the entry at exactly the right time, instead of having it implicitly released by ProcArrayClearTransaction.

If we error out during prepare, after having written the prepare WAL record but before the locks have been transferred to the dummy PGPROC, the locks are simply released. This is wrong, but it's always been like that and we haven't heard any complaints about it from the field, so I'm inclined to leave it as it is. We could use a critical section to force a panic, but that cure could be worse than the disease.

I considered Andres' idea of using a new heavy-weight lock, but didn't like it much. It would be a larger patch, which is not nice for back-patching. One issue would be that if you run out of lock memory, you could not roll back any prepared transactions, which is not nice because it could be a prepared transaction that's hoarding the lock memory.

- Heikki

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 66dbf58..995f51f 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -58,6 +58,7 @@
 #include "replication/walsender.h"
 #include "replication/syncrep.h"
 #include "storage/fd.h"
+#include "storage/ipc.h"
 #include "storage/predicate.h"
 #include "storage/proc.h"
 #include "storage/procarray.h"
@@ -82,25 +83,25 @@ int max_prepared_xacts = 0;
  *
  * The lifecycle of a global transaction is:
  *
- * 1. After checking that the requested GID is not in use, set up an
- * entry in the TwoPhaseState->prepXacts array with the correct XID and GID,
- * with locking_xid = my own XID and valid = false.
+ * 1. After checking that the requested GID is not in use, set up an entry in
+ * the TwoPhaseState->prepXacts array with the correct GID and valid = false,
+ * and mark it as locked by my backend.
  *
  * 2. After successfully completing prepare, set valid = true and enter the
  * referenced PGPROC into the global ProcArray.
  *
- * 3. To begin COMMIT PREPARED or ROLLBACK PREPARED, check that the entry
- * is valid and its locking_xid is no longer active, then store my current
- * XID into locking_xid. This prevents concurrent attempts to commit or
- * rollback the same prepared xact.
+ * 3. To begin COMMIT PREPARED or ROLLBACK PREPARED, check that the entry is
+ * valid and not locked, then mark the entry as locked by storing my current
+ * backend ID into locking_backend. This prevents concurrent attempts to
+ * commit or rollback the same prepared xact.
  *
  * 4. On completion of COMMIT PREPARED or ROLLBACK PREPARED, remove the entry
  * from the ProcArray and the TwoPhaseState->prepXacts array and return it to
  * the freelist.
  *
  * Note that if the preparing transaction fails between steps 1 and 2, the
- * entry will remain in prepXacts until recycled. We can detect recyclable
- * entries by checking for valid = false and locking_xid no longer active.
+ * entry must be removed so that the GID and the GlobalTransaction struct
+ * can be reused. See AtAbort_Twophase().
  *
  * typedef struct GlobalTransactionData *GlobalTransaction appears in
  * twophase.h
@@ -115,8 +116,8 @@ typedef struct GlobalTransactionData
 	TimestampTz	prepared_at;	/* time of preparation */
 	XLogRecPtr	prepare_lsn;	/* XLOG offset of prepare record */
 	Oid			owner;			/* ID of user that executed the xact */
-
Re: [HACKERS] Cluster name in ps output
On 5 May 2014 10:10, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-05-05 10:00:34 +0000, Thomas Munro wrote:
>> When running more than one cluster I often find myself looking at the output of 'iotop' or other tools wondering which cluster's wal receiver process or checkpointer process etc I'm seeing.
>
> I wonder about that pretty regularly. To the point that I've a hacky version of this locally. So +1 from me for the idea in general.

Thanks! (Do you have a write up/diff somewhere showing the local modifications you're running?)

>> If cluster_name is not set, it defaults to the empty string and the ps output is unchanged. If it's set to 'foox' the ps output includes that string in square brackets:
>>
>> postgres: [foox] checkpointer process
>> postgres: [foox] writer process
>> postgres: [foox] wal writer process
>> postgres: [foox] autovacuum launcher process
>> postgres: [foox] stats collector process
>> postgres: [foox] munro foodb [local] idle
>
> "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)

Hah -- I agree, but on systems using setproctitle, the program name and ":" are provided already, so the end result would have to be different on those systems, and I figured it should be the same everywhere if possible. (BTW I also tried to tidy up the way that is handled, so that instead of a different snprintf statement being selected by the preprocessor, a macro PROGRAM_NAME_PREFIX is defined to be empty on those systems.)

> I guess the question is where this should be available as well. At the very least I'd want to reference it in log_line_prefix as well?

Good idea, I will look into that.

>> +	if (*cluster_name == '\0')
>> +	{
>> +		snprintf(ps_buffer, ps_buffer_size,
>> +				 PROGRAM_NAME_PREFIX "%s %s %s ",
>> +				 username, dbname, host_info);
>> +	}
>> +	else
>> +	{
>> +		snprintf(ps_buffer, ps_buffer_size,
>> +				 PROGRAM_NAME_PREFIX "[%s] %s %s %s ",
>> +				 cluster_name, username, dbname, host_info);
>> +	}
>> +
>> 	ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer);
>
> Aren't you potentially dereferencing a NULL pointer here?

Hmm -- I thought the GUC machinery would make sure cluster_name either pointed to the default I provided, an empty string, or a string read from the configuration file. Perhaps I need to go and read up on how GUCs work.

Thomas Munro
Re: [HACKERS] Cluster name in ps output
On 05/05/2014 06:00 PM, Thomas Munro wrote:
> When running more than one cluster I often find myself looking at the output of 'iotop' or other tools wondering which cluster's wal receiver process or checkpointer process etc I'm seeing. Obviously it's easy enough to find out (for example by looking at a tree view in htop/ps that shows the -D parameter of the parent postmaster), but I thought it could be useful to be able to give a name to the cluster and include it in the ps output.
>
> Here is a strawman patch that does that. If cluster_name is not set, it defaults to the empty string and the ps output is unchanged. If it's set to 'foox' the ps output includes that string in square brackets:
>
> postgres: [foox] checkpointer process
> postgres: [foox] writer process
> postgres: [foox] wal writer process
> postgres: [foox] autovacuum launcher process
> postgres: [foox] stats collector process
> postgres: [foox] munro foodb [local] idle

I'd find something like that very useful. I'd even be tempted to tack on the port number, though that might be overkill. Perhaps just use the port number instead of a new cluster name? Most people won't use/set something like a cluster name, though I guess distros that use pg_wrapper would probably auto-set it via pg_wrapper's configuration.

Personally I'd find:

postgres[5433]: checkpointer process

at least as useful. The only time that's not unique is in a BSD jail or lxc container, and in those cases IIRC ps can show you the jail/container anyway.

Showing the port would help new-ish users a lot; many seem to be very confused about which PostgreSQL instance(s) they're connecting to and which are running. Especially on Mac OS X, where people often land up with Apple's PostgreSQL, Homebrew, Postgres.app, and who knows what else running at the same time.

--
Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 10:49:19 +0000, Thomas Munro wrote:
> Thanks! (Do you have a write up/diff somewhere showing the local modifications you're running?)

I've just hacked the port into the ps display, since that was sufficient for my case.

>> "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)
>
> Hah -- I agree, but on systems using setproctitle, the program name and ":" are provided already, so the end result would have to be different on those systems and I figured it should be the same everywhere if possible.

Fair point.

>> Aren't you potentially dereferencing a NULL pointer here?
>
> Hmm -- I thought the GUC machinery would make sure cluster_name either pointed to the default I provided, an empty string, or a string read from the configuration file. Perhaps I need to go and read up on how GUCs work.

That's true - but I am not sure you can guarantee it's only called after the GUC machinery has started up. Particularly on Windows. I guess just initializing the global variable to "" should do the trick.

Greetings,

Andres Freund
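Andres' suggestion - initializing the global to an empty string so that early callers never dereference NULL - could look roughly like this (a sketch only; `ps_cluster_tag` is a hypothetical helper, not part of the actual patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Static empty-string default: safe to dereference even before the GUC
 * machinery has assigned a configured value. */
char *cluster_name = "";

/* Hypothetical helper producing the "[name] " tag used in the ps line. */
static const char *ps_cluster_tag(char *buf, size_t bufsz)
{
    if (cluster_name == NULL || *cluster_name == '\0')
        buf[0] = '\0';                      /* unnamed cluster: no tag */
    else
        snprintf(buf, bufsz, "[%s] ", cluster_name);
    return buf;
}
```

With the static `""` default, the NULL check becomes belt-and-braces: the variable is never NULL, even if init_ps_display() runs before the configuration file has been processed.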
Re: [HACKERS] Cluster name in ps output
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-05-05 10:00:34 +0000, Thomas Munro wrote:
>> When running more than one cluster I often find myself looking at the output of 'iotop' or other tools wondering which cluster's wal receiver process or checkpointer process etc I'm seeing.
>
> I wonder about that pretty regularly. To the point that I've a hacky version of this locally. So +1 from me for the idea in general.

Ditto.

>> If cluster_name is not set, it defaults to the empty string and the ps output is unchanged. If it's set to 'foox' the ps output includes that string in square brackets:
>>
>> postgres: [foox] checkpointer process
>> postgres: [foox] writer process
>> postgres: [foox] wal writer process
>> postgres: [foox] autovacuum launcher process
>> postgres: [foox] stats collector process
>> postgres: [foox] munro foodb [local] idle
>
> "postgres: [foox] ..." should rather be "postgres[foox]: ..." imo ;)
>
> I guess the question is where this should be available as well. At the very least I'd want to reference it in log_line_prefix as well?

I'm not entirely sure that I see the point of having it in log_line_prefix - each cluster logs to its own log file, which includes the cluster name (at least on Debian/Ubuntu and friends). The only use case I can imagine here would be for syslog, but you could just *set* the cluster name in the log_line_prefix, as it'd be (by definition..) configurable per cluster.

I'd much rather see other things added as log_line_prefix options. An interesting thought that just occurred to me would be to allow any GUC to be added to log_line_prefix using some kind of extended % support (eg: '%{my_guc_here}' or something...). That would also be useful for extensions which add GUCs. Not sure about the specifics, but it does seem like an interesting idea.

Oh, and I know people will shoot me for bringing it up again, but I'd still like to see the CSV format be configurable ala log_line_prefix, and the same for any kind of logging (or auditing) to a table which we might eventually support. Yes, we need to work out how to do file changes when it's updated and stick a header on each new file with the columns included.

Thanks,

Stephen
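The extended-% idea above - letting log_line_prefix pull in an arbitrary GUC with a token like %{my_guc_here} - could be prototyped along these lines (a hedged sketch; `expand_prefix` and the lookup callback are hypothetical names, not a proposed PostgreSQL API):

```c
#include <assert.h>
#include <string.h>

/* Toy sketch of %{name} token expansion in a log_line_prefix-style format
 * string, using a caller-supplied lookup for the GUC's current value. */
typedef const char *(*guc_lookup) (const char *name);

static void expand_prefix(char *out, size_t outsz, const char *fmt,
                          guc_lookup lookup)
{
    size_t o = 0;

    for (const char *p = fmt; *p && o + 1 < outsz; p++)
    {
        const char *end;

        if (p[0] == '%' && p[1] == '{' && (end = strchr(p + 2, '}')) != NULL)
        {
            char name[64];
            size_t len = (size_t) (end - (p + 2));

            if (len < sizeof(name))
            {
                memcpy(name, p + 2, len);
                name[len] = '\0';
                const char *val = lookup(name);

                if (val)        /* unknown GUCs expand to nothing */
                {
                    size_t vlen = strlen(val);

                    if (vlen > outsz - 1 - o)
                        vlen = outsz - 1 - o;
                    memcpy(out + o, val, vlen);
                    o += vlen;
                }
                p = end;        /* loop's p++ steps past the '}' */
                continue;
            }
        }
        out[o++] = *p;          /* ordinary character: copy through */
    }
    out[o] = '\0';
}

/* Sample lookup standing in for the real GUC table (hypothetical). */
static const char *demo_lookup(const char *name)
{
    return strcmp(name, "cluster_name") == 0 ? "main" : NULL;
}
```

One design question a real implementation would have to settle is what an unknown or unset GUC should expand to; the sketch simply drops the token.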
Re: [HACKERS] Cluster name in ps output
Craig,

* Craig Ringer (cr...@2ndquadrant.com) wrote:
> postgres[5433]: checkpointer process
>
> at least as useful. The only time that's not unique is in a BSD jail or lxc container, and in those cases IIRC ps can show you the jail/container anyway.

Uhh, that's not at all true. You can trivially have multiple IPs on a box w/o jails or containers (aliased interfaces) and then run PG on the default port - which I find to be *far* more convenient than having the same IP and a bunch of different ports. What you *can't* have is two clusters with the same name and same major version, at least on the Debian/Ubuntu distributions, and as such I would argue to also include the major version rather than include the port, which you could get from pg_lsclusters.

> Showing the port would help new-ish users a lot; many seem to be very confused by which PostgreSQL instance(s) they're connecting to and which are running. Especially on Mac OS X, where people often land up with Apple's PostgreSQL, Homebrew, Postgres.app, and who knows what else running at the same time.

I'm far from convinced that showing the port in the ps output will really help these users.. Not to mention that you can get that from 'netstat -anp' anyway.

Thanks,

Stephen
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
>> I guess the question is where this should be available as well. At the very least I'd want to reference it in log_line_prefix as well?
>
> I'm not entirely sure that I see the point of having it in log_line_prefix - each cluster logs to its own log file which includes the cluster name (at least on Debian/Ubuntu and friends). The only use case I can imagine here would be for syslog, but you could just *set* the cluster name in the log_line_prefix, as it'd be (by definition..) configurable per cluster.

So I have to configure it in multiple locations? I don't see the point. I usually try to configure as much as possible in common/template config files that are included. Everything that doesn't have to be overwritten is good.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cluster name in ps output
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2014-05-05 07:58:30 -0400, Stephen Frost wrote:
>> I'm not entirely sure that I see the point of having it in log_line_prefix - each cluster logs to its own log file which includes the cluster name (at least on Debian/Ubuntu and friends). The only use case I can imagine here would be for syslog, but you could just *set* the cluster name in the log_line_prefix, as it'd be (by definition..) configurable per cluster.
>
> So I've to configure it in multiple locations? I don't see the point. I usually try to configure as much in common/template config files that are included. Everything that doesn't have to be overwritten is good.

I see the point - we've already got quite a few %whatever's, and adding mostly useless ones may just add confusion and unnecessary complexity. If you've already got your postgresql.conf templated through puppet/chef/salt/whatever, then having to add another '%{ CLUSTER_NAME }%' or whatever shouldn't be a terribly difficult thing.

Thanks,

Stephen
Re: [HACKERS] Cluster name in ps output
On 5 May 2014 10:49, Thomas Munro mu...@ip9.org wrote: On 5 May 2014 10:10, Andres Freund and...@2ndquadrant.com wrote: I guess the question is where this should be available as well. At the very least I'd want to reference it in log_line_prefix as well? Good idea, I will look into that. See v2 patch attached which lets you use %C for cluster name in the log prefix. Maybe it would be overkill, but seeing the various responses about which information belongs in the ps string, I guess we could also use a format string with %blah fields for that. Then the Debian/Ubuntu package users who tend to think of the major version + name as the complete cluster identifier could use [%V/%C] ... to get postgres: [9.3/main] ..., and others could throw in a %P to see a port number. diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 977fc66..0adfee7 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata) else appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid); break; + case 'C': +if (padding != 0) + appendStringInfo(buf, "%*s", padding, cluster_name); +else + appendStringInfoString(buf, cluster_name); +break; case 'p': if (padding != 0) appendStringInfo(buf, "%*d", padding, MyProcPid); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 15020c4..7f7fd52 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -449,6 +449,7 @@ int temp_file_limit = -1; int num_temp_buffers = 1024; +char *cluster_name; char *data_directory; char *ConfigFileName; char *HbaFileName; @@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] = }, { + {"cluster_name", PGC_POSTMASTER, CUSTOM_OPTIONS, + gettext_noop("Sets the name of the cluster that appears in 'ps' output."), + NULL, + GUC_IS_NAME + }, + &cluster_name, + "", + NULL, NULL, NULL + }, + + { {"data_directory", PGC_POSTMASTER, FILE_LOCATIONS, 
gettext_noop("Sets the server's data directory."), NULL, diff --git a/src/backend/utils/misc/ps_status.c b/src/backend/utils/misc/ps_status.c index 6294ca3..ead7ea4 100644 --- a/src/backend/utils/misc/ps_status.c +++ b/src/backend/utils/misc/ps_status.c @@ -29,6 +29,7 @@ #include "libpq/libpq.h" #include "miscadmin.h" #include "utils/ps_status.h" +#include "utils/guc.h" extern char **environ; bool update_process_title = true; @@ -264,15 +265,24 @@ init_ps_display(const char *username, const char *dbname, * apparently setproctitle() already adds a `progname:' prefix to the ps * line */ - snprintf(ps_buffer, ps_buffer_size, - "%s %s %s ", - username, dbname, host_info); +#define PROGRAM_NAME_PREFIX "" #else - snprintf(ps_buffer, ps_buffer_size, - "postgres: %s %s %s ", - username, dbname, host_info); +#define PROGRAM_NAME_PREFIX "postgres: " #endif + if (*cluster_name == '\0') + { + snprintf(ps_buffer, ps_buffer_size, + PROGRAM_NAME_PREFIX "%s %s %s ", + username, dbname, host_info); + } + else + { + snprintf(ps_buffer, ps_buffer_size, + PROGRAM_NAME_PREFIX "[%s] %s %s %s ", + cluster_name, username, dbname, host_info); + } + ps_buffer_cur_len = ps_buffer_fixed_size = strlen(ps_buffer); set_ps_display(initial_str, true); diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index be68f35..639288a 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -223,6 +223,7 @@ extern int temp_file_limit; extern int num_temp_buffers; +extern char *cluster_name; extern char *data_directory; extern char *ConfigFileName; extern char *HbaFileName; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
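The ps-title logic in the patch above can be modeled compactly (a Python sketch rather than the C implementation; the user, database, and host values are made up):

```python
def ps_title(cluster_name, username, dbname, host_info, setproctitle=False):
    """Mirror the patched init_ps_display(): prepend "postgres: " unless
    setproctitle() already supplies the program name, and insert the
    cluster name in brackets only when it is non-empty."""
    prefix = "" if setproctitle else "postgres: "
    if cluster_name == "":
        return "%s%s %s %s " % (prefix, username, dbname, host_info)
    return "%s[%s] %s %s %s " % (prefix, cluster_name, username, dbname, host_info)

# without a cluster_name set, the title is unchanged from today's output
print(ps_title("", "alice", "db1", "127.0.0.1(51234)"))
# with cluster_name = 'main', the bracketed name is added
print(ps_title("main", "alice", "db1", "127.0.0.1(51234)"))
```

This is exactly the backward-compatibility property discussed later in the thread: users who never set the GUC see no extra clutter.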
Re: [HACKERS] Fix initdb for path with whitespace and at char
On 05/01/2014 07:55 AM, Amit Kapila wrote: On Wed, Apr 30, 2014 at 4:01 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I committed the non-invasive fixes to backbranches (and master too, just to keep it in sync), but the attached is what I came up with for master. There are a couple of places in the code where we have #ifdef WIN32 code that uses CreateProcess with CMD /C ... directly. 1. Do we need similar handling for the CreatePipe case where it directly uses the executable path, such as in function pipe_read_line()? Currently the caller of pipe_read_line() calls canonicalize_path() to change the win32-specific path; is that sufficient, or do we need SYSTEMQUOTE-type handling for it? No, SYSTEMQUOTE style handling is not required with CreateProcess. find_other_exec, which is the only caller of pipe_read_line, adds one pair of double-quotes around the executable's path, which is enough for CreateProcess. 2. system.c: #include <assert.h> Do we really need the inclusion of assert.h, or is this for future use? You're right, that's not needed. 3. One more observation is that currently install.bat doesn't work for such paths: install.bat e:\PostgreSQL\master\install 1\ins@1 1\ins@1== was unexpected at this time. Yeah, I noticed that. I haven't looked into what it would take to fix that, but for now you can just install to a path that doesn't contain whitespace, and move it from there. install.bat is only used by developers / packagers, so that's not a big deal. 4. Similar to Andrew, I also could not reproduce this problem on my Windows system (Windows 7 64 bit): e:\> e:\PostgreSQL\master\install 1\ins@1\bin\initdb.exe -D e:\PostgreSQL\master\Data 1 e:\> e:\PostgreSQL\master\install 1\ins@1\bin\pg_ctl.exe start -D e:\PostgreSQL\master\Data 1 The above commands work fine. Hmm, I'm also testing on 64-bit Windows 7, and it failed for me. 
Note that I already committed the narrow fix for initdb - which I also backpatched - to master, so to reproduce you'll need to revert that locally (commit 503de546). I fixed the issues with malloc that Tom pointed out and committed the wrapper functions to git master. But please let me know if there's still a problem with it. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Cascading replication and archive_command
Hi, apparently a few users were puzzled that archive_command is ignored on slave servers, which comes as a surprise since streaming replication will work fine from slaves and as far as I’ve checked the documentation also doesn’t point out the fact that archive_command gets a different treatment. Is this intentional or an oversight? Should this be fixed in the code (feature parity to SR) or in the documentation making this more explicit? best, Michael [1] http://serverfault.com/questions/514136/postgresql-archive-command-on-a-cascading-standby-server as well as personal contacts signature.asc Description: Message signed with OpenPGP using GPGMail
Re: [HACKERS] 9.4 release notes
On Sun, May 4, 2014 at 03:49:27PM +0200, Andres Freund wrote: Hi, On 2014-05-04 08:46:07 -0400, Bruce Momjian wrote: Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Thanks for doing that work. Some comments inline: Oh, I forgot to thank committers for making this job easier every year. The commit titles are helpful, the descriptions good, and many items are marked as backward incompatible where appropriate. Git hashes make looking up commit details easy. +<listitem> + <para> + Remove system column pg_class.reltoastidxid (Michael Paquier) + </para> + + <para> + Instead use normal index access methods. + </para> +</listitem> The explanation doesn't seem helpful to me. I'd just remove it, as the full explanation seems to be too complex and not actually interesting. OK, description removed. I know pg_upgrade has to be modified to handle this. Hopefully others will understand how to adjust things. You are right that the full description is complex. + <sect3> +<title>Server</title> + + <itemizedlist> + + <listitem> + <para> +Have VACUUM properly report dead but not removable rows to the statistics collector (Hari Babu) + </para> + + <para> +Previously these were reported as live rows. + </para> + </listitem> + + <listitem> + <para> +Improve SSL renegotiation handling (&Aacute;lvaro Herrera) + </para> + </listitem> + + <listitem> + <para> +During immediate shutdown, send uncatchable termination signals to child processes that have not already shutdown (MauMau, +&Aacute;lvaro Herrera) + </para> + + <para> +This reduces the likelihood of orphaned child processes after postmaster shutdown. + </para> + </listitem> + + <listitem> + <para> +Improve randomness of the database system identifier (Tom Lane) + </para> + </listitem> + + <listitem> + <para> +Allow background workers to be dynamically started and terminated (Robert Haas) + </para> This is much more important than the previous features imo. 
Also, I'd add the word registered in the above list. + <listitem> + <para> +Allow dynamic allocation of shared memory segments (Robert Haas, Amit Kapila) + </para> + </listitem> Should also be earlier in the list. OK, added registered and I moved this item and the one above to #2 and #3. Let me know if that isn't good. I was thinking the dynamic stuff wasn't useful until we had parallelism working, but I think I was wrong. + <listitem> + <para> +Freeze tuples when tables are written with CLUSTER or VACUUM FULL (Robert Haas, Andres Freund) + </para> + + <para> +This avoids later freezing overhead. + </para> + </listitem> This sentence seems a bit awkward to me. Maybe "This avoids the need to freeze tuples at some later point in time." OK, I wrote, "This avoids the need to freeze the tuples in the future." + <listitem> + <para> +Improve speed of accessing many sequence values (David Rowley) + </para> + </listitem> I think this description isn't accurate. Isn't it something like "Improve speed of accessing many different sequences in the same backend." Yes, better, thanks. + <listitem> + <para> +Use memory barriers to avoid some spinlock use (Heikki Linnakangas) + </para> + </listitem> This is about 1a3d104475ce01326fc00601ed66ac4d658e37e5? If so I'd put it as: "Reduce spinlock contention during WAL replay." or similar. Uh, yes. I am not sure why I fixed on the memory barrier/spinlock part. I used your text and moved it to the replication section. This imo deserves to be further up this list as this is one of the biggest bottlenecks in HS node performance. I didn't move it too high up as it isn't a user-visible change, but I put it at the top of the non-visible part. Let me know if it needs further promotion. + <listitem> + <para> +Add huge_pages configuration parameter to attempt to use huge translation look-aside buffer (TLB) pages on Linux (Christian Kruse, +Richard Poole, Abhijit Menon-Sen) + </para> I think this is too detailed - how about: ... to use huge pages ...? 
I am not sure on this as the default is 'try', but I updated the wording like you suggested: Add huge_pages configuration parameter to enable huge translation look-aside buffer (TLB) pages on Linux + <para> +This can improve performance on large memory systems. + </para> + </listitem> Should this be in the performance section? Well, performance is listed as General Performance. We also have index changes that help performance too but they are under Index. I think
Re: [HACKERS] 9.4 release notes
On Sun, May 4, 2014 at 06:06:52PM +0400, Oleg Bartunov wrote: Bruce, you forgot Alexander Korotkov, who contributed jsonb_hash_ops opclass for GIN. Something like Alexander Korotkov introduced an elegant jsonb_hash ops for GIN, which competes with MongoDB performance in contains operator. Here is a link to discussion - http://www.postgresql.org/message-id/53304439.8000...@dunslane.net OK, added. As I remember Alexander's name was not in the initial commit but mentioned in a later one, and I somehow missed that. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] folder:lk/lk date:yesterday..
On 04/30/2014 06:39 PM, Andres Freund wrote: Hi, Coverity flagged a couple of issues that seem worth addressing by changing the code instead of ignoring them: 01) heap_xlog_update() looks to Coverity as if it could trigger a NULL pointer dereference. That's because it thinks that oldtup.t_data is NULL if XLR_BKP_BLOCK(0) while reconstructing incremental tuples. That fortunately can't happen as incremental updates are only performed if the tuples are on the same page. Add an Assert(). 02) Be a bit more consistent in what type to use for a size variable. Inconsequential, but more consistent. 03) Don't leak memory after connection aborts in pg_recvlogical. 04) Use a sensible parameter for memset() when randomizing memory in reorderbuffer. Inconsequential. Could somebody please pick these up? Committed, thanks. I have to say, I am not particularly happy with the complexity of the control flow in heap_xlog_update() :(. Agreed :-(. It might be good to split it into two functions, one for same-page updates and another for others. And have two different WAL record structs for the cases, too. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cascading replication and archive_command
On 05/05/2014 04:19 PM, Michael Renner wrote: Hi, apparently a few users were puzzled that archive_command is ignored on slave servers, which comes as a surprise since streaming replication will work fine from slaves and as far as I’ve checked the documentation also doesn’t point out the fact that archive_command gets a different treatment. Is this intentional or an oversight? Should this be fixed in the code (feature parity to SR) or in the documentation making this more explicit? It was intentional, although I can certainly understand the viewpoint that archive_command should also archive in the standby. IIRC people argued it both ways when cascading replication was discussed. The current assumption is that the archive is shared by the master and standby (or standbys), so that there is no point in archiving the same file again in the standby. On the contrary, re-archiving the same file would fail, so you would need to disable archiving in the standby, and re-enable it when promoting, which would be more complicated. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
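Heikki's point - that re-archiving the same segment into a shared archive would fail - can be illustrated with a toy model of the common `test ! -f archive/%f && cp %p archive/%f` style of archive_command (a hedged Python sketch; the segment name and directory layout are illustrative):

```python
import os, shutil, tempfile

def archive(wal_path, archive_dir):
    """Toy archive_command modeled on `test ! -f dst && cp src dst`:
    refuse to overwrite a segment that is already in the archive."""
    dst = os.path.join(archive_dir, os.path.basename(wal_path))
    if os.path.exists(dst):
        raise RuntimeError("segment already archived: " + dst)
    shutil.copy(wal_path, dst)

tmp = tempfile.mkdtemp()
seg = os.path.join(tmp, "000000010000000000000001")
with open(seg, "w") as f:
    f.write("wal segment contents")
adir = os.path.join(tmp, "archive")
os.mkdir(adir)

archive(seg, adir)              # master archives the segment first
try:
    archive(seg, adir)          # standby sharing the archive tries the same segment
    standby_failed = False
except RuntimeError:
    standby_failed = True
print(standby_failed)
```

With a shared archive, the standby's attempt collides with the master's copy, which is why the standby would need archiving disabled until promotion.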
Re: [HACKERS] 9.4 release notes
On Sun, May 4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote: On 05/04/2014 10:12 AM, Petr Jelinek wrote: On 04/05/14 14:46, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Nice work, one comment from me would be about jsonb: + <listitem> +<para> + Add structured (non-text) data type (jsonb) for storing JSON data (Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan) +</para> + +<para> + This data type allows faster indexing and access to json keys/value pairs. +</para> + </listitem> I think the explanation should be expanded to say that this data type also has generic indexing not just faster indexing. It's also slightly ambiguous - it's not immediately clear that the faster also applies to the access. OK, how is this: This data type allows for faster indexing and access to json key/value pairs, as well as efficient indexing of all key/value pairs in a JSON document. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 09:10:11AM +0300, Heikki Linnakangas wrote: On 05/04/2014 03:46 PM, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. Thanks! You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Two rather large changes to the B-tree algorithms are missing: * Fix race condition in B-tree page deletion (commit efada2b8) * Make the handling of interrupted B-tree page splits more robust (commit 40dae7ec) These changes are not visible to users, but they are good to know for beta testers. Thanks, added. I thought these were fixes for 9.4-only bugs, not fixes for earlier bugs. Thanks. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 09:23:15AM +0300, Heikki Linnakangas wrote: On 05/04/2014 03:46 PM, Bruce Momjian wrote: Auto-resize the catalog cache (Heikki Linnakangas) This reduces memory consumption for backends accessing only a few tables, and improves performance for backend accessing many tables. Move this to General performance section. OK, moved. Improve spinlock speed on x86_64 CPUs (test on i386?) (Heikki Linnakangas) I believe this refers to commit b03d196b. The test on i386 note can just be removed. Someone ought to test it on 32-bit i386 to see if the same change would be beneficial there, but that's a TODO item more than a release notes item. I doubt anyone's interested to spend time performance testing spinlocks on 32-bit i386, though, so I think we're going to just retain the current situation for the next decade. Agreed. Text removed. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tab completion for setting search_path
--On 3 May 2014 10:11:33 +0200 Andres Freund and...@2ndquadrant.com wrote: diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c new file mode 100644 index 6d26ffc..dec3d4a *** a/src/bin/psql/tab-complete.c --- b/src/bin/psql/tab-complete.c *** psql_completion(const char *text, int st *** 3230,3235 --- 3230,3242 COMPLETE_WITH_LIST(my_list); } + else if (pg_strcasecmp(prev2_wd, "search_path") == 0) + { + COMPLETE_WITH_QUERY(Query_for_list_of_schemas + " AND nspname not like 'pg\\_%%'" + " AND nspname not like 'information_schema'" + " UNION SELECT 'DEFAULT' "); + } Why should we exclude system schemata? That seems more likely to be confusing than helpful? I can see a point in excluding another backend's temp tables, but otherwise? I put my hands on this a while ago, too, but had a different notion in mind, which schema the completion should select. I came up with the following: http://git.postgresql.org/gitweb/?p=users/bernd/postgres.git;a=commitdiff;h=03fd00cd190e8b529efeec1a1bb038454fb0b05f Just complete to a schema on which someone has CREATE or USAGE privs. However, the reason I stopped working on it was that I really want to have a completion to a list of schemas as well, and I couldn't figure out a good and easy way to do this atm. -- Thanks Bernd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
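What the quoted completion query offers can be modeled loosely (a Python sketch, not psql's actual readline code; the schema list below is made up):

```python
def complete_search_path(prefix, schemas):
    """Mimic the patch's query: offer non-system schemas (skipping pg_*
    and information_schema) plus the literal DEFAULT, filtered by the
    prefix typed so far."""
    candidates = [s for s in schemas
                  if not s.startswith("pg_") and s != "information_schema"]
    candidates.append("DEFAULT")
    return sorted(c for c in candidates if c.startswith(prefix))

schemas = ["public", "app", "pg_catalog", "pg_toast", "information_schema"]
print(complete_search_path("", schemas))   # user schemas plus DEFAULT
```

Andres's objection amounts to dropping the two exclusion filters (or keeping only one for other backends' temp schemas), so that pg_catalog and friends would still complete.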
Re: [HACKERS] Cluster name in ps output
Andres Freund and...@2ndquadrant.com writes: On 2014-05-05 10:49:19 +, Thomas Munro wrote: Hah -- I agree, but on systems using setproctitle, the program name and : are provided already, so the end result would have to be different on those systems and I figured it should be the same everywhere if possible. Fair point. How about dropping the brackets, and the cluster-name concept, and just doing postgres: 5432 checkpointer process Aren't you potentially dereferencing a NULL pointer here? Hmm -- I thought the GUC machinery would make sure cluster_name either pointed to the default I provided, an empty string, or a string read from the configuration file. Perhaps I need to go and read up on how GUCs work. That's true - but I am not sure you can guarantee it's only called after the GUC machinery has started up. The elog code MUST be able to work before GUC initialization is done. What do you think will happen if we fail to open postgresql.conf, for example? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
* Tom Lane (t...@sss.pgh.pa.us) wrote: How about dropping the brackets, and the cluster-name concept, and just doing postgres: 5432 checkpointer process -1 for my part, as I'd just end up with a bunch of those and no distinction between the various processes. In other words, without a cluster distinction, it's useless. Including the value of listen_addresses along w/ the port would make it useful. If we really don't want the cluster-name concept (which, personally, I like quite a bit), how about including the listen_address value if it isn't '*'? I could see that also helping users who installed from a distro and got '127.0.0.1' and don't understand why they can't connect... Of course, these are users who can use 'ps' but not 'netstat'. Not sure how big that set really is. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Minor improvement to fdwhandler.sgml
On Wed, Apr 30, 2014 at 4:10 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: (2014/04/28 23:31), Robert Haas wrote: On Thu, Apr 24, 2014 at 7:59 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: The patch attached improves docs in fdwhandler.sgml a little bit. When you submit a patch, it's helpful to describe what the patch actually does, rather than just saying it makes things better. For example, I think that this patch could be described as in fdwhandler.sgml, mark references to scan_clauses with structfield tags. I thought so. Sorry, my explanation wasn't enough. A problem with that idea is that scan_clauses is not a field in any struct. I was mistaken. I think those should be marked with literal tags. Patch attached. OK, committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: How about dropping the brackets, and the cluster-name concept, and just doing postgres: 5432 checkpointer process -1 for my part, as I'd just end up with a bunch of those and no distinction between the various processes. In other words, without a cluster distinction, it's useless. Yeah, after I sent that I got to the bit about running multiple postmasters with different IP-address bindings. I agree the port number alone wouldn't be enough in that scenario. Including the value of listen_addresses along w/ the port would make it useful. If we really don't want the cluster-name concept (which, personally, I like quite a bit), how about including the listen_address value if it isn't '*'? Nah, let's do cluster name. That way, somebody who's only got one postmaster isn't suddenly going to see a lot of useless clutter, ie the user gets to decide what to add to ps output. SHOW cluster_name might be useful at the application level as well, I suspect. I still think the brackets are unnecessary though. Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Sun, May 4, 2014 at 08:46:07AM -0400, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. I have updated output reflecting all hackers list feedback received so far: http://momjian.us/pgsql_docs/release-9-4.html (The developer copy of the docs take a while to update.) Now on to the markup additions. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
Bruce Momjian wrote: On Sun, May 4, 2014 at 03:49:27PM +0200, Andres Freund wrote: + <listitem> + <para> +Add huge_pages configuration parameter to attempt to use huge translation look-aside buffer (TLB) pages on Linux (Christian Kruse, +Richard Poole, Abhijit Menon-Sen) + </para> I think this is too detailed - how about: ... to use huge pages ...? I am not sure on this as the default is 'try', but I updated the wording like you suggested: Add huge_pages configuration parameter to enable huge translation look-aside buffer (TLB) pages on Linux Please see http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org about the TLB huge pages terminology. Maybe ..enable huge MMU pages.. as suggested by Peter G. in that sub-thread. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
* Tom Lane (t...@sss.pgh.pa.us) wrote: Stephen Frost sfr...@snowman.net writes: Including the value of listen_addresses along w/ the port would make it useful. If we really don't want the cluster-name concept (which, personally, I like quite a bit), how about including the listen_address value if it isn't '*'? Nah, let's do cluster name. That way, somebody who's only got one postmaster isn't suddenly going to see a lot of useless clutter, ie the user gets to decide what to add to ps output. SHOW cluster_name might be useful at the application level as well, I suspect. Ah, yes, agreed, that could be quite useful. I still think the brackets are unnecessary though. Either way is fine for me on this. Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. Agreed. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 10:18:44AM -0400, Alvaro Herrera wrote: Bruce Momjian wrote: On Sun, May 4, 2014 at 03:49:27PM +0200, Andres Freund wrote: + <listitem> + <para> +Add huge_pages configuration parameter to attempt to use huge translation look-aside buffer (TLB) pages on Linux (Christian Kruse, +Richard Poole, Abhijit Menon-Sen) + </para> I think this is too detailed - how about: ... to use huge pages ...? I am not sure on this as the default is 'try', but I updated the wording like you suggested: Add huge_pages configuration parameter to enable huge translation look-aside buffer (TLB) pages on Linux Please see http://www.postgresql.org/message-id/20140226161302.gd4...@eldon.alvh.no-ip.org about the TLB huge pages terminology. Maybe ..enable huge MMU pages.. as suggested by Peter G. in that sub-thread. You are totally correct. I was referencing tlb from the _old_ name for the variable, but didn't update the description when the variable was renamed. New text is: Add huge_pages configuration parameter to use huge memory pages on Linux (Christian Kruse, Richard Poole, Abhijit Menon-Sen) -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja ma...@joh.to wrote: On 5/2/14, 10:10 PM, Merlin Moncure wrote: On Fri, May 2, 2014 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... That's already the case with select statements I don't think that's true if you table-qualify your column references and don't use SELECT *. and, if a user were concerned about that, always have the option of aliasing the table as nearly 100% of professional developers do: SELECT f FROM foo f; etc. So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called f? That would break backwards compatibility. How about: UPDATE foo SET ROW(foo) = (1,2,3); ISTM that this could be parsed unambiguously, though it's perhaps a bit ugly. Hm, that's a bit too ugly: row(foo) in this case means 'do special behavior X' whereas in all other cases it means make an anonymous rowtype with one attribute of type 'foo'. How about: UPDATE foo SET (foo).* = (1,2,3); merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
2014-05-05 17:02 GMT+02:00 Merlin Moncure mmonc...@gmail.com: On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja ma...@joh.to wrote: On 5/2/14, 10:10 PM, Merlin Moncure wrote: On Fri, May 2, 2014 at 3:03 PM, Tom Lane t...@sss.pgh.pa.us wrote: Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... That's already the case with select statements I don't think that's true if you table-qualify your column references and don't use SELECT *. and, if a user were concerned about that, always have the option of aliasing the table as nearly 100% of professional developers do: SELECT f FROM foo f; etc. So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called f? That would break backwards compatibility. How about: UPDATE foo SET ROW(foo) = (1,2,3); ISTM that this could be parsed unambiguously, though it's perhaps a bit ugly. Hm, that's a bit too ugly: row(foo) in this case means 'do special behavior X' whereas in all other cases it means make an anonymous rowtype with one attribute of type 'foo'. How about: UPDATE foo SET (foo).* = (1,2,3); It is looking a little bit strange. I like the previous proposal: UPDATE foo SET foo = (1,2,3); Pavel merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 05/05/2014 09:38 AM, Bruce Momjian wrote: On Sun, May 4, 2014 at 10:24:54AM -0400, Andrew Dunstan wrote: On 05/04/2014 10:12 AM, Petr Jelinek wrote: On 04/05/14 14:46, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Nice work, one comment from me would be about jsonb:

+ <listitem>
+  <para>
+   Add structured (non-text) data type (jsonb) for storing JSON data (Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan)
+  </para>
+
+  <para>
+   This data type allows faster indexing and access to json key/value pairs.
+  </para>
+ </listitem>

I think the explanation should be expanded to say that this data type also has generic indexing, not just faster indexing. It's also slightly ambiguous - it's not immediately clear that the faster also applies to the access. OK, how is this: This data type allows for faster indexing and access to json key/value pairs, as well as efficient indexing of all key/value pairs in a JSON document. No, I think you're missing the point of what I'm saying, and I think the addition is at best misleading. I would avoid talk of key value pairs, anyway, that's not all that's in a json document (it might be an array, for example). How about: This data type allows for faster access to values in the json document and faster and more useful indexing of json. cheers andrew
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On 05/05/2014 11:20 AM, Pavel Stehule wrote: How about: UPDATE foo SET (foo).* = (1,2,3); It is looking little bit strange I like previous proposal UPDATE foo SET foo = (1,2,3); What if the table has a field called foo? Won't it then be ambiguous? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Mon, May 5, 2014 at 10:32 AM, Andrew Dunstan and...@dunslane.net wrote: On 05/05/2014 11:20 AM, Pavel Stehule wrote: How about: UPDATE foo SET (foo).* = (1,2,3); It is looking little bit strange I like previous proposal UPDATE foo SET foo = (1,2,3); What if the table has a field called foo? Won't it then be ambiguous? See upthread: it prefers the field to the table if both are there (exactly as SELECT does). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
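The resolution rule Merlin describes can be sketched with a hypothetical table whose column shadows the table name (the multi-column SET form is only a 9.5 proposal at this point, shown commented out):

```sql
-- hypothetical table where a column shares the table's name
CREATE TABLE foo (foo int, b text);

-- existing behavior: "foo" on the left of SET resolves to the column,
-- so this sets foo.foo = 1 rather than assigning a whole row
UPDATE foo SET foo = 1;

-- proposed syntax under discussion (not implemented): assign several
-- columns from one row-valued expression
-- UPDATE foo SET (foo, b) = (1, 'one');
```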
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure wrote: On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja <marko@> wrote: On 5/2/14, 10:10 PM, Merlin Moncure wrote: On Fri, May 2, 2014 at 3:03 PM, Tom Lane <tgl@.pa> wrote: Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... That's already the case with select statements I don't think that's true if you table-qualify your column references and don't use SELECT *. and, if a user were concerned about that, always have the option of aliasing the table as nearly 100% of professional developers do: SELECT f FROM foo f; etc. So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called f? That would break backwards compatibility. How about: UPDATE foo SET ROW(foo) = (1,2,3); ISTM that this could be parsed unambiguously, though it's perhaps a bit ugly. Hm, that's a bit too ugly: row(foo) in this case means 'do special behavior X' whereas in all other cases it means make an anonymous rowtype with one attribute of type 'foo'. How about: UPDATE foo SET (foo).* = (1,2,3); Wouldn't UPDATE foo SET (foo.*) = (1,2,3) be better since it would cleanly support non-complete types like UPDATE foo SET (foo.col1, foo.col3) = (1,3) Though I am not that concerned about overloading the use of ROW in context of an UPDATE. As with normal usage of ROW why not make its presence optional - support both syntaxes? Keywords like USING and SET have different meanings when used in DELETE/UPDATE so having ROW behave similarly wouldn't be that confusing - and it does seem to have an ambiguity if you restrict this interpretation of ROW to only the SET of the update statement. Is there any need or requirement for (or against) interleaving normal and row-valued, or even multiple row-valued, SET expressions? 
UPDATE foo SET (foo.col1, foo.col3) = (1,3), foo.col2 = 2 David J.
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 09:52:33 -0400, Tom Lane wrote: Andres Freund <and...@2ndquadrant.com> writes: Aren't you potentially dereferencing a NULL pointer here? Hmm -- I thought the GUC machinery would make sure cluster_name either pointed to the default I provided, an empty string, or a string read from the configuration file. Perhaps I need to go and read up on how GUCs work. That's true - but I am not sure you can guarantee it's only called after the GUC machinery has started up. The elog code MUST be able to work before GUC initialization is done. What do you think will happen if we fail to open postgresql.conf, for example? But elog.c shouldn't call init_ps_display(), right? Anyway, I am all for initializing it sensibly, after all it was I that suggested doing so above... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 08:03:04 -0400, Stephen Frost wrote: Craig, * Craig Ringer (cr...@2ndquadrant.com) wrote: postgres[5433]: checkpointer process at least as useful. The only time that's not unique is in a BSD jail or lxc container, and in those cases IIRC ps can show you the jail/container anyway. Uhh, that's not at all true. You can trivially have multiple IPs on a box w/o jails or containers (aliased interfaces) and then run PG on the default port- which I find to be *far* more convenient than having the same IP and a bunch of different ports. Only that you then need different socket directories. Do you really do that regularly? Anyway, I am happy having the cluster_name thingy. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 10:07:46 -0400, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: Including the value of listen_addresses along w/ the port would make it useful. If we really don't want the cluster-name concept (which, personally, I like quite a bit), how about including the listen_address value if it isn't '*'? Nah, let's do cluster name. That way, somebody who's only got one postmaster isn't suddenly going to see a lot of useless clutter, i.e. the user gets to decide what to add to ps output. SHOW cluster_name might be useful at the application level as well, I suspect. Hm. What about unifying this with event_source? Not sure if it's a good idea, but it's a pretty similar thing. Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. Using syslog and including the same config file from multiple clusters isn't that uncommon. But I can live without it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Cluster name in ps output
* Andres Freund (and...@2ndquadrant.com) wrote: On 2014-05-05 08:03:04 -0400, Stephen Frost wrote: Uhh, that's not at all true. You can trivially have multiple IPs on a box w/o jails or containers (aliased interfaces) and then run PG on the default port - which I find to be *far* more convenient than having the same IP and a bunch of different ports. Only that you then need different socket directories. Do you really do that regularly? Yup. I've wished for that to be the default in Debian quite a few times, actually... Much easier to deal with firewall rules and users who are coming from pgAdmin and friends if they only have to deal with understanding what hosts they need to connect to, and not worry about the port also. Anyway, I am happy having the cluster_name thingy. Agreed. Thanks, Stephen
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 11:28:21AM -0400, Andrew Dunstan wrote: No, I think you're missing the point of what I'm saying, and I think the addition is at best misleading. I would avoid talk of key value pairs, anyway, that's not all that's in a json document (it might be an array, for example). How about: This data type allows for faster access to values in the json document and faster and more useful indexing of json. OK, I used your wording. You are right, I didn't understand it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] tab completion for setting search_path
On Sat, May 3, 2014 at 1:11 AM, Andres Freund <and...@2ndquadrant.com> wrote: On 2014-05-03 00:13:45 -0700, Jeff Janes wrote: On Friday, May 2, 2014, Jeff Janes <jeff.ja...@gmail.com> wrote: I've been working with an app that uses a schema name whose spelling is hard to type, and the lack of tab completion for SET search_path TO was bugging me. So see attached. I filter out the system schemata, but not public. That'd be nice.

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
new file mode 100644
index 6d26ffc..dec3d4a
*** a/src/bin/psql/tab-complete.c
--- b/src/bin/psql/tab-complete.c
*** psql_completion(const char *text, int st
*** 3230,3235 ****
--- 3230,3242 ----
  		COMPLETE_WITH_LIST(my_list);
  	}
+ 	else if (pg_strcasecmp(prev2_wd, "search_path") == 0)
+ 	{
+ 		COMPLETE_WITH_QUERY(Query_for_list_of_schemas
+ 							" AND nspname not like 'pg\\_%%' "
+ 							" AND nspname not like 'information_schema' "
+ 							" UNION SELECT 'DEFAULT' ");
+ 	}

Why should we exclude system schemata? That seems more likely to be confusing than helpful? I can see a point in excluding another backend's temp tables, but otherwise? I've personally never had a need to set the search_path to a system schema, and I guess I was implicitly modelling this on what is returned by \dn, not by \dnS. I wouldn't object much to including them; that would be better than not having any completion. I just don't see much point. And now playing a bit with the system ones, I think it would be more confusing to offer them. pg_catalog and the appropriate pg_temp_N schema always get searched, whether you put them in the search_path or not. Cheers, Jeff
[HACKERS] [PATCH] Fix use of free in walsender error handling after a sysid mismatch.
Hi, Walsender does a PQclear() but then accesses data acquired with PQgetvalue(). That's clearly not ok. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From 1b7df22ff6da40ad4fcfa4eeacc1822373baa4e6 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 5 May 2014 18:03:44 +0200
Subject: [PATCH] Fix use of free in walsender error handling after a sysid
 mismatch.

Found via valgrind. The bug exists since the introduction of the walsender, so backpatch to 9.0.
---
 src/backend/replication/libpqwalreceiver/libpqwalreceiver.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 96f31c4..88d27c7 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -152,6 +152,7 @@ libpqrcv_identify_system(TimeLineID *primary_tli)
 			 GetSystemIdentifier());
 	if (strcmp(primary_sysid, standby_sysid) != 0)
 	{
+		primary_sysid = pstrdup(primary_sysid);
 		PQclear(res);
 		ereport(ERROR,
 				(errmsg("database system identifier differs between the primary and standby"),
-- 
1.8.5.rc2.dirty
Re: [HACKERS] Cluster name in ps output
Andres Freund and...@2ndquadrant.com writes: On 2014-05-05 10:07:46 -0400, Tom Lane wrote: Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. Using syslog and including the same config file from multiple clusters isn't that uncommon. But I can live without it. So, if you are sharing a config file, how is it that you can set a per-cluster cluster_name but not a per-cluster log_line_prefix? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] regexp_replace( , , , NULL ) returns null?
On 5/2/14, 8:57 PM, Tom Lane wrote: Jim Nasby jna...@enova.com writes: ISTM it’d be a lot better if it treated NULL flags the same as ‘’... In Oracle's universe that probably makes sense, but to me it's not sensible. Why should unknown flags produce a non-unknown result? Only because they're more options than data. I find it hard to envision many use-cases where you wouldn't actually have the flags as a constant, anyway; they're too fundamental to the behavior of the function. Unless you're wrapping this function; handling the case of the flags being optional becomes easier then. (FWIW, I'm creating a version that accepts an array of search/replace arguments.) -- Jim Nasby, Lead Data Architect (512) 569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep
On 05/03/2014 01:07 AM, Andres Freund wrote: On 2014-05-02 18:57:08 -0700, Josh Berkus wrote: Just got a report of a replication issue with 9.2.8 from a community member: Here's the sequence: 1) A -> B (sync rep) 2) Shut down B 3) Shut down A 4) Start up B as a master 5) Start up A as sync replica of B 6) A successfully joins B as a sync replica, even though its transaction log is 1016 bytes *ahead* of B. 7) Transactions written to B all hang 8) Xlog on A is now corrupt, although the database itself is OK This is fundamentally borked practice. Now, the above sequence happened because of the user misunderstanding what sync rep really means. However, A should not have been able to connect with B in replication mode, especially in sync rep mode; that should have failed. Any thoughts on why it didn't? I'd guess that B, while starting up, has written further WAL records bringing it further ahead of A. Apparently not; from what I've seen pg_stat_replication even *shows* that the replica is ahead of the master. Further, Postgres should have recognized that there was a timeline branch point before A's last record, no? I'm working on getting permission to access the DB files. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] Cluster name in ps output
On 2014-05-05 13:07:48 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-05-05 10:07:46 -0400, Tom Lane wrote: Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. Using syslog and including the same config file from multiple clusters isn't that uncommon. But I can live without it. So, if you are sharing a config file, how is it that you can set a per-cluster cluster_name but not a per-cluster log_line_prefix? Well, it's a included file. With log_line_prefix support just cluster_name has to be set per cluster. Without you need string interpolation in the config management to include cluster_name in log_line_prefix. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
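Andres's shared-include setup can be made concrete with a hypothetical file layout: the shared file carries everything common, and without a log_line_prefix escape for cluster_name each cluster's own file has to override the prefix as well as set the name.

```
# common.conf — shared across all clusters via include (hypothetical layout)
log_destination = 'syslog'
log_line_prefix = '%m [%p] '

# postgresql.conf of one particular cluster:
include 'common.conf'
cluster_name = 'replica1'
# no %-escape for cluster_name exists, so distinguishing clusters in the
# shared log means overriding the prefix per cluster (or having the config
# management interpolate the name):
log_line_prefix = '%m [%p] replica1: '
```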
[HACKERS] TABLESPACE and directory for Foreign tables?
All, I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? 2) Do we want to support the TABLESPACE directive for FDWs? While cstore is the first FDW to create its own files, it won't necessarily be the last; I could imagine CSV_FDW doing so as well, or a future SQLite_FDW which does the same. So I think the above questions are worth answering in general. And we're planning to implement automated file management for cstore_fdw fairly soon, so we want to make it consistent with whatever we're doing in Postgres 9.5. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep
On 05/05/2014 10:25 AM, Andres Freund wrote: On 2014-05-05 10:16:27 -0700, Josh Berkus wrote: On 05/03/2014 01:07 AM, Andres Freund wrote: On 2014-05-02 18:57:08 -0700, Josh Berkus wrote: Just got a report of a replication issue with 9.2.8 from a community member: Here's the sequence: 1) A -- B (sync rep) 2) Shut down B 3) Shut down A 4) Start up B as a master 5) Start up A as sync replica of B 6) A successfully joins B as a sync replica, even though its transaction log is 1016 bytes *ahead* of B. 7) Transactions written to B all hang 8) Xlog on A is now corrupt, although the database itself is OK This is fundamentally borked practice. Now, the above sequence happened because of the user misunderstanding what sync rep really means. However, A should not have been able to connect with B in replication mode, especially in sync rep mode; that should have failed. Any thoughts on why it didn't? I'd guess that B, while starting up, has written further WAL records bringing it further ahead of A. Apparently not; from what I've seen pg_stat_replication even *shows* that the replica is ahead of the master. Futher, Postgres should have recognized that there was a timeline branch point before A's last record, no? There wasn't any timeline increase because - as far as I understand the above - there wasn't any promotion. The cluster was shut down and recovery.conf was created/removed respectively. Ah, oops, left out a step. B was promoted. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Docs for 9.4's worker_spi?
Seems there is no documentation for the 9.4 worker_spi contrib module. Is this OK? The comment at the top of the C file says: * Sample background worker code that demonstrates various coding * patterns: establishing a database connection; starting and committing * transactions; using GUC variables, and heeding SIGHUP to reread * the configuration file; reporting to pg_stat_activity; using the * process latch to sleep and exit in case of postmaster death. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep
On 2014-05-05 10:16:27 -0700, Josh Berkus wrote: On 05/03/2014 01:07 AM, Andres Freund wrote: On 2014-05-02 18:57:08 -0700, Josh Berkus wrote: Just got a report of a replication issue with 9.2.8 from a community member: Here's the sequence: 1) A -- B (sync rep) 2) Shut down B 3) Shut down A 4) Start up B as a master 5) Start up A as sync replica of B 6) A successfully joins B as a sync replica, even though its transaction log is 1016 bytes *ahead* of B. 7) Transactions written to B all hang 8) Xlog on A is now corrupt, although the database itself is OK This is fundamentally borked practice. Now, the above sequence happened because of the user misunderstanding what sync rep really means. However, A should not have been able to connect with B in replication mode, especially in sync rep mode; that should have failed. Any thoughts on why it didn't? I'd guess that B, while starting up, has written further WAL records bringing it further ahead of A. Apparently not; from what I've seen pg_stat_replication even *shows* that the replica is ahead of the master. Futher, Postgres should have recognized that there was a timeline branch point before A's last record, no? There wasn't any timeline increase because - as far as I understand the above - there wasn't any promotion. The cluster was shut down and recovery.conf was created/removed respectively. To me this is a operator error. We could try to defend against it more vigorously, but thats's hard to do without breaking actual usecases. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 05/04/2014 05:46 AM, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Major enhancement list: Per discussion on the -advocacy list, the features we've picked out as major enhancements for advocacy reasons this round are:
* Materialized Views (because with refresh concurrently, MatViews are now useful, which they weren't, very, in 9.3)
* Logical Decoding/Changeset Extraction, which we're calling Data Change Streaming API in the advocacy stuff. Not sure if you want to use that name in the technical release notes.
* Dynamic Background Workers (this one is a major feature, but won't be front-listed in the advocacy doc because it's hard to explain and nobody's built anything cool with it yet. But it should go under major enhancements in the release notes)
* JSONB and related features (such as indexing)
* ALTER SYSTEM SET
Lemme know if you need description text for any of the above. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 10:40:29AM -0700, Josh Berkus wrote: On 05/04/2014 05:46 AM, Bruce Momjian wrote: I have completed the initial version of the 9.4 release notes. You can view them here: http://www.postgresql.org/docs/devel/static/release-9-4.html I will be adding additional markup in the next few days. Feedback expected and welcomed. I expect to be modifying this until we release 9.4 final. I have marked items where I need help with question marks. Major enhancement list: Per discussion on the -advocacy list, the features we've picked out as major enhancements for advocacy reasons this round are: * Materialized Views (because with refresh concurrently, MatViews are now useful, which they weren't, very, in 9.3) * Logical Decoding/Changeset Extraction which we're calling Data Change Streaming API in the advocacy stuff. Not sure if you want to use that name in the technical realease notes. * Dynamic Background Workers (this one is a major feature, but won't be front-listed in advocacy doc because it's hard to explain and nobody's built anything cool with it yet. But it should go under major enhancements in the release notes) * JSONB and related features (such as indexing) * ALTER SYSTEM SET Lemme know if you need description text for any of the above. OK, great! Once I have the markup done, I will beef up the descriptions if needed and copy the text up to the major items section so we have that all ready for beta. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
On Mon, May 5, 2014 at 2:26 PM, Josh Berkus j...@agliodbs.com wrote: All, I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? -1. Each FDW must decide. 2) Do we want to support the TABLESPACE directive for FDWs? While cstore is the first FDW to create its own files, it won't necessarily be the last; I could imagine CSV_FDW doing so as well, or a future SQLite_FDW which does the same. So I think the above questions are worth answering in general. And we're planning to implement automated file management for cstore_fdw fairly soon, so we want to make it consistent with whatever we're doing in Postgres 9.5. -1. We cannot guarantee the consistency of files using an ALTER FOREIGN TABLE ... SET TABLESPACE ... as a normal ALTER TABLE ... does. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
[HACKERS] avoiding tuple copying in btree index builds
Hi, Today, I discovered that when building a btree index, the btree code uses index_form_tuple() to create an index tuple from the heap tuple, calls tuplesort_putindextuple() to copy that tuple into the sort's memory context, and then frees the original one it built. This seemed inefficient, so I wrote a patch to eliminate the tuple copying. It works by adding a function tuplesort_putindextuplevalues(), which builds the tuple in the sort's memory context and thus avoids the need for a separate copy. I'm not sure if that's the best approach, but the optimization seems worthwhile. I tested it by repeatedly executing REINDEX INDEX pgbench_accounts_pkey on a PPC64 machine. pgbench_accounts contains 10 million records. With unpatched master as of b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s, 6.177s, and 6.201s. With the attached patch, I got times of 5.787s, 5.972s, and 5.913s, a savings of almost 5%. Not bad considering the amount of work involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 542ed43..0743474 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -171,28 +171,21 @@ btbuildCallback(Relation index,
 				void *state)
 {
 	BTBuildState *buildstate = (BTBuildState *) state;
-	IndexTuple	itup;
-
-	/* form an index tuple and point it at the heap tuple */
-	itup = index_form_tuple(RelationGetDescr(index), values, isnull);
-	itup->t_tid = htup->t_self;
 
 	/*
 	 * insert the index tuple into the appropriate spool file for subsequent
 	 * processing
 	 */
 	if (tupleIsAlive || buildstate->spool2 == NULL)
-		_bt_spool(itup, buildstate->spool);
+		_bt_spool(buildstate->spool, htup->t_self, values, isnull);
 	else
 	{
 		/* dead tuples are put into spool2 */
 		buildstate->haveDead = true;
-		_bt_spool(itup, buildstate->spool2);
+		_bt_spool(buildstate->spool2, htup->t_self, values, isnull);
 	}
 
 	buildstate->indtuples += 1;
-
-	pfree(itup);
 }
diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index 9ddc275..8cd3a91 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -185,9 +185,10 @@ _bt_spooldestroy(BTSpool *btspool)
  * spool an index entry into the sort file.
  */
 void
-_bt_spool(IndexTuple itup, BTSpool *btspool)
+_bt_spool(BTSpool *btspool, ItemPointerData self, Datum *values, bool *isnull)
 {
-	tuplesort_putindextuple(btspool->sortstate, itup);
+	tuplesort_putindextuplevalues(btspool->sortstate, btspool->index,
+								  self, values, isnull);
 }
diff --git a/src/backend/utils/sort/tuplesort.c b/src/backend/utils/sort/tuplesort.c
index 8b520c1..3b0a06c 100644
--- a/src/backend/utils/sort/tuplesort.c
+++ b/src/backend/utils/sort/tuplesort.c
@@ -1150,6 +1150,29 @@ tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple)
 	 */
 	COPYTUP(state, &stup, (void *) tuple);
 
+	MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * Collect one index tuple while collecting input data for sort, building
+ * it from caller-supplied values.
+ */
+void
+tuplesort_putindextuplevalues(Tuplesortstate *state, Relation rel,
+							  ItemPointerData self, Datum *values,
+							  bool *isnull)
+{
+	MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext);
+	SortTuple	stup;
+
+	stup.tuple = index_form_tuple(RelationGetDescr(rel), values, isnull);
+	((IndexTuple) stup.tuple)->t_tid = self;
+	USEMEM(state, GetMemoryChunkSpace(stup.tuple));
+	/* set up first-column key value */
+	stup.datum1 = index_getattr((IndexTuple) stup.tuple,
+								1,
+								RelationGetDescr(state->indexRel),
+								&stup.isnull1);
 	puttuple_common(state, &stup);
 
 	MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h
index 1a8b16d..5ef1f9b 100644
--- a/src/include/access/nbtree.h
+++ b/src/include/access/nbtree.h
@@ -717,7 +717,8 @@ typedef struct BTSpool BTSpool; /* opaque type known only within nbtsort.c */
 extern BTSpool *_bt_spoolinit(Relation heap, Relation index,
 				bool isunique, bool isdead);
 extern void _bt_spooldestroy(BTSpool *btspool);
-extern void _bt_spool(IndexTuple itup, BTSpool *btspool);
+extern void _bt_spool(BTSpool *btspool, ItemPointerData self,
+		 Datum *values, bool *isnull);
 extern void _bt_leafbuild(BTSpool *btspool, BTSpool *spool2);
diff --git a/src/include/utils/tuplesort.h b/src/include/utils/tuplesort.h
index 05445f0..2b62a98 100644
--- a/src/include/utils/tuplesort.h
+++ b/src/include/utils/tuplesort.h
@@ -85,6 +85,9 @@ extern void tuplesort_puttupleslot(Tuplesortstate *state,
 				   TupleTableSlot *slot);
 extern void tuplesort_putheaptuple(Tuplesortstate *state, HeapTuple tup);
 extern void tuplesort_putindextuple(Tuplesortstate *state, IndexTuple tuple);
+extern void tuplesort_putindextuplevalues(Tuplesortstate *state,
+
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
Josh Berkus j...@agliodbs.com writes: I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? I think we ought to vigorously discourage FDWs from storing any files inside $PGDATA. This cannot lead to anything except grief. Just for starters, what will operations such as pg_basebackup do with them? A larger and more philosophical point is that such a direction of development could hardly be called a foreign data wrapper. People would expect Postgres to take full responsibility for such files, including data integrity considerations such as fsync-at-checkpoints and WAL support. Even if we wanted the FDW abstractions to allow for that, we're very far away from it. And frankly I'd maintain that FDW is the wrong abstraction. 2) Do we want to support the TABLESPACE directive for FDWs? A fortiori, no. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep
On 2014-05-05 10:30:17 -0700, Josh Berkus wrote: On 05/05/2014 10:25 AM, Andres Freund wrote: On 2014-05-05 10:16:27 -0700, Josh Berkus wrote: On 05/03/2014 01:07 AM, Andres Freund wrote: On 2014-05-02 18:57:08 -0700, Josh Berkus wrote: Just got a report of a replication issue with 9.2.8 from a community member. Here's the sequence:
1) A -> B (sync rep)
2) Shut down B
3) Shut down A
4) Start up B as a master
5) Start up A as sync replica of B
6) A successfully joins B as a sync replica, even though its transaction log is 1016 bytes *ahead* of B.
7) Transactions written to B all hang
8) Xlog on A is now corrupt, although the database itself is OK
This is fundamentally borked practice. Now, the above sequence happened because of the user misunderstanding what sync rep really means. However, A should not have been able to connect with B in replication mode, especially in sync rep mode; that should have failed. Any thoughts on why it didn't? I'd guess that B, while starting up, has written further WAL records bringing it further ahead of A. Apparently not; from what I've seen pg_stat_replication even *shows* that the replica is ahead of the master. That's the shutdown record from A that I've talked about. Further, Postgres should have recognized that there was a timeline branch point before A's last record, no? There wasn't any timeline increase because - as far as I understand the above - there wasn't any promotion. The cluster was shut down and recovery.conf was created/removed respectively. Ah, oops, left out a step. B was promoted. Still a user error. You need to reclone. Depending on how archiving and the target timeline were configured, the timeline increase won't be treated as an error... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote: How about: This data type allows for faster access to values in the json document and faster and more useful indexing of json. We should refer to the fact that jsonb is internally typed. This isn't all that obvious now, but it is evident for example when you sort a set of raw scalar numeric jsonb values, which has a sane ordering (the implementation invokes numeric_cmp()). You also get an internal, per-number-scalar display scale, just like the numeric type proper. I'm not all that sure about how to go about succinctly expressing this, but clearly it's important. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] avoiding tuple copying in btree index builds
Hi, On 2014-05-05 13:52:39 -0400, Robert Haas wrote: Today, I discovered that when building a btree index, the btree code uses index_form_tuple() to create an index tuple from the heap tuple, calls tuplesort_putindextuple() to copy that tuple into the sort's memory context, and then frees the original one it built. This seemed inefficient, so I wrote a patch to eliminate the tuple copying. It works by adding a function tuplesort_putindextuplevalues(), which builds the tuple in the sort's memory context and thus avoids the need for a separate copy. I'm not sure if that's the best approach, but the optimization seems worthwhile. Hm. It looks like we could quite easily just get rid of tuplesort_putindextuple(). The hash usage doesn't look hard to convert. I tested it by repeatedly executing REINDEX INDEX pgbench_accounts_pkey on a PPC64 machine. pgbench_accounts contains 10 million records. With unpatched master as of b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s, 6.177s, and 6.201s. With the attached patch, I got times of 5.787s, 5.972s, and 5.913s, a savings of almost 5%. Not bad considering the amount of work involved. Yes, that's certainly worthwhile. Nice. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe
Andres Freund and...@2ndquadrant.com writes: While investigating an issue pointed out by valgrind around undefined bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a bug in ReceiveSharedInvalidMessages(). It tries to be safe against recursion but it's not: When it recurses into ReceiveSharedInvalidMessages() from its main loop from inside the inval callback while nextmsg < nummsgs it'll overwrite the 'messages' array with new contents. But at this point the old content of one entry in the messages array is still passed to the LocalExecuteInvalidationMessage() that caused the recursion. Hm, yeah, so if the called inval function continues to use the message contents after doing something that could result in a recursive call, it might be looking at trashed data. It looks to me like this is broken since at least fad153ec. I think the fix is just to make the current 'SharedInvalidationMessage *msg' not be a pointer but a local copy of the to-be-processed entry. Yeah, that should do it. I think I'd been trying to avoid copying messages more times than necessary, but evidently I optimized away one copy step too many :-( regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
On 05/05/2014 10:53 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? I think we ought to vigorously discourage FDWs from storing any files inside $PGDATA. This cannot lead to anything except grief. Just for starters, what will operations such as pg_basebackup do with them? That was one advantage to putting them in PGDATA; you get a copy of the files with pg_basebackup. Of course, they don't replicate after that, but they potentially could, in the future, with Logical Streaming Replication. A larger and more philosophical point is that such a direction of development could hardly be called a foreign data wrapper. People would expect Postgres to take full responsibility for such files, including data integrity considerations such as fsync-at-checkpoints and WAL support. Even if we wanted the FDW abstractions to allow for that, we're very far away from it. And frankly I'd maintain that FDW is the wrong abstraction. Certainly pluggable storage would be a better abstraction; but we don't have that yet. In the meantime, we have one FDW which creates files *right now*, and we might have more in the future, so I'm trying to establish some guidelines as to how such FDWs should behave. Regardless of whether or not you think FDWs should be managing files, it's better for users if all FDWs which manage files manage them in the same way. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
On Mon, May 5, 2014 at 8:17 PM, Josh Berkus j...@agliodbs.com wrote: On 05/05/2014 10:53 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? I think we ought to vigorously discourage FDWs from storing any files inside $PGDATA. This cannot lead to anything except grief. Just for starters, what will operations such as pg_basebackup do with them? That was one advantage to putting them in PGDATA; you get a copy of the files with pg_basebackup. Of course, they don't replicate after that, but they potentially could, in the future, with Logical Streaming Replication. Presumably they'd also be inconsistent? And as such not really useful unless you actually shut the database down before you back it up (e.g. don't use pg_basebackup)? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
On 2014-05-05 11:17:18 -0700, Josh Berkus wrote: On 05/05/2014 10:53 AM, Tom Lane wrote: Josh Berkus j...@agliodbs.com writes: I'm working with the cstore_fdw project, which has an interesting property for an FDW: the FDW itself creates the files which make up the database. This raises a couple of questions: 1) Do we want to establish a standard directory for FDWs which create files, such as $PGDATA/base/{database-oid}/fdw/ ? Or would we want to leave it up to each FDW to decide? I think we ought to vigorously discourage FDWs from storing any files inside $PGDATA. This cannot lead to anything except grief. Just for starters, what will operations such as pg_basebackup do with them? That was one advantage to putting them in PGDATA; you get a copy of the files with pg_basebackup. A corrupted copy. There's no WAL replay to correct skew due to write activity while copying. Of course, they don't replicate after that, but they potentially could, in the future, with Logical Streaming Replication. Nope. They're not in the WAL, so they won't be streamed out. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote: On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote: How about: This data type allows for faster access to values in the json document and faster and more useful indexing of json. We should refer to the fact that jsonb is internally typed. This isn't all that obvious now, but it is evident for example when you sort a set of raw scalar numeric jsonb values, which has a sane ordering (the implementation invokes numeric_cmp()). You also get an internal, per-number-scalar display scale, just like the numeric type proper. I'm not all that sure about how to go about succinctly expressing this, but clearly it's important. How about: JSONB values are also mapped to SQL scalar data types, rather than being treated always as strings. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 05/05/2014 11:31 AM, Bruce Momjian wrote: JSONB values are also mapped to SQL scalar data types, rather than being treated always as strings. + ... allowing for correct sorting of JSON according to internal datums. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] avoiding tuple copying in btree index builds
On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-05-05 13:52:39 -0400, Robert Haas wrote: Today, I discovered that when building a btree index, the btree code uses index_form_tuple() to create an index tuple from the heap tuple, calls tuplesort_putindextuple() to copy that tuple into the sort's memory context, and then frees the original one it built. This seemed inefficient, so I wrote a patch to eliminate the tuple copying. It works by adding a function tuplesort_putindextuplevalues(), which builds the tuple in the sort's memory context and thus avoids the need for a separate copy. I'm not sure if that's the best approach, but the optimization seems wortwhile. Hm. It looks like we could quite easily just get rid of tuplesort_putindextuple(). The hash usage doesn't look hard to convert. I glanced at that, but it wasn't obvious to me how to convert the hash usage. If you have an idea, I'm all ears. I tested it by repeatedly executing REINDEX INDEX pgbench_accounts_pkey on a PPC64 machine. pgbench_accounts contains 10 million records. With unpatched master as of b2f7bd72c4d3e80065725c72e85778d5f4bdfd4a, I got times of 6.159s, 6.177s, and 6.201s. With the attached patch, I got times of 5.787s, 5.972s, and 5.913s, a savings of almost 5%. Not bad considering the amount of work involved. Yes, that's certainly worthwile. Nice. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New and interesting replication issues with 9.2.8 sync rep
On 05/05/2014 10:53 AM, Andres Freund wrote: Still a user error. You need to reclone. Depending on how archiving and the target timeline was configured the timeline increase won't be treated as an error... Andres and I hashed this out on IRC. The basic problem was that I was relying on pg_stat_replication to point out when a successful replication connection was established. However, he pointed out cases where pg_stat_replication will report sync or streaming even though replication has failed due to differences in WAL position. That appears to be what happened here. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
Josh Berkus j...@agliodbs.com writes: On 05/05/2014 10:53 AM, Tom Lane wrote: A larger and more philosophical point is that such a direction of development could hardly be called a foreign data wrapper. People would expect Postgres to take full responsibility for such files, including data integrity considerations such as fsync-at-checkpoints and WAL support. Even if we wanted the FDW abstractions to allow for that, we're very far away from it. And frankly I'd maintain that FDW is the wrong abstraction. Certainly pluggable storage would be a better abstraction; but we don't have that yet. In the meantime, we have one FDW which creates files *right now*, and we might have more in the future, so I'm trying to establish some guidelines as to how such FDWs should behave. The guideline is simple: don't do that. We should absolutely not encourage this until/unless we have infrastructure to support it. Just because one FDW author thought this would be a cool thing to do does not make it a cool thing to do, and definitely not a cool thing to encourage others to emulate. Regardless of whether or not you think FDWs should be managing files, it's better for users if all FDWs which manage files manage them in the same way. Sure. They should all keep them outside $PGDATA, making it not-our- problem. When and if we're prepared to consider it our problem, we will be sure to advise people. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 11:31 AM, Bruce Momjian br...@momjian.us wrote: JSONB values are also mapped to SQL scalar data types, rather than being treated always as strings. Something like that. Perhaps you should just go with what the documentation says: Primitive JSON types described by RFC 7159 are effectively internally mapped onto native PostgreSQL types. The fact that the default B-Tree operator class defines a type-wise ordering is beside the point. That's just currently the most obvious way in which this shadow typing is evident. At some point we're going to have to figure out ways to manipulate jsonb using shadow type specific operators or functions, so you can for example extract actual numeric values easily. I don't want users to assume that those don't exist right now due to a fundamental limitation of our implementation. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-05-04 13:44:17 +0200, Andres Freund wrote:

postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
          key          |     off     |    size     | allocated
-----------------------+-------------+-------------+-----------
 Buffer Blocks         |   286242528 | 17179869184 | t
 Buffer Descriptors    |   152024800 |   134217728 | t
 ...
 OldSerXidControlData  | 17584357344 |          16 | t
(44 rows)

Thinking about this, I think it was a mistake to not add a 'name' field to dynamic shared memory's dsm_control_item. Right now it's very hard to figure out which extension allocated a dsm segment. Imo we should change that before 9.4 is out. I am not suggesting to use it to identify segments, but just as an identifier, passed in to dsm_create(). Imo there should be a corresponding pg_dynshmem_allocations to pg_shmem_allocations. Well, right now a dsm_control_item is 8 bytes. If we add a name field of our usual 64 bytes, they'll each be 9 times bigger. We're not talking about a lot of bytes in absolute terms, but I guess I'm not in favor of an 800% size increase without somewhat more justification than you've provided here. Who is using dynamic shared memory for enough different things at this point to get confused? I'm quite in favor of having something like this for the main shared memory segment, but I think that's 9.5 material at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
Robert Haas robertmh...@gmail.com writes: On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote: Thinking about this, I think it was a mistake to not add a 'name' field to dynamic shared memory's dsm_control_item. Well, right now a dsm_control_item is 8 bytes. If we add a name field of our usual 64 bytes, they'll each be 9 times bigger. And the controlled shared segment is likely to be how big exactly? It's probably not even possible for it to be smaller than a page size, 4K or so depending on the OS. I agree with Andres that a name would be a good idea; complaining about the space needed to hold it is penny-wise and pound-foolish. I'm quite in favor of having something like this for the main shared memory segment, but I think that's 9.5 material at this point. If you're prepared to break the current APIs later to add a name parameter (which would have to be required, if it's to be useful at all), then sure, put the question off till 9.5. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
On 2014-05-05 15:04:07 -0400, Robert Haas wrote: On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-05-04 13:44:17 +0200, Andres Freund wrote:

postgres=# SELECT * FROM pg_shmem_allocations ORDER BY size DESC;
          key          |     off     |    size     | allocated
-----------------------+-------------+-------------+-----------
 Buffer Blocks         |   286242528 | 17179869184 | t
 Buffer Descriptors    |   152024800 |   134217728 | t
 ...
 OldSerXidControlData  | 17584357344 |          16 | t
(44 rows)

Thinking about this, I think it was a mistake to not add a 'name' field to dynamic shared memory's dsm_control_item. Right now it's very hard to figure out which extension allocated a dsm segment. Imo we should change that before 9.4 is out. I am not suggesting to use it to identify segments, but just as an identifier, passed in to dsm_create(). Imo there should be a corresponding pg_dynshmem_allocations to pg_shmem_allocations. Well, right now a dsm_control_item is 8 bytes. If we add a name field of our usual 64 bytes, they'll each be 9 times bigger. We're not talking about a lot of bytes in absolute terms, but I guess I'm not in favor of an 800% size increase without somewhat more justification than you've provided here. Who is using dynamic shared memory for enough different things at this point to get confused? The kernel-side overhead of creating a shared memory segment is so much higher that this really isn't a meaningful saving. Also, are you really considering a couple hundred bytes to be a problem? I think it's quite a sensible thing for an administrator to ask where all the memory has gone. The more users of dsm there are, the more important that'll get. Right now pretty much the only thing an admin could do is to poke around in /proc to see which backend has mapped the segment and try to figure out via the logs what caused it to do so. Not nice. I'm quite in favor of having something like this for the main shared memory segment, but I think that's 9.5 material at this point. Clearly. 
For one the version I posted here missed allocations which aren't done via ShmemInitStruct (lwlock main array and hash table allocations primarily). For another it's too late ;) Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
On 2014-05-05 15:09:02 -0400, Tom Lane wrote: I'm quite in favor of having something like this for the main shared memory segment, but I think that's 9.5 material at this point. If you're prepared to break the current APIs later to add a name parameter (which would have to be required, if it's to be useful at all), then sure, put the question off till 9.5. I understood Robert to mean that it's too late for my proposed view for 9.4 - and I agree - but I wholeheartedly agree with you that we should add a name parameter to the dsm API *now*. We can just Assert() that it's nonzero if we don't think it's useful for now. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe
On 2014-05-05 14:15:58 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: While investigating an issue pointed out by valgrind around undefined bytes in inval.c SHAREDINVALSMGR_ID processing I noticed that there's a bug in ReceiveSharedInvalidMessages(). It tries to be safe against recursion but it's not: When it recurses into ReceiveSharedInvalidMessages() from its main loop from inside the inval callback while nextmsg < nummsgs it'll overwrite the 'messages' array with new contents. But at this point the old content of one entry in the messages array is still passed to the LocalExecuteInvalidationMessage() that caused the recursion. Hm, yeah, so if the called inval function continues to use the message contents after doing something that could result in a recursive call, it might be looking at trashed data. FWIW, with a few changes around it to make it more visible (malloc/freeing the array) this sometimes triggers during make check. So it's quite plausible that this is what caused the relfilenodemap regression error we were a bit confused about. I started to look at this code because valgrind was bleating at me inside inval.c and for the heck of it I couldn't understand why. Since then this has led me on quite a wild goose chase, leading me to discover a couple of things I am not perfectly happy about: a) SICleanupQueue() sometimes releases and reacquires the write lock held on the outside. That's pretty damn fragile, not to mention ugly. Even slight reformulations of the code in SIInsertDataEntries() can break this... Can we please extend the comment in the latter to mention the lock dropping explicitly? b) we right/left shift -1 in a signed int by 16 in CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's implementation-defined behaviour. c) The ProcessMessageList() calls access msg->rc.id to test for the type of the existing message. That's not nice. The compiler is entirely free to e.g. 
copy the entire struct to local memory causing uninitialized memory to be accessed. Entirely cosmetic, but ... d) The valgrind splats I was very occasionally getting were:

==21013== Conditional jump or move depends on uninitialised value(s)
==21013==    at 0x8DD755: hash_search_with_hash_value (dynahash.c:885)
==21013==    by 0x8DD60F: hash_search (dynahash.c:811)
==21013==    by 0x799A58: smgrclosenode (smgr.c:357)
==21013==    by 0x8B6ABF: LocalExecuteInvalidationMessage (inval.c:566)
==21013==    by 0x77BBCF: ReceiveSharedInvalidMessages (sinval.c:127)
==21013==    by 0x8B6C3F: AcceptInvalidationMessages (inval.c:640)
==21013==    by 0x77FDCC: LockRelationOid (lmgr.c:107)
==21013==    by 0x4A6F33: relation_open (heapam.c:1029)
==21013==    by 0x4A71EB: heap_open (heapam.c:1195)
==21013==    by 0x8B4228: SearchCatCache (catcache.c:1223)
==21013==    by 0x8C61B1: SearchSysCache (syscache.c:909)
==21013==    by 0x8C62CD: GetSysCacheOid (syscache.c:987)
==21013== Uninitialised value was created by a stack allocation
==21013==    at 0x8B64C3: AddCatcacheInvalidationMessage (inval.c:328)

After far, far too much confused head scratching, code reading, random elog()s et al I found out that this is just because of a deficiency in valgrind's undefinedness tracking. The problem is that it doesn't really understand shared memory sufficiently. When a backend stores a message in the SI ringbuffer it sets a bitmask about initialized memory for a specific position in the ringbuffer. If the *same* backend reads from the same position in the ringbuffer (4096 messages later) into which another backend has stored a *different* type of message, the bitmap of initialized bytes will possibly be wrong if the padding is distributed differently. Unfortunately this cannot precisely be caught by valgrind's suppressions. Thus I'd like to add memset(&msg, 0, sizeof(SharedInvalidationMessage)) in AddCatcacheInvalidationMessage() et al. to suppress these warnings. 
Imo we can just add them unconditionally, but if somebody else prefers we can add #ifdef USE_VALGRIND around them. I can do a), c), d), if others agree but I am not sure about b)? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe
Andres Freund and...@2ndquadrant.com writes: a) SICleanupQueue() sometimes releases and reacquires the write lock held on the outside. That's pretty damn fragile, not to mention ugly. Even slight reformulations of the code in SIInsertDataEntries() can break this... Can we please extend the comment in the latter to mention the lock dropping explicitly? Want to write a better comment? b) we right/left shift -1 in a signed int by 16 in CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's implementation-defined behaviour. Looks all right to me. Yeah, the right shift might have undefined high-order bits, but we don't care because we're storing the result into an int16. c) The ProcessMessageList() calls access msg->rc.id to test for the type of the existing message. That's not nice. Huh? After far, far too much confused head scratching, code reading, random elog()s et al I found out that this is just because of a deficiency in valgrind's undefinedness tracking. [...] Unfortunately this cannot precisely be caught by valgrind's suppressions. Thus I'd like to add memset(&msg, 0, sizeof(SharedInvalidationMessage)) in AddCatcacheInvalidationMessage() et al. to suppress these warnings. Imo we can just add them unconditionally, but if somebody else prefers we can add #ifdef USE_VALGRIND around them. I'd be okay with USE_VALGRIND. I'm not particularly hot on adding a memset for everybody just to make valgrind less confused. Especially since that's really going to hide any problems, not fix them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] doPickSplit stack buffer overflow in XLogInsert?
Hi, We really should fix this one of these days. On 2014-03-26 18:45:54 -0700, Peter Geoghegan wrote: Attached patch silences the "Invalid read of size n" complaints of Valgrind. I agree with your general thoughts around backpatching. Note that the patch addresses a distinct complaint from Kevin's, as Valgrind doesn't take issue with the invalid reads past the end of spgxlogPickSplit variables on the stack. I don't think that's entirely sufficient. The local 'spgxlogPickSplit xlrec' / 'spgxlogMoveLeafs xlrec' variables are also inserted while MAXALIGNing their size. That's slightly harder to fix :(. I don't have a better idea than also allocating them dynamically :( Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default
Prior to default parameters on functions, GRANT and COMMENT accepted full parameter syntax. IE: GRANT EXECUTE ON test(t text) TO public as opposed to regprocedure, which only accepts the data types ( test(text), not test(t text) ). They do not accept DEFAULT though: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; ERROR: syntax error at or near DEFAULT LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; Presumably this is just an oversight? Related to that, is it intentional that the regprocedure cast disallows *any* decorators to the function, other than type? If regprocedure at least accepted the full function parameter definition you could use it to get a definitive reference to a function. -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 1:13 PM, Andrew Dunstan and...@dunslane.net wrote: The fact that jsonb maps scalar values to internal postgres types is an implementation artefact. I wouldn't go that far, but I take your point. The fact that the primitive/scalar JSON types are in a very real sense strongly typed is a very important detail, though. The fact that what became jsonb is like a nested, typed hstore has always been prominently emphasized, for good reasons. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On 05/05/2014 02:33 PM, Josh Berkus wrote: On 05/05/2014 11:31 AM, Bruce Momjian wrote: JSONB values are also mapped to SQL scalar data types, rather than being treated always as strings. + ... allowing for correct sorting of JSON according to internal datums. The problem is that at least in one sense that's not what we're doing. The canonical ordering of object keys is not at all standard lexical ordering. I really don't think this is Release Notes material. The fact that jsonb maps scalar values to internal postgres types is an implementation artefact. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default
Jim Nasby wrote: Prior to default parameters on functions, GRANT and COMMENT accepted full parameter syntax. IE: GRANT EXECUTE ON test(t text) TO public as opposed to regprocedure, which only accepts the data types ( test(text), not test(t text) ). They do not accept DEFAULT though: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; ERROR: syntax error at or near DEFAULT LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; Presumably this is just an oversight? I have to say that accepting DEFAULT there seems pretty odd to me. What if you specify the wrong default? Do you get a no such function error? That would be pretty unhelpful. But then accepting it ignoring the fact that the default is wrong would be rather strange. Related to that, is it intentional that the regprocedure cast disallows *any* decorators to the function, other than type? If regprocedure at least accepted the full function parameter definition you could use it to get a definitive reference to a function. Does pg_identify_object() give you what you want? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe
On 2014-05-05 15:41:22 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: a) SICleanupQueue() sometimes releases and reacquires the write lock held on the outside. That's pretty damn fragile, not to mention ugly. Even slight reformulations of the code in SIInsertDataEntries() can break this... Can we please extend the comment in the latter to mention the lock dropping explicitly? Want to write a better comment? Will do. b) we right/left shift -1 in a signed int by 16 in CacheInvalidateSmgr/LocalExecuteInvalidationMessage(). IIRC that's implementation-defined behaviour. Looks all right to me. Yeah, the right shift might have undefined high-order bits, but we don't care because we're storing the result into an int16. Doesn't at the very least rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo; have to be rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo; c) The ProcessMessageList() calls access msg->rc.id to test for the type of the existing message. That's not nice. Huh? The checks should access msg->id, not msg->rc.id... After far, far too much confused head scratching, code reading, random elog()s et al I found out that this is just because of a deficiency in valgrind's undefinedness tracking. [...] Unfortunately this cannot precisely be caught by valgrind's suppressions. Thus I'd like to add a memset() zeroing the SharedInvalidationMessage in AddCatcacheInvalidationMessage() et al. to suppress these warnings. Imo we can just add them unconditionally, but if somebody else prefers we can add #ifdef USE_VALGRIND around them. I'd be okay with USE_VALGRIND. I'm not particularly hot on adding a memset for everybody just to make valgrind less confused. Especially since that's really going to hide any problems, not fix them. I can't see it having any measurable overhead. Even old compilers will optimize the memset() away and just initialize the padding... But anyway, USE_VALGRIND is fine with me. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 10:40:29AM -0700, Josh Berkus wrote: Major enhancement list: Per discussion on the -advocacy list, the features we've picked out as major enhancements for advocacy reasons this round are: * Materialized Views (because with refresh concurrently, MatViews are now useful, which they weren't, very, in 9.3) * Logical Decoding/Changeset Extraction which we're calling Data Change Streaming API in the advocacy stuff. Not sure if you want to use that name in the technical release notes. * Dynamic Background Workers (this one is a major feature, but won't be front-listed in advocacy doc because it's hard to explain and nobody's built anything cool with it yet. But it should go under major enhancements in the release notes) * JSONB and related features (such as indexing) * ALTER SYSTEM SET Lemme know if you need description text for any of the above. OK, I added doc links and this list of major features. You can see the results here: http://momjian.us/pgsql_docs/release-9-4.html -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] TABLESPACE and directory for Foreign tables?
On 05/05/2014 11:53 AM, Tom Lane wrote: Sure. They should all keep them outside $PGDATA, making it not-our- problem. When and if we're prepared to consider it our problem, we will be sure to advise people. OK. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
On Mon, May 5, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, May 4, 2014 at 7:50 AM, Andres Freund and...@2ndquadrant.com wrote: Thinking about this, I think it was a mistake to not add a 'name' field to dynamic shared memory's dsm_control_item. Well, right now a dsm_control_item is 8 bytes. If we add a name field of our usual 64 bytes, they'll each be 9 times bigger. And the controlled shared segment is likely to be how big exactly? It's probably not even possible for it to be smaller than a page size, 4K or so depending on the OS. I agree with Andres that a name would be a good idea; complaining about the space needed to hold it is penny-wise and pound-foolish. The control segment is sized to support a number of dynamic shared memory segments not exceeding 64 + 2 * MaxBackends. With default settings, that currently works out to 288 segments, or 2306 bytes. So, adding a 64-byte name to each of those structures would increase the size from 2k to about 20k. So, sure, that's not a lot of memory. But I'm still not convinced that it's very useful. What I think is going to happen is that (1) most people won't be using dynamic shared memory at all, so they won't have any use for this; (2) those people who do run an extension that uses dynamic shared memory will most likely only be running one such extension, so they won't need a name to know what the segments are being used for; and (3) if and when we eventually get parallel query, dynamic shared memory segments will be widely used, but a bunch of segments that are all named parallel_query or parallel_query.$PID isn't going to be too informative. Now, all that having been said, I recognize that human-readable names are a generally useful thing, so I'm not going to hold my breath until I turn blue if other people really want this, and it may turn out to be useful someday. 
But if anyone is curious whether I'm *confident* that it will be useful someday: at this point, no. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Sun, May 4, 2014 at 6:49 AM, Andres Freund and...@2ndquadrant.com wrote: + listitem + para +Have pg_stat_statements use a flat file for query text storage, allowing higher limits (Peter Geoghegan) + /para + + para +Also add the ability to retrieve all pg_stat_statements information except the query text. This allows programs to reuse the query +text already retrieved by referencing queryid. + /para + /listitem This isn't an optional thing, is it? This is intended to be used by time-series monitoring tools that aggregate and graph pg_stat_statements data temporally. They usually won't need query texts, and so can only retrieve them lazily. The pg_stat_statements view presents exactly the same interface for ad-hoc querying, though. The point of the first item is that there is no longer *any* limitation on the size of stored query texts. They are no longer truncated to track_activity_query_size bytes. The shared memory overhead is also decreased substantially, allowing us to increase the default pg_stat_statements.max setting from 1,000 to 5,000, while still reducing the overall shared memory overhead (assuming a default track_activity_query_size). I think that the removal of the limitation, and the substantial lowering of the per-entry footprint should both be explicitly noted. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.4 wrap around issues
While stress testing the crash-recovery system, I keep running into wraparound shutdowns that I think should not be occurring. I go out of my way to give autovac a chance to complete every now and then, more often than it should need to in order to keep up with the xid usage. I think the problem traces down to the fact that updating pg_database.datfrozenxid is WAL logged, but updating ShmemVariableCache->oldestXid is not. So a crash at just the right time means that the databases no longer think they need to be vacuumed for wrap around, but the system as a whole thinks that they do. Should crash recovery end with the system reading the recovered pg_database and recomputing ShmemVariableCache->oldestXid ? I don't know that any non-pathological case could trigger this in production. But 2^32 is not getting any larger, while maximum common database throughput and size is. I bisected this down to f9b50b7c18c8ce7de1fee59409fe2 Fix removal of files in pgstats directories. I think that before that commit, the leftover stats file was somehow tricking the system into vacuuming the databases more aggressively following a crash. Cheers, Jeff
Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe
Andres Freund and...@2ndquadrant.com writes: On 2014-05-05 15:41:22 -0400, Tom Lane wrote: Looks all right to me. Yeah, the right shift might have undefined high-order bits, but we don't care because we're storing the result into an int16. Doesn't at the very least rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo; have to be rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo; A promotion to int (or wider) is implicit in any arithmetic operation, last I checked the C standard. The (int) on the other side isn't necessary either. We might actually be better off casting both sides to unsigned int, just to enforce that the left shifting is done with unsigned semantics. c) The ProcessMessageList() calls access msg->rc.id to test for the type of the existing message. That's not nice. Huh? The checks should access msg->id, not msg->rc.id... Meh. If they're not the same thing, we've got big trouble anyway. But if you want to change it as a cosmetic thing, no objection. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_shmem_allocations view
Robert Haas robertmh...@gmail.com writes: On Mon, May 5, 2014 at 3:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: And the controlled shared segment is likely to be how big exactly? It's probably not even possible for it to be smaller than a page size, 4K or so depending on the OS. I agree with Andres that a name would be a good idea; complaining about the space needed to hold it is penny-wise and pound-foolish. ... Now, all that having been said, I recognize that human-readable names are a generally useful thing, so I'm not going to hold my breath until I turn blue if other people really want this, and it may turn out to be useful someday. But if anyone is curious whether I'm *confident* that it will be useful someday: at this point, no. I'm not confident that it'll be useful either. But I am confident that if we don't put it in now, and decide we want it later, there will be complaints when we change the API. Better to have an ignored parameter than no parameter. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Cluster name in ps output
On 5 May 2014 17:22, Andres Freund and...@2ndquadrant.com wrote: On 2014-05-05 13:07:48 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2014-05-05 10:07:46 -0400, Tom Lane wrote: Also, -1 for adding another log_line_prefix escape. If you're routing multiple clusters logging to the same place (which is already a bit unlikely IMO), you can put distinguishing strings in log_line_prefix already. And it's not like we've got an infinite supply of letters for those escapes. Using syslog and including the same config file from multiple clusters isn't that uncommon. But I can live without it. So, if you are sharing a config file, how is it that you can set a per-cluster cluster_name but not a per-cluster log_line_prefix? Well, it's a included file. With log_line_prefix support just cluster_name has to be set per cluster. Without you need string interpolation in the config management to include cluster_name in log_line_prefix. Attached as cluster-name-in-ps-v3-a.patch is a new version with the following changes: * the brackets removed, as suggested by Tom Lane * static initialization of cluster_name to avoid any possibility of an uninitialized/null pointer being used before GUC initialization, as suggested by Andres Freund * cluster_name moved to config group CONN_AUTH_SETTINGS, on the basis that it's similar to bonjour_name, but it isn't really... open to suggestions for a better config_group! * a small amount of documentation in config.sgml (with cluster_name under Connection Settings, which probably isn't right either) * sample conf file updated to show cluster_name and %C in log_line_prefix A shorter version without the log_line_prefix support that Tom didn't like is attached as cluster-name-in-ps-v3-b.patch. I will try to add these to the open commitfest, and see if there is something I can usefully review in return. I verified that SHOW cluster_name works as expected and you can't change it with SET. 
Thanks, Thomas Munro

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 4a33f87..0fd414e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -681,6 +681,24 @@ include 'filename'
      </listitem>
     </varlistentry>
 
+    <varlistentry id="guc-cluster-name" xreflabel="cluster_name">
+     <term><varname>cluster_name</varname> (<type>string</type>)</term>
+     <indexterm>
+      <primary><varname>cluster_name</> configuration parameter</primary>
+     </indexterm>
+     <listitem>
+      <para>
+       Sets the cluster name that appears in the process title for all
+       processes in this cluster.  No name is shown if this parameter is set
+       to the empty string <literal>''</> (which is the default).  The
+       process title is typically viewed by the <command>ps</> command, or in
+       Windows by using the <application>Process Explorer</>.  The cluster
+       name can also be included in the <varname>log_line_prefix</varname>.
+       This parameter can only be set at server start.
+      </para>
+     </listitem>
+    </varlistentry>
+
     <varlistentry id="guc-tcp-keepalives-idle" xreflabel="tcp_keepalives_idle">
      <term><varname>tcp_keepalives_idle</varname> (<type>integer</type>)</term>
      <indexterm>
@@ -4212,6 +4230,11 @@ local0.*    /var/log/postgresql
        </thead>
        <tbody>
         <row>
+         <entry><literal>%C</literal></entry>
+         <entry>Cluster name</entry>
+         <entry>no</entry>
+        </row>
+        <row>
          <entry><literal>%a</literal></entry>
          <entry>Application name</entry>
          <entry>yes</entry>
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 977fc66..0adfee7 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -2345,6 +2345,12 @@ log_line_prefix(StringInfo buf, ErrorData *edata)
 				else
 					appendStringInfo(buf, "%lx.%x", (long) (MyStartTime), MyProcPid);
 				break;
+			case 'C':
+				if (padding != 0)
+					appendStringInfo(buf, "%*s", padding, cluster_name);
+				else
+					appendStringInfoString(buf, cluster_name);
+				break;
 			case 'p':
 				if (padding != 0)
 					appendStringInfo(buf, "%*d", padding, MyProcPid);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 15020c4..71897b8 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -449,6 +449,7 @@
 int			temp_file_limit = -1;
 
 int			num_temp_buffers = 1024;
 
+const char *cluster_name = "";
 char	   *data_directory;
 char	   *ConfigFileName;
 char	   *HbaFileName;
@@ -3091,6 +3092,17 @@ static struct config_string ConfigureNamesString[] =
 	},
 
 	{
+		{"cluster_name", PGC_POSTMASTER, CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the name of the cluster that appears in 'ps' output."),
+			NULL,
+			GUC_IS_NAME
+		},
+		&cluster_name,
+		"",
+		NULL, NULL, NULL
+	},
+
+	{
 		{"data_directory", PGC_POSTMASTER, FILE_LOCATIONS,
 			gettext_noop("Sets the server's data directory."),
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 02:23:13PM -0700, Peter Geoghegan wrote: On Sun, May 4, 2014 at 6:49 AM, Andres Freund and...@2ndquadrant.com wrote: + listitem + para +Have pg_stat_statements use a flat file for query text storage, allowing higher limits (Peter Geoghegan) + /para + + para +Also add the ability to retrieve all pg_stat_statements information except the query text. This allows programs to reuse the query +text already retrieved by referencing queryid. + /para + /listitem This isn't an optional thing, is it? This is intended to be used by time-series monitoring tools that aggregate and graph pg_stat_statements data temporally. They usually won't need query texts, and so can only retrieve them lazily. The pg_stat_statements view presents exactly the same interface for ad-hoc querying, though. The point of the first item is that there is no longer *any* limitation on the size of stored query texts. They are no longer truncated to track_activity_query_size bytes. The shared memory overhead is also decreased substantially, allowing us to increase the default pg_stat_statements.max setting from 1,000 to 5,000, while still reducing the overall shared memory overhead (assuming a default track_activity_query_size). I think that the removal of the limitation, and the substantial lowering of the per-entry footprint should both be explicitly noted. We rarely get into specific numbers like this. It says higher limit(s) and hopefully that is enough. If you want to create a documentation 'id' I can link to that for the higher limits text. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 10:58:57AM -0700, Peter Geoghegan wrote: On Mon, May 5, 2014 at 8:28 AM, Andrew Dunstan and...@dunslane.net wrote: How about: This data type allows for faster access to values in the json document and faster and more useful indexing of json. We should refer to the fact that jsonb is internally typed. This isn't all that obvious now, but it is evident for example when you sort a set of raw scalar numeric jsonb values, which has a sane ordering (the implementation invokes numeric_cmp()). You also get an internal, per-number-scalar display scale, just like the numeric type proper. I'm not all that sure about how to go about succinctly expressing this, but clearly it's important. Current text is: Add structured (non-text) data type (JSONB) for storing JSON data (Oleg Bartunov, Teodor Sigaev, Alexander Korotkov, Peter Geoghegan, and Andrew Dunstan) This allows for faster access to values in the JSON document and faster and more useful indexing of JSON. JSONB values are also typed as appropriate scalar SQL types. Is that OK? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 4:14 PM, Bruce Momjian br...@momjian.us wrote: We rarely get into specific numers like this. It says higher limit(s) and hopefully that is enough. If you want to create a documentation 'id' I can like to that for the higher limits text. You don't have to mention a number. Just the fact that there is now *no* limit on stored query text length, which is the point. It's also nice that the per-entry shared memory overhead is much lower, which as I said I think warrants a mention in passing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.4 release notes
On Mon, May 5, 2014 at 4:16 PM, Bruce Momjian br...@momjian.us wrote: This allows for faster access to values in the JSON document and faster and more useful indexing of JSON. JSONB values are also typed as appropriate scalar SQL types. Is that OK? That seems reasonable. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers