Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks
2012/1/19 Robert Haas robertmh...@gmail.com: On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: In sepgsql side, it determines a case to apply permission checks according to the contextual information; that is same technique when we implemented create permission. Thus, it could checks db_xxx:{drop} permission correctly. Why do we need the contextual information in this case? Why can't/shouldn't the decision be made solely on the basis of what object is targeted? Several code-paths to remove a particular objects are not appropriate to apply permission checks. For example... [1] Cleanup of temporary objects on_shmem_eixt() registers RemoveTempRelationsCallback(), then it eventually calls deleteWhatDependsOn() that have an invocation of deleteOneObject(). This code path is just an internal cleanup process, not related to permission of the client. [2] Cleanup of transient table at VACUUM FULL/CLUSTER command rebuild_relation() creates a temporary table with make_new_heap(), then it copies the contents of original table according to the order of index, and calls finish_heap_swap() that swaps relation files and removes the temporary table using performDeletion(). This code actually create and drop a table, however, it is quite internal design and not related to permission of the client. So, I think sepgsql should only applied to permission checks object deletion invoked by user's operations, such as DropStmt. Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Strange primary key constraint influence to grouping
Thanks for explanation. Now I remember the discussion on hackers list about this feature, but anyway, this feature surprised little bit. G.
Re: [HACKERS] Patch: Allow SQL-language functions to reference parameters by parameter name
On Wed, Jan 18, 2012 at 1:11 AM, Hitoshi Harada umi.tan...@gmail.com wrote: On Sat, Jan 14, 2012 at 8:10 AM, Matthew Draper matt...@trebex.net wrote: I just remembered to make time to advance this from WIP to proposed patch this week... and then worked out I'm rudely dropping it into the last commitfest at the last minute. :/ The patch applies clean against master but compiles with warnings. functions.c: In function ‘prepare_sql_fn_parse_info’: functions.c:212: warning: unused variable ‘argnum’ functions.c: In function ‘sql_fn_post_column_ref’: functions.c:341: warning: implicit declaration of function ‘ParseFuncOrColumn’ functions.c:345: warning: assignment makes pointer from integer without a cast (Now it occurred to me that forgetting the #include parse_func.h might hit this breakage..., so I'll fix it here and continue to test, but if you'll fix it yourself, let me know) I fixed it here and it now works with my environment. The regression tests pass, the feature seems working as aimed, but it seems to me that it needs more test cases and documentation. For the tests, I believe at least we need ambiguous case given upthread, so that we can ensure to keep compatibility. For the document, it should describe the name resolution rule, as stated in the patch comment. Aside from them, I wondered at first what if the function is schema-qualified. Say, CREATE FUNCTION s.f(a int) RETURNS int AS $$ SELECT b FROM t WHERE a = s.f.a $$ LANGUAGE sql; It actually errors out, since function-name-qualified parameter only accepts function name without schema name, but it looked weird to me at first. No better idea from me at the moment, though. I mark this Waiting on Author for now. Thanks, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote: On 01/16/2012 01:28 AM, Greg Smith wrote: -I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off. ...and there's at least one I missed located already: inside of md.c. I'd forgotten how many spots where timing calls are optimized out are floating around this code path. I think CF app page for this patch has missing link with wrong message-id. Thanks, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On 23.12.2011 02:01, Phil Sorber wrote: On Thu, Dec 22, 2011 at 3:19 PM, Robert Haasrobertmh...@gmail.com wrote: On Thu, Dec 22, 2011 at 2:02 PM, Phil Sorberp...@omniti.com wrote: On Thu, Dec 22, 2011 at 1:33 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haasrobertmh...@gmail.com writes: I'm wondering if we oughta just return NULL and be done with it. +1. There are multiple precedents for that sort of response, which we introduced exactly so that SELECT some_function(oid) FROM some_catalog wouldn't fail just because one of the rows had gotten deleted by the time the scan got to it. I don't think it's necessary for the relation-size functions to be any smarter. Indeed, I'd assumed that's all that Phil's patch did, since I'd not looked closer till just now. Here it is without the checking for recently dead. If it can't open the relation it simply returns NULL. I think we probably ought to make pg_database_size() and pg_tablespace_size() behave similarly. Changes added. Looks good to me, committed. I added a sentence to the docs mentioning the new behavior, and also a code comment to explain why returning NULL is better than throwing an error. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] review:tracking temporary files in pg_stat_database
Hello This is review of Tomas' patch for counting of using temporary files: * This patch was cleanly applied and code was compiled * All 128 tests passed * There are own regress tests, but usually pg_stat_* functions are not tested * There is adequate documentation * This patch was requested http://archives.postgresql.org/pgsql-hackers/2011-12/msg00950.php * Code following postgresql coding standards * Code works postgres=# create table xx(a int); CREATE TABLE postgres=# insert into xx select (random()*10)::int from generate_series(1,20); INSERT 0 20 postgres=# \d+ xx; Table public.xx Column | Type | Modifiers | Storage | Stats target | Description +-+---+-+--+- a | integer | | plain | | Has OIDs: no postgres=# \dt+ xx; List of relations Schema | Name | Type | Owner | Size | Description +--+---+---+-+- public | xx | table | pavel | 7104 kB | (1 row) postgres=# set work_mem to '1MB'; SET postgres=# select pg_stat_get_db_temp_bytes(12899),pg_stat_get_db_temp_files(12899); pg_stat_get_db_temp_bytes | pg_stat_get_db_temp_files ---+--- 9889486560 | 4103 (1 row) postgres=# select * from xx order by 1; postgres=# select pg_stat_get_db_temp_bytes(12899),pg_stat_get_db_temp_files(12899); pg_stat_get_db_temp_bytes | pg_stat_get_db_temp_files ---+--- 9892288224 | 4104 (1 row) This patch is ready for commit Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IDLE in transaction introspection
On Wed, Jan 18, 2012 at 16:35, Scott Mead sco...@openscg.com wrote: On Wed, Jan 18, 2012 at 8:27 AM, Magnus Hagander mag...@hagander.net wrote: On Tue, Jan 17, 2012 at 01:43, Scott Mead sco...@openscg.com wrote: On Mon, Jan 16, 2012 at 6:10 PM, Scott Mead sco...@openscg.com wrote: On Sun, Jan 15, 2012 at 4:28 AM, Greg Smith g...@2ndquadrant.com wrote: On 01/12/2012 11:57 AM, Scott Mead wrote: Pretty delayed, but please find the attached patch that addresses all the issues discussed. The docs on this v4 look like they suffered a patch order problem here. In the v3, you added a whole table describing the pg_stat_activity documentation in more detail than before. v4 actually tries to remove those new docs, a change which won't even apply as they don't exist upstream. My guess is you committed v3 to somewhere, applied the code changes for v4, but not the documentation ones. It's easy to do that and end up with a patch that removes a bunch of docs the previous patch added. I have to be careful to always do something like git diff origin/master to avoid this class of problem, until I got into that habit I did this sort of thing regularly. gak I did a 'backwards' diff last time. This time around, I diff-ed off of a fresh pull of 'master' (and I did the diff in the right direction. Also includes whitespace cleanup and the pg_stat_replication (procpid == pid) regression fix. I'm reviewing this again, and have changed a few things around. I came up with a question, too :-) Right now, if you turn off track activities, we put command string not enabled in the query text. Shouldn't this also be a state, such as disabled? It seems more consistent to me... Sure, I was going the route of 'client_addr', i.e. leaving it null when not in use just to keep screen clutter down, but I'm not married to it. Applied with fairly extensive modifications. I moved things around, switched to using enum instead of int+#define and a few things like that. Also changed most of the markup in the docs - I may well have broken some previously good language that, so if I did and someone spots it, please mention it :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
Excerpts from Hitoshi Harada's message of jue ene 19 07:08:52 -0300 2012: On Sun, Jan 15, 2012 at 10:46 PM, Greg Smith g...@2ndquadrant.com wrote: On 01/16/2012 01:28 AM, Greg Smith wrote: -I can't tell for sure if this is working properly when log_checkpoints is off. This now collects checkpoint end time data in all cases, whereas before it ignored that work if log_checkpoints was off. ...and there's at least one I missed located already: inside of md.c. I'd forgotten how many spots where timing calls are optimized out are floating around this code path. I think CF app page for this patch has missing link with wrong message-id. Fixed, thanks -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
On 08.01.2012 22:36, Dimitri Fontaine wrote: The extension mechanism we added in 9.1 is aimed at allowing a fully integrated contrib management, which was big enough a goal to preclude doing anything else in its first release. Hooray! Now we have it and we can think some more about what features we want covered, and a pretty obvious one that's been left out is the ability to define and update an extension without resorting to file system support for those extensions that do not need a shared object library. We could have been calling that “SQL ONLY” extensions, but to simplify the grammar support I did use the “inline” keyword so there we go. Frankly I don't see the point of this. If the extension is an independent piece of (SQL) code, developed separately from an application, with its own lifecycle, a .sql file seems like the best way to distribute it. If it's not, ie. if it's an integral part of the database schema, then why package it as an extension in the first place? Please find attached a WIP patch implementing that. Note that the main core benefit to integrating this feature is the ability to easily add regression tests for extension related features. Which is not done yet in the attached. I'm not sure I buy that argument. These inline extensions are sufficiently different from regular extensions that I think you'd need to have regression tests for both kinds, anyway. I'm sending this quite soon because of the pg_dump support. When an extension is inline, we want to dump its content, as we currently do in the binary dump output. I had in mind that we could output a full CREATE EXTENSION INLINE script in between some dollar-quoting rather than adding each extension's object with a ALTER EXTENSION ... ADD line like what pg_upgrade compatibility is currently doing. I thought the main point of extensions is that that they're not included in pg_dump. Again, if the extension is an integral part of the database, then it probably shouldn't be an extension in the first place. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS]
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On 12.01.2012 14:31, Simon Riggs wrote: In order to simulate real-world clog contention, we need to use benchmarks that deal with real world situations. Currently, pgbench pre-loads data using COPY and executes a VACUUM so that all hint bits are set on every row of every page of every table. Thus, as pgbench runs it sees zero clog accesses from historical data. As a result, clog access is minimised and the effects of clog contention in the real world go unnoticed. The following patch adds a pgbench option -I to load data using INSERTs, so that we can begin benchmark testing with rows that have large numbers of distinct un-hinted transaction ids. With a database pre-created using this we will be better able to simulate and thus more easily measure clog contention. Note that current clog has space for 1 million xids, so a scale factor of greater than 10 is required to really stress the clog. No doubt this is handy for testing this particular area, but overall I feel this is too much of a one-trick pony to include in pgbench. Alternatively, you could do something like this: do $$ declare i int4; naccounts int4; begin select count(*) into naccounts from pgbench_accounts; for i in 1..naccounts loop -- use a begin-exception block to create a new subtransaction begin update pgbench_accounts set abalance = abalance where aid = i; exception when division_by_zero then raise 'unexpected division by zero error';end; end loop; end; $$; after initializing the pgbench database, to assign distinct xmins to all rows. Another idea would be to run pg_dump in --inserts mode, edit the dump to remove BEGIN/COMMIT from it, and restore it back. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IDLE in transaction introspection
On Thu, Jan 19, 2012 at 10:31 PM, Magnus Hagander mag...@hagander.net wrote: Applied with fairly extensive modifications. I moved things around, switched to using enum instead of int+#define and a few things like that. Also changed most of the markup in the docs - I may well have broken some previously good language that, so if I did and someone spots it, please mention it :-) The attached patch seems to need to be committed. BTW, the following change in the patch is not required, but ISTM that it's better to change that way. -SELECT pg_stat_get_backend_pid(s.backendid) AS procpid, - pg_stat_get_backend_activity(s.backendid) AS current_query +SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + pg_stat_get_backend_activity(s.backendid) AS query Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center noname Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On 19 January 2012 14:36, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: No doubt this is handy for testing this particular area, but overall I feel this is too much of a one-trick pony to include in pgbench. I don't think that being conservative in accepting pgbench options is the right way to go. It's already so easy for a non-expert to shoot themselves in the foot that we don't do ourselves any favours by carefully weighing the merits of an expert-orientated feature. Have you ever read the man page for rsync? It's massive, with a huge number of options, and rsync is supposed to be a tool that's widely used by sysadmins, not a specialist database benchmarking tool. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Should I implement DROP INDEX CONCURRENTLY?
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote: Can I just check with you that the only review comment is a one line change? Seems better to make any additional review comments in one go. No, I haven't done a full review yet - I just noticed that right at the outset and wanted to be sure that got addressed. I can look it over more carefully, and test it. Corrected your point, and another found during retest. Works as advertised, docs corrected. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml index 7177ef2..aeb1531 100644 --- a/doc/src/sgml/ref/drop_index.sgml +++ b/doc/src/sgml/ref/drop_index.sgml @@ -21,7 +21,9 @@ PostgreSQL documentation refsynopsisdiv synopsis -DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ] +DROP INDEX + [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, ...] [ CASCADE | RESTRICT ] + CONCURRENTLY replaceable class=PARAMETERname/replaceable /synopsis /refsynopsisdiv @@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] replaceable class=PARAMETERname/replaceable [, .. /varlistentry varlistentry +termliteralCONCURRENTLY/literal/term +listitem + para + When this option is used, productnamePostgreSQL/ will drop the + index without taking any locks that prevent concurrent selects, inserts, + updates, or deletes on the table; whereas a standard index drop + waits for a lock that locks out everything on the table until it's done. + Concurrent drop index is a two stage process. First, we mark the index + both invalid and not ready then commit the change. Next we wait until + there are no users locking the table who can see the index. + /para + para + There are several caveats to be aware of when using this option. + Only one index name can be specified if the literalCONCURRENTLY/literal + parameter is specified. Only one concurrent index drop can occur on a + table at a time and no modifications on the table are allowed meanwhile. + Regular commandDROP INDEX/ command can be performed within + a transaction block, but commandDROP INDEX CONCURRENTLY/ cannot. + There is no CASCADE option when dropping an index concurrently. + /para +/listitem + /varlistentry + + varlistentry termreplaceable class=PARAMETERname/replaceable/term listitem para diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 0b3d489..6884fdb 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects, DropBehavior behavior, int msglevel, const ObjectAddress *origObject); -static void deleteOneObject(const ObjectAddress *object, Relation depRel); -static void doDeletion(const ObjectAddress *object); -static void AcquireDeletionLock(const ObjectAddress *object); +static void deleteOneObject(const ObjectAddress *object, Relation depRel, + int option_flags); +static void doDeletion(const ObjectAddress *object, int option_flags); +static void AcquireDeletionLock(const ObjectAddress *object, int option_flags); static void ReleaseDeletionLock(const ObjectAddress *object); static bool find_expr_references_walker(Node *node, find_expr_references_context *context); @@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object, * Acquire deletion lock on the target object. (Ideally the caller has * done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(object); + AcquireDeletionLock(object, 0); /* * Construct a list of objects to delete (ie, the given object plus @@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object, { ObjectAddress *thisobj = targetObjects-refs + i; - deleteOneObject(thisobj, depRel); + deleteOneObject(thisobj, depRel, 0); } /* And clean up */ @@ -276,6 +277,13 @@ void performMultipleDeletions(const ObjectAddresses *objects, DropBehavior behavior) { + performMultipleDeletionsWithOptions(objects, behavior, 0); +} + +void +performMultipleDeletionsWithOptions(const ObjectAddresses *objects, + DropBehavior behavior, int option_flags) +{ Relation depRel; ObjectAddresses *targetObjects; int i; @@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects, * Acquire deletion lock on each target object. (Ideally the caller * has done this already, but many places are sloppy about it.) */ - AcquireDeletionLock(thisobj); + AcquireDeletionLock(thisobj, option_flags); findDependentObjects(thisobj, DEPFLAG_ORIGINAL, @@ -336,13 +344,17 @@
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 12.01.2012 14:31, Simon Riggs wrote: In order to simulate real-world clog contention, we need to use benchmarks that deal with real world situations. Currently, pgbench pre-loads data using COPY and executes a VACUUM so that all hint bits are set on every row of every page of every table. Thus, as pgbench runs it sees zero clog accesses from historical data. As a result, clog access is minimised and the effects of clog contention in the real world go unnoticed. The following patch adds a pgbench option -I to load data using INSERTs, so that we can begin benchmark testing with rows that have large numbers of distinct un-hinted transaction ids. With a database pre-created using this we will be better able to simulate and thus more easily measure clog contention. Note that current clog has space for 1 million xids, so a scale factor of greater than 10 is required to really stress the clog. No doubt this is handy for testing this particular area, but overall I feel this is too much of a one-trick pony to include in pgbench. Alternatively, you could do something like this: I think the one-trick pony is pgbench. It has exactly one starting condition for its tests and that isn't even a real world condition. The main point of including the option into pgbench is to have a utility that produces as initial test condition that works the same for everyone, so we can accept each others benchmark results. We both know that if someone posts that they have done $RANDOMSQL on a table before running a test, it will just be ignored and people will say user error. Some people will get it wrong when reproducing things and we'll have chaos. The patch exists as a way of testing the clog contention improvement patches and provides a route to long term regression testing that the solution(s) still work. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Frankly I don't see the point of this. If the extension is an independent piece of (SQL) code, developed separately from an application, with its own lifecycle, a .sql file seems like the best way to distribute it. If it's not, ie. if it's an integral part of the database schema, then why package it as an extension in the first place? It allows to easily deploy an extension to N databases (my current use case has 256 databases) and knowing which version is installed on each server. It's easier to QA your procedures and upgrades when they are packaged as extensions, too. Now, for the dependency on a SQL file hosting the content, it's easier to just connect to the databases and get them the script in the SQL command rather than deploying a set of files: that means OS level packaging, either RPM or debian or some other variant. Or some other means of easily deploying the files. An SQL connection is all you need if you're not shipping .so. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Avoiding shutdown checkpoint at failover
On Wed, Jan 18, 2012 at 7:15 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Nov 13, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Nov 1, 2011 at 12:11 PM, Simon Riggs si...@2ndquadrant.com wrote: When I say skip the shutdown checkpoint, I mean remove it from the critical path of required actions at the end of recovery. We can still have a normal checkpoint kicked off at that time, but that no longer needs to be on the critical path. Any problems foreseen? If not, looks like a quick patch. Patch attached for discussion/review. This feature is what I want, and very helpful to shorten the failover time in streaming replication. Here are the review comments. Though I've not checked enough whether this feature works fine in all recovery patterns yet. LocalSetXLogInsertAllowed() must be called before LogEndOfRecovery(). LocalXLogInsertAllowed must be set to -1 after LogEndOfRecovery(). XLOG_END_OF_RECOVERY record is written to the WAL file with new assigned timeline ID. But it must be written to the WAL file with old one. Otherwise, when re-entering a recovery after failover, we cannot find XLOG_END_OF_RECOVERY record at all. Before XLOG_END_OF_RECOVERY record is written, RmgrTable[rmid].rm_cleanup() might write WAL records. They also should be written to the WAL file with old timeline ID. When recovery target is specified, we cannot write new WAL to the file with old timeline because which means that valid WAL records in it are overwritten with new WAL. So when recovery target is specified, ISTM that we cannot skip end of recovery checkpoint. Or we might need to save all information about timelines in the database cluster instead of writing XLOG_END_OF_RECOVERY record, and use it when re-entering a recovery. LogEndOfRecovery() seems to need to call XLogFlush(). Otherwise, what if the server crashes after new timeline history file is created and recovery.conf is removed, but before XLOG_END_OF_RECOVERY record has not been flushed to the disk yet? During recovery, when we replay XLOG_END_OF_RECOVERY record, we should close the currently-opened WAL file and read the WAL file with the timeline which XLOG_END_OF_RECOVERY record indicates. Otherwise, when re-entering a recovery with old timeline, we cannot reach new timeline. OK, some bad things there, thanks for the insightful comments. I think you're right that we can't skip the checkpoint if xlog_cleanup writes WAL records, since that implies at least one and maybe more blocks have changed and need to be flushed. That can be improved upon, but not now in 9.2.Cleanup WAL is written in either the old or the new timeline, depending upon whether we increment it. So we don't need to change anything there, IMHO. The big problem is how we handle crash recovery after we startup without a checkpoint. No quick fixes there. So let me rethink this: The idea was that we can skip the checkpoint if we promote to normal running during streaming replication. WALReceiver has been writing to WAL files, so can write more data without all of the problems noted. Rather than write the XLOG_END_OF_RECOVERY record via XLogInsert we should write that **from the WALreceiver** as a dummy record by direct injection into the WAL stream. So the Startup process sees a WAL record that looks like it was written by the primary saying promote yourself, although it was actually written locally by WALreceiver when requested to shutdown. That doesn't damage anything because we know we've received all the WAL there is. Most importantly we don't need to change any of the logic in a way that endangers the other code paths at end of recovery. Writing the record in that way means we would need to calculate the new tli slightly earlier, so we can input the correct value into the record. That also solves the problem of how to get additional standbys to follow the new master. The XLOG_END_OF_RECOVERY record is simply the contents of the newly written tli history file. If we skip the checkpoint and then crash before the next checkpoint we just change timeline when we see XLOG_END_OF_RECOVERY. When we replay the XLOG_END_OF_RECOVERY we copy the contents to the appropriate tli file and then switch to it. So this solves 2 problems: having other standbys follow us when they don't have archiving, and avoids the checkpoint. Let me know what you think. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
Hi Mikko, First, for everyone else: this patch provides a more-compact binary output format for arrays. When the array contains no NULL elements and has a fixed-length element type, the new format saves four bytes per array element. We could do more. We could add support for arrays containing NULLs by way of a nulls bitmap. We could reduce the per-array overhead, currently 20 bytes for a 1-D array, in addition to the per-element overhead. Does anyone care about these broader cases? If so, speak now. On Thu, Dec 01, 2011 at 04:42:43PM +0200, Mikko Tiihonen wrote: On 11/30/2011 06:52 PM, Merlin Moncure wrote: On Mon, Nov 28, 2011 at 9:18 AM, Mikko Tiihonen mikko.tiiho...@nitorcreations.com wrote: Hi, As discussed few days ago it would be beneficial if we could change the v3 backend-client protocol without breaking backwards compatibility. Here is a working patch that exports a supported_binary_minor constant and a binary_minor session variable and a that can be used by clients to enable newer features. I also added an example usage where the array encoding for constant size elements is optimized if binary_minor version is new enough. I have coded the client side support for binary_minor for jdbc driver and tested that it worked. But lets first discuss if this is an acceptable path forward. I think all references to 'protocol' should be removed; Binary wire formats != protocol: the protocol could bump to v4 but the wire formats (by happenstance) could all still continue to work -- therefore the version isn't minor (it's not dependent on protocol version but only on itself). Therefore, don't much like the name 'supported_binary_minor'. How about binary_format_version? I was thinking that it would be possible use the new minor version to introduce also new protocol messages such as streaming of large data into database without knowing it's size beforehand. I agree with Merlin; the frontend/backend protocol is logically distinct from the binary send/recv formats of data types. For one key point, the latter is not exclusively core-defined; third-party extensions change their send/recv formats on a different schedule. They can add myext.binary_format_version GUCs of their own to cope in a similar way. Client interfaces that do not interpret individual result column data, such as libpq, require no update for send/recv format changes. In contrast, all client interfaces would need changes for a new protocol message. Most clients doing so may as well then advertise their support unconditionally. In contrast, send/recv format is something individual _users_ of the same client library may set differently. It's reasonable to have an application that still reads its data in send/recv format v0 even after upgrading to a version of libpq that talks frontend/backend protocol v4. I do think this is a great way to handle send/recv format changes. There needs to be decision on official policy about breaking backwards compatibility of postgresql clients. Is it: A) Every x.y postgres release is free to break any parts of the * protocol * text encoding * binary protocol as long as it is documented - all client libraries should be coded so that they refuse to support version x.y+1 or newer (they might have a option to override this but there are no guarantees that it would work) Good: no random bugs when using old client library Bad: initial complaints from users before they understand that this is the best option for everyone The ability to use old client libraries with new servers is more valuable than the combined benefits of all currently-contemplated protocol improvements. D) My proposed compromise where there is one minor_version setting and code in server to support all different minor versions. The client requests the minor version so that all releases default to backwards compatible version. This is workable. When there combinations starts to create too much maintenance overhead a new clean v4 version of protocol is specified. Good: Keeps full backwards compatibility Good: Allows introducing changes at any time Bad: Accumulates conditional code to server and clients until new version of protocol is released We would retain support for protocol V3 for years following the first release to support protocol V4, so think of the conditional code as lasting forever. The patch works as advertised. After set binary_minor = 1, the output of this command shrinks from 102 MiB to 63 MiB: \copy (select array[0,1,2,3,4,5,6,7,8,9] from generate_series(1,100)) to mynums with binary --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -477,6 +477,7 @@ static intsegment_size; static int wal_block_size; static int wal_segment_size; static bool integer_datetimes; +static int supported_binary_minor; static int
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 10:18 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 19, 2012 at 2:36 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 12.01.2012 14:31, Simon Riggs wrote: In order to simulate real-world clog contention, we need to use benchmarks that deal with real world situations. Currently, pgbench pre-loads data using COPY and executes a VACUUM so that all hint bits are set on every row of every page of every table. Thus, as pgbench runs it sees zero clog accesses from historical data. As a result, clog access is minimised and the effects of clog contention in the real world go unnoticed. The following patch adds a pgbench option -I to load data using INSERTs, so that we can begin benchmark testing with rows that have large numbers of distinct un-hinted transaction ids. With a database pre-created using this we will be better able to simulate and thus more easily measure clog contention. Note that current clog has space for 1 million xids, so a scale factor of greater than 10 is required to really stress the clog. No doubt this is handy for testing this particular area, but overall I feel this is too much of a one-trick pony to include in pgbench. Alternatively, you could do something like this: I think the one-trick pony is pgbench. It has exactly one starting condition for its tests and that isn't even a real world condition. The main point of including the option into pgbench is to have a utility that produces as initial test condition that works the same for everyone, so we can accept each others benchmark results. We both know that if someone posts that they have done $RANDOMSQL on a table before running a test, it will just be ignored and people will say user error. Some people will get it wrong when reproducing things and we'll have chaos. The patch exists as a way of testing the clog contention improvement patches and provides a route to long term regression testing that the solution(s) still work. I agree: I think this is useful. However, I think we should follow the precedent of some of the other somewhat-obscure options we've added recently and have only a long form option for this: --inserts. Also, I don't think the behavior described here should be joined at the hip to --inserts: +* We do this after a load by COPY, but before a load via INSERT +* +* This is done deliberately to ensure that no heap or index hints are +* set before we start running the benchmark. This emulates the case +* where data has arrived row at a time by INSERT, rather than being +* bulkloaded prior to update. I think that's also a useful behavior, but if we're going to have it, we should have a separate option for it, like --create-indexes-early. Otherwise, someone who wants to (for example) test only the impact of doing inserts vs. COPY will get misleading answers. Finally, it's occurred to me that it would be useful to make pgbench respect -n even in initialization mode, and the SGML doc changes imply that this patch does that somewhere or other, but maybe only when you're doing INSERTS and not when you're doing copy, which would be odd; and there's no sgml doc update for -n, and no command-line help change either. In short, I think the reason this patch seems like it's implementing something fairly arbitrary it's really three pretty good ideas that are somewhat jumbled together. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On Mon, Jan 16, 2012 at 1:28 AM, Greg Smith g...@2ndquadrant.com wrote: One of the most useful bits of feedback on how well checkpoint I/O is going is the amount of time taken to sync files to disk. Right now the only way to get that is to parse the logs. The attached patch publishes the most useful three bits of data you could only get from log_checkpoints before out to pg_stat_bgwriter. It's not quite clear from your email, but I gather that the way that this is intended to work is that these values increment every time we checkpoint? Also, forgive for asking this possibly-stupid question, but of what use is this information? I can't imagine why I'd care about a running total of the number of files fsync'd to disk. I also can't really imagine why I'd care about the length of the write phase, which surely will almost always be a function of checkpoint_completion_target and checkpoint_timeout unless I manage to overrun the number of checkpoint_segments I've allocated. The only number that really seems useful to me is the time spent syncing. I have a clear idea what to look for there: smaller numbers are better than bigger ones. For the rest I'm mystified. And, it doesn't seem like it's necessarily going to safe me a whole lot either, because if it turns out that my sync phases are long, the first question out of my mouth is going to be what percentage of my total sync time is accounted for by the longest sync?. And so right there I'm back to the logs. It's not clear how such information could be usefully exposed in pg_stat_bgwriter either, since you probably want to know only the last few values, not a total over all time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 3:41 PM, Robert Haas robertmh...@gmail.com wrote: I agree: I think this is useful. However, I think we should follow the precedent of some of the other somewhat-obscure options we've added recently and have only a long form option for this: --inserts. Yep, no problem. Also, I don't think the behavior described here should be joined at the hip to --inserts: + * We do this after a load by COPY, but before a load via INSERT + * + * This is done deliberately to ensure that no heap or index hints are + * set before we start running the benchmark. This emulates the case + * where data has arrived row at a time by INSERT, rather than being + * bulkloaded prior to update. I think that's also a useful behavior, but if we're going to have it, we should have a separate option for it, like --create-indexes-early. Otherwise, someone who wants to (for example) test only the impact of doing inserts vs. COPY will get misleading answers. Creating indexes later would invalidate the test conditions I was trying to create, so that doesn't add a useful new initialisation mode. (Creating the indexes causes all of the hint bits to be set). So that's just adding unrelated requirements for additional tests. Yes, there are lots of additional tests we could get this code to perform but we don't need to burden this patch with responsibility for adding them, especially not when the tests mentioned don't refer to any related patches in this commit fest and could be done at any time. Any such change is clearly very low priority at this time. Finally, it's occurred to me that it would be useful to make pgbench respect -n even in initialization mode, and the SGML doc changes imply that this patch does that somewhere or other, but maybe only when you're doing INSERTS and not when you're doing copy, which would be odd; and there's no sgml doc update for -n, and no command-line help change either. I'll fix the docs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [v9.2] sepgsql's DROP Permission checks
On Thu, Jan 19, 2012 at 3:51 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: 2012/1/19 Robert Haas robertmh...@gmail.com: On Wed, Jan 18, 2012 at 9:50 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote: In sepgsql side, it determines a case to apply permission checks according to the contextual information; that is same technique when we implemented create permission. Thus, it could checks db_xxx:{drop} permission correctly. Why do we need the contextual information in this case? Why can't/shouldn't the decision be made solely on the basis of what object is targeted? Several code-paths to remove a particular objects are not appropriate to apply permission checks. For example... [1] Cleanup of temporary objects on_shmem_eixt() registers RemoveTempRelationsCallback(), then it eventually calls deleteWhatDependsOn() that have an invocation of deleteOneObject(). This code path is just an internal cleanup process, not related to permission of the client. [2] Cleanup of transient table at VACUUM FULL/CLUSTER command rebuild_relation() creates a temporary table with make_new_heap(), then it copies the contents of original table according to the order of index, and calls finish_heap_swap() that swaps relation files and removes the temporary table using performDeletion(). This code actually create and drop a table, however, it is quite internal design and not related to permission of the client. So, I think sepgsql should only applied to permission checks object deletion invoked by user's operations, such as DropStmt. I agree with that theory, but isn't this method of implementing that a pretty horrible kludge? For example, if you'd implemented it this way for 9.1, the recent drop-statement refactoring would have broken it. Or if, in the future, we add another type of statement that can drop things, this code will still compile just fine but will no longer work correctly. ISTM that we need a way to either (1) not call the hook at all unless the operation is user-initiated, or (2) call the hook, but pass a flag indicating what sort of operation this is? Let's imagine another possible use of this hook: we want to emit some kind of log message every time a database object gets dropped. I think that's a plausible use-case, and in that case what we'd want is: (1) VACUUM FULL or CLUSTER shouldn't call the hook at all, (2) cleanup of temporary objects should probably call the hook, but ideally with a flag to indicate that it's an internal (DB-initiated) operation, and (3) user activity should definitely call the hook. I'm not sure how we can cleanly get that behavior, but ISTM that's what we want... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote: Also, I don't think the behavior described here should be joined at the hip to --inserts: + * We do this after a load by COPY, but before a load via INSERT + * + * This is done deliberately to ensure that no heap or index hints are + * set before we start running the benchmark. This emulates the case + * where data has arrived row at a time by INSERT, rather than being + * bulkloaded prior to update. I think that's also a useful behavior, but if we're going to have it, we should have a separate option for it, like --create-indexes-early. Otherwise, someone who wants to (for example) test only the impact of doing inserts vs. COPY will get misleading answers. Creating indexes later would invalidate the test conditions I was trying to create, so that doesn't add a useful new initialisation mode. (Creating the indexes causes all of the hint bits to be set). Right, but the point is that to address Heikki's objection that this is a special-purpose hack, we should try to make it general, so that it can be used by other people for other things. For example, if the options are separated, you can use this to measure how much slower --inserts vs. the regular way. But if that also changes the way indexes are created, then you can't. Moreover, since the documentation mentioned only one of those two changes and not the other, you might reasonably think that you've conducted a valid test. We could document that --inserts changes the behavior in multiple ways, but then the switch will end up being a bit of a misnomer, so I think it's better to have a separate switch for each behavior someone might want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Frankly I don't see the point of this. If the extension is an independent piece of (SQL) code, developed separately from an application, with its own lifecycle, a .sql file seems like the best way to distribute it. If it's not, ie. if it's an integral part of the database schema, then why package it as an extension in the first place? It allows to easily deploy an extension to N databases (my current use case has 256 databases) and knowing which version is installed on each server. It's easier to QA your procedures and upgrades when they are packaged as extensions, too. Now, for the dependency on a SQL file hosting the content, it's easier to just connect to the databases and get them the script in the SQL command rather than deploying a set of files: that means OS level packaging, either RPM or debian or some other variant. Or some other means of easily deploying the files. An SQL connection is all you need if you're not shipping .so. I'm with Heikki on not believing that this is a good idea. If you are trying to do careful versioning of a set of object definitions, you want to stick the things in a file, you don't want them just flying by in submitted SQL. Also, a large part of the point of the extension facility is to be able to do uninstall/reinstall and version upgrades/downgrades, none of which are possible unless the extension scripts are stored somewhere. ISTM your distribution concern would be much better addressed by installing contrib/adminpack and then just using pg_file_write() to put the new extension script into the remote server's library. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote: Also, I don't think the behavior described here should be joined at the hip to --inserts: + * We do this after a load by COPY, but before a load via INSERT + * + * This is done deliberately to ensure that no heap or index hints are + * set before we start running the benchmark. This emulates the case + * where data has arrived row at a time by INSERT, rather than being + * bulkloaded prior to update. I think that's also a useful behavior, but if we're going to have it, we should have a separate option for it, like --create-indexes-early. Otherwise, someone who wants to (for example) test only the impact of doing inserts vs. COPY will get misleading answers. Creating indexes later would invalidate the test conditions I was trying to create, so that doesn't add a useful new initialisation mode. (Creating the indexes causes all of the hint bits to be set). Right, but the point is that to address Heikki's objection that this is a special-purpose hack, we should try to make it general, so that it can be used by other people for other things. This supports running hundreds of different tests because it creates a useful general starting condition. It's not a special purpose hack because its not a hack, nor is it special purpose. Yes, we could have an option to run with no indexes. Or we could have an option to run with 2 indexes as well. We could do all sorts of things. None of that is important, because there aren't any patches in the queue that need those tests and its too late to do it in this release. And if it really is important you can do it in the next release. If we have time to spend we should be spending it on running the patch to test the effectiveness of other patches in the queue, not on inventing new tests that have no relevance. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On Thu, Jan 19, 2012 at 3:52 PM, Robert Haas robertmh...@gmail.com wrote: And, it doesn't seem like it's necessarily going to safe me a whole lot either, because if it turns out that my sync phases are long, the first question out of my mouth is going to be what percentage of my total sync time is accounted for by the longest sync?. And so right there I'm back to the logs. It seems clear that the purpose of this is to quickly and easily decide whether there is a potential problem. If you decide there is a potential problem, you may wish to look at more detailed information. So there is a huge time saving from avoiding the need to look at the detail when its unnecessary and possibly not even enabled. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
On Jan 19, 2012, at 7:21 AM, Dimitri Fontaine wrote: Now, for the dependency on a SQL file hosting the content, it's easier to just connect to the databases and get them the script in the SQL command rather than deploying a set of files: that means OS level packaging, either RPM or debian or some other variant. Or some other means of easily deploying the files. An SQL connection is all you need if you're not shipping .so. ISTM that if you are managing 256 servers, you’re likely already using a packaging system for the deployment of application dependencies. In which case, to keep things consistent, you ought to distribute your extensions in exactly the same way. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On Wed, Jan 18, 2012 at 5:38 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, Jan 18, 2012 at 1:23 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Jan 17, 2012 at 12:37 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I found it very helpful to reduce wal_writer_delay in pgbench tests, when running with synchronous_commit=off. The reason is that hint bits don't get set until the commit record is flushed to disk, so making the flushes more frequent reduces the contention on the clog. However, Simon made async commits nudge WAL writer if the WAL page fills up, so I'm not sure how relevant that experience is anymore. Still completely relevant and orthogonal to this discussion. The patch retains multi-modal behaviour. I don't know what you mean by this. I think removing wal_writer_delay is premature, because I think it still may have some utility, and the patch removes it. That's a separate change that should be factored out of this patch and discussed separately. There's still a small but measurable effect there in master. I think we might be able to make it fully auto-tuning, but I don't think we're fully there yet (not sure how much this patch changes that equation). I suggested a design similar to the one you just proposed to Simon when he originally suggested this feature. It seems that if the WAL writer is the only one doing WAL flushes, then there must be some IPC overhead - and context switching - involved whenever WAL is flushed. But clearly we're saving something somewhere else, on the basis of Peter's results, so maybe it's not worth worrying about. It does seem pretty odd to have all the regular backends going through the WAL writer and the auxiliary processes using a different mechanism, though. If we got rid of that, maybe WAL writer wouldn't even require a lock, if there's only one process that can be doing it at a time. When we did sync rep it made sense to have the WALSender do the work and for others to just wait. It would be quite strange to require a different design for essentially the same thing for normal commits and WAL flushes to local disk. I should mention the original proposal for streaming replication had each backend send data to standby independently and that was recognised as a bad idea after some time. Same for sync rep also. I don't think those cases are directly comparable. SR is talking to another machine, and I can't imagine that there is a terribly convenient or portable way for every backend that needs one to get a hold of the file descriptor for the socket. Even if it could, the data is sent as a stream, so if multiple backends sent to the same file descriptor you'd have to have some locking to prevent messages from getting interleaved. Or else you could have multiple connections, or use UDP, but that gets rather complex. Anyway, none of this is an issue for file I/O: anybody can open the file, and if two backends write data at different offsets at the same time - or the same data at the same offset at the same time - there's no problem. So the fact that it wasn't a good idea for SR doesn't convince me that it's also a bad idea here. On the other hand, I'm not saying we *should* do it that way, either - i.e. I am not trying to require a different design just because it's fun to make people change things. Rather, I am trying to figure out whether the design you've chosen is in fact the best one, and part of that involves reasoning about why it might or might not be. There are obvious reasons to think that having process A kick process B and go to sleep, then have process B do some work and wake up process A might be less efficient than having process A just do the work itself, in the uncontended case. The fact that that isn't happening seems awfully strange to me; it's hard to understand why 3 system calls are faster than one. That suggests that either the current system is badly broken in some way that this patch fixes (in which case, it would be nice to know what the broken thing is) or that there's an opportunity for further optimization of the new patch (either now or later, depending on how much work we're talking about). Just to be clear, it makes perfect sense that the new system is faster in the contended case, and the benchmark numbers are very impressive. It's a lot harder to understand why it's not slower in the uncontended case. Not sure why its odd to have backends do one thing and auxiliaries do another. The whole point of auxiliary processes is that they do a specific thing different to normal backends. Anyway, the important thing is to have auxiliary processes be independent of each other as much as possible, which simplifies error handling and state logic in the postmaster. Yeah, I guess the shutdown sequence could get a bit complex if we try to make everyone go through the WAL writer all the time. But I wonder if we could rejiggering things somehow so that
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 18:12, Robert Haas robertmh...@gmail.com wrote: Right, but the point is that to address Heikki's objection that this is a special-purpose hack, we should try to make it general, so that it can be used by other people for other things. Personally I would like to see support for more flexibility in pgbench scripts. It would be useful to allow scripts to contain custom initialization sections -- for scripts that want a custom schema, as well as different ways to populate the standard schema. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 11:46 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 19, 2012 at 4:12 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jan 19, 2012 at 10:55 AM, Simon Riggs si...@2ndquadrant.com wrote: Also, I don't think the behavior described here should be joined at the hip to --inserts: + * We do this after a load by COPY, but before a load via INSERT + * + * This is done deliberately to ensure that no heap or index hints are + * set before we start running the benchmark. This emulates the case + * where data has arrived row at a time by INSERT, rather than being + * bulkloaded prior to update. I think that's also a useful behavior, but if we're going to have it, we should have a separate option for it, like --create-indexes-early. Otherwise, someone who wants to (for example) test only the impact of doing inserts vs. COPY will get misleading answers. Creating indexes later would invalidate the test conditions I was trying to create, so that doesn't add a useful new initialisation mode. (Creating the indexes causes all of the hint bits to be set). Right, but the point is that to address Heikki's objection that this is a special-purpose hack, we should try to make it general, so that it can be used by other people for other things. This supports running hundreds of different tests because it creates a useful general starting condition. It's not a special purpose hack because its not a hack, nor is it special purpose. Yes, we could have an option to run with no indexes. Or we could have an option to run with 2 indexes as well. We could do all sorts of things. None of that is important, because there aren't any patches in the queue that need those tests and its too late to do it in this release. And if it really is important you can do it in the next release. If we have time to spend we should be spending it on running the patch to test the effectiveness of other patches in the queue, not on inventing new tests that have no relevance. I feel I've adequate explained why it makes sense to me to separate those options. If you want, I'll do the work myself; it will take less time than arguing about it. On the other hand, if you wish to insist that we should only commit this patch if --inserts makes multiple, unrelated, undocumented changes to the initial test configurations, then I'll join Heikki in voting against this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
Greg Smith g...@2ndquadrant.com writes: One unicorn I would like to have here would give the CF app a database of recent e-mails to pgsql-hackers. I login to the CF app, click on Add recent submission, and anything matching my e-mail address appears with a checkbox next to it. Click on the patch submissions, and then something like you described would happen. That would save me the annoying work around looking up message IDs so much. Another idea: introduce some simple tag system in mails sent to -hackers to be treated specially, e.g: @fest add-to-current to add new patch to the commit fest currently in progress, or @fest add-to-next to add it to the next scheduled fest. Attribute your mail with @fest comment COMMENT TEXT or @fest comment EOF ... EOF to add a (long) comment, ditto for patch and review. How does that sound? -- Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote: I feel I've adequate explained why it makes sense to me to separate those options. If you want, I'll do the work myself; it will take less time than arguing about it. If you have time to contribute, please use the patch as stands to test the other patches in the CF queue. It's more important that we measure and fix clog contention than have a new pgbench feature with no immediate value to the next release of Postgres. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On Sun, Jan 15, 2012 at 4:17 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: +1. I've been thinking we should do that for a long time, but haven't gotten around to it. I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes? I was thinking of something like this, in postgresql.conf: # - Vacuum Throttling - #vacuum_cost_page_miss = 1.0 # measured on an arbitrary scale #vacuum_cost_page_dirty = 2.0 # same scale as above #vacuum_cost_page_hit = 0.1 # same scale as above #vacuum_rate_limit = 8MB # max reads per second This is now similar to the cost settings for the planner, which is good. I have to say that I find that intensely counterintuitive. The current settings are not entirely easy to tune correctly, but at least they're easy to explain. What does that 8MB mean and how does it relate to vacuum_cost_page_miss? If I double vacuum_rate_page_miss, does that effectively also double the cost limit, so that dirty pages and hits become relatively cheaper? If so, then I think what that really means is that the limit is 8MB only if there are no hits and no dirtied pages - otherwise it's less, and the amount by which it is less is the result of some arcane calculation. Ugh! I can really imagine people wanting to limit two things here: either they want to limit the amount of read I/O, or they want to limit the amount of write I/O. If your database fits in physical memory you probably don't care about the cost of page misses very much at all, but you probably do care about how much data you dirty. OTOH, if your database doesn't fit in physical memory and you have a relatively small percentage of dirty pages because the tables are lightly updated, dirtying might be pretty secondary; if you care at all, it's going to be because busying the disk head with large sequential reads eats up too much of the system's I/O capacity. If we added vacuum_read_rate_limit and vacuum_dirty_rate_limit, totally independently of each other, and through the current system where those two things get mixed together in one big bucket out the window completely, I could maybe sign onto that as an improvement to the UI. But even then, I think we need to balance the amount of the gain against the backward compatibility problems we're going to create. If we start removing autovacuum options, then, as Greg notes, we have to figure out how to make old pg_dumps load into new databases and hopefully do something close to what the DBA intended. And the DBA will have to learn the new system. I'm not sure we're really going to get enough mileage out changing this to justify the hassle. It's basically a cosmetic improvement, and I think we should be careful about breaking compatibility for cosmetic improvements, especially at the end of a release cycle when we're under time pressure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
On 01/19/2012 12:59 PM, Alex Shulgin wrote: Greg Smithg...@2ndquadrant.com writes: One unicorn I would like to have here would give the CF app a database of recent e-mails to pgsql-hackers. I login to the CF app, click on Add recent submission, and anything matching my e-mail address appears with a checkbox next to it. Click on the patch submissions, and then something like you described would happen. That would save me the annoying work around looking up message IDs so much. Another idea: introduce some simple tag system in mails sent to -hackers to be treated specially, e.g: @fest add-to-current to add new patch to the commit fest currently in progress, or @fest add-to-next to add it to the next scheduled fest. Attribute your mail with @fest comment COMMENT TEXT or @fest commentEOF ... EOF to add a (long) comment, ditto for patch and review. How does that sound? Like a recipe for something that requires constant fixups, to be honest. Seriously, adding something to the CF isn't *that* hard. I like Greg's idea of a list of recent emails that you could choose from. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postgres ReviewFest Patch: URI connection string support for libpq
Nick Roosevelt nro...@gmail.com writes: I am sorry, seems like my new MUA was misconfigured so the two previous attempts to reply to -hackers@ failed. So here goes again. Just reviewed the patch for adding URI connection string support for libpg. Thank you for taking your time on this. There seem to be many tabs in the patch. Perhaps the indentation is not correct. I believe tabs for indentation is the project's standard. Also, the patch did not run correctly: patching file src/interfaces/libpq/fe-connect.c Hunk #1 succeeded at 282 with fuzz 1. Hunk #2 FAILED at 300. Hunk #3 FAILED at 583. Hunk #4 FAILED at 3742. Hunk #5 FAILED at 4132. Hunk #6 FAILED at 4279. Hunk #7 FAILED at 4455. 6 out of 7 hunks FAILED -- saving rejects to file src/interfaces/libpq/fe-connect.c.rej Seems like the file has changed since this patch was created. Please recreate the patch. I've just tried to apply the original patch against a clean checkout of master branch and it applies without a problem (no hunk failed, no fuzziness.) Could you please try again on a clean master branch? -- Regards, Alex -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simulating Clog Contention
On Thu, Jan 19, 2012 at 1:02 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jan 19, 2012 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote: I feel I've adequate explained why it makes sense to me to separate those options. If you want, I'll do the work myself; it will take less time than arguing about it. If you have time to contribute, please use the patch as stands to test the other patches in the CF queue. Those things aren't mutually exclusive; whether or not I spend an hour whacking this patch around isn't going to have any impact on how much benchmarking I get done. Benchmarking is mostly waiting, and I can do other things while the tests are going. Just to reiterate a point I've made previously, Nate Boley's test machine was running another big job for several weeks straight, and I haven't been able to use the machine for anything. It seems to be unloaded at the moment so I'll try to squeeze in some tests, but I don't know how long it will stay that way. It's been great to have nearly unimpeded access to this for most of the cycle, but all good things must (and do) come to an end. In any event, none of this has much impact on the offer above, which is a small amount of code that I will be happy to attend to if you do not wish to do so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
Andrew Dunstan and...@dunslane.net writes: On 01/19/2012 12:59 PM, Alex Shulgin wrote: Another idea: introduce some simple tag system in mails sent to -hackers to be treated specially, e.g: @fest add-to-current to add new patch to the commit fest currently in progress, or @fest add-to-next to add it to the next scheduled fest. Attribute your mail with @fest comment COMMENT TEXT or @fest commentEOF ... EOF to add a (long) comment, ditto for patch and review. How does that sound? Like a recipe for something that requires constant fixups, to be honest. Seriously, adding something to the CF isn't *that* hard. I like Greg's idea of a list of recent emails that you could choose from. I've just added a comment about a patch and it took me to: a. Login to commitfest app b. Locate the patch and review I was replying to c. Fetch archives thread index, refresh the index page for ~10 minutes to see my reply appear d. Copy message id and finally register comment in the commitfest app (IIRC, something close to that was already described in this thread) With the proposed approach it would only take me to include @fest comment Patch applies cleanly and possibly @fest status Needs Review to update the patch status and that'd be it. -- Alex PS: yes, I could just copy message id from the sent mail in my MUA, but I like to make sure links I post aren't broke, so still I'll need to wait until archives catches up to double check. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] review of: collation for (expr)
On tor, 2012-01-12 at 21:25 -0800, probabble wrote: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 checkout. Bison upgraded to v2.5, and downgraded to v2.4.1 Make process for both versions resulted in the following errors: make[3]: Leaving directory `/usr/local/src/pgbuild/src/backend/catalog' make -C parser gram.h make[3]: Entering directory `/usr/local/src/pgbuild/src/backend/parser' /usr/local/bin/bison -d -o gram.c gram.y gram.y: conflicts: 370 reduce/reduce gram.y: expected 0 reduce/reduce conflicts gram.y:10482.27-10494.33: warning: rule useless in parser due to conflicts: func_expr: COLLATION FOR '(' a_expr ')' make[3]: *** [gram.c] Error 1 I can't reproduce that. In the meantime, attached is a re-merged version of the patch; the old version doesn't apply cleanly anymore. diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index ff9b8b0..4d77024 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -13637,6 +13637,10 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); primarypg_typeof/primary /indexterm + indexterm +primarycollation for/primary + /indexterm + para xref linkend=functions-info-catalog-table lists functions that extract information from the system catalogs. @@ -13790,6 +13794,11 @@ SELECT pg_type_is_visible('myschema.widget'::regtype); entrytyperegtype/type/entry entryget the data type of any value/entry /row + row + entryliteralfunctioncollation for (parameterany/parameter)/function/literal/entry + entrytypetext/type/entry + entryget the collation of the argument/entry + /row /tbody /tgroup /table @@ -13916,6 +13925,27 @@ SELECT typlen FROM pg_type WHERE oid = pg_typeof(33); /programlisting /para + para + The expression literalcollation for/literal returns the collation of the + value that is passed to it. Example: +programlisting +SELECT collation for (description) FROM pg_description LIMIT 1; + pg_collation_for +-- + default +(1 row) + +SELECT collation for ('foo' COLLATE de_DE); + pg_collation_for +-- + de_DE +(1 row) +/programlisting + The value might be quoted and schema-qualified. If no collation is derived + for the argument expression, then a null value is returned. If the argument + is not of a collatable data type, then an error is raised. + /para + indexterm primarycol_description/primary /indexterm diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 0ec039b..dd0e2e8 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -10466,6 +10466,19 @@ func_expr: func_name '(' ')' over_clause n-location = @1; $$ = (Node *)n; } + | COLLATION FOR '(' a_expr ')' +{ + FuncCall *n = makeNode(FuncCall); + n-funcname = SystemFuncName(pg_collation_for); + n-args = list_make1($4); + n-agg_order = NIL; + n-agg_star = FALSE; + n-agg_distinct = FALSE; + n-func_variadic = FALSE; + n-over = NULL; + n-location = @1; + $$ = (Node *)n; +} | CURRENT_DATE { /* @@ -11917,7 +11930,6 @@ unreserved_keyword: | CLASS | CLOSE | CLUSTER - | COLLATION | COMMENT | COMMENTS | COMMIT @@ -12253,6 +12265,7 @@ reserved_keyword: | CAST | CHECK | COLLATE + | COLLATION | COLUMN | CONSTRAINT | CREATE diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c index 3de6a5c..84dad33 100644 --- a/src/backend/utils/adt/misc.c +++ b/src/backend/utils/adt/misc.c @@ -26,12 +26,14 @@ #include commands/dbcommands.h #include funcapi.h #include miscadmin.h +#include nodes/nodeFuncs.h #include parser/keywords.h #include postmaster/syslogger.h #include storage/fd.h #include storage/pmsignal.h #include storage/proc.h #include storage/procarray.h +#include utils/lsyscache.h #include tcop/tcopprot.h #include utils/builtins.h #include utils/timestamp.h @@ -492,3 +494,29 @@ pg_typeof(PG_FUNCTION_ARGS) { PG_RETURN_OID(get_fn_expr_argtype(fcinfo-flinfo, 0)); } + + +/* + * Implementation of the COLLATE FOR expression; returns the collation + * of the argument. + */ +Datum +pg_collation_for(PG_FUNCTION_ARGS) +{ + Oid typeid; + Oid collid; + + typeid = get_fn_expr_argtype(fcinfo-flinfo, 0); + if (!typeid) + PG_RETURN_NULL(); + if (!type_is_collatable(typeid) typeid != UNKNOWNOID) + ereport(ERROR, +(errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg(collations are not supported by type %s, + format_type_be(typeid; + + collid = PG_GET_COLLATION(); + if (!collid) + PG_RETURN_NULL(); + PG_RETURN_TEXT_P(cstring_to_text(generate_collation_name(collid))); +} diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index 9994468..e4ea0d5 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -1945,6 +1945,8 @@ DESCR(convert generic options array to
Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch n...@leadboat.com wrote: I agree with Merlin; the frontend/backend protocol is logically distinct from the binary send/recv formats of data types. For one key point, the latter is not exclusively core-defined; third-party extensions change their send/recv formats on a different schedule. They can add myext.binary_format_version GUCs of their own to cope in a similar way. I agree. It occurs to me that we recently changed the default *text* output format for bytea for reasons not dissimilar to those contemplated here. Presumably, that's a much more disruptive change, and yet we've had minimal complaints because anyone who gets bitten can easily set bytea_output='escape' and the problem goes away. The same thing seems like it would work here, only the number of people needing to change the parameter will probably be even smaller, because fewer people use binary than text. Having said that, if we're to follow the precedent set by bytea_format, maybe we ought to just add binary_array_format={huge,ittybitty} and be done with it, rather than inventing a minor protocol version GUC for something that isn't really a protocol version change at all. We could introduce a differently-named general mechanism, but I guess I'm not seeing the point of that either. Just because someone has a backward-compatibility issue with one change of this type doesn't mean they have a similar issue with all of them. So I think adding a special-purpose GUC is more logical and more parallel to what we've done in the past, and it doesn't saddle us with having to be certain that we've designed the mechanism generally enough to handle all the cases that may come later. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
Excerpts from Alex Shulgin's message of jue ene 19 15:41:54 -0300 2012: PS: yes, I could just copy message id from the sent mail in my MUA, but I like to make sure links I post aren't broke, so still I'll need to wait until archives catches up to double check. I find this a bad excuse. If you're a pgsql-hackers regular, then you already know your posts are going to show up with the correct message-id. The links might be broken for the next 10 minutes, but links that stay broken for a longer period than that should be rare. Surely you don't change your MUA once a month or anything. I know I don't waste time waiting for my posts to show up in the archives before adding links to the CF app. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of patch renaming constraints
On tor, 2012-01-12 at 22:43 -0600, Joshua Berkus wrote: Compiling on Ubuntu 10.04 LTS AMD64 on a GoGrid virtual machine from 2012-01-12 git checkout. Patch applied fine. Docs are present, build, look good and are clear. Changes to gram.y required Bison 2.5 to compile. Are we requiring Bison 2.5 now? There's no configure check for it, so it took me quite a while to figure out what was wrong. I can't reproduce that. I think there might be something wrong with your installation. The same issue was reported for my COLLATION FOR patch from the same environment. Make check passed. Patch has tests for rename constraint. Most normal uses of alter table ... rename constraint ... worked normally. However, the patch does not deal correctly with constraints which are not inherited, such as primary key constraints: That appears to be because creating a primary key constraint does not set pg_constraint.conisonly correctly. This was introduced recently with noninherited check constraints. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes? Yes, the writes are more important of the two. Too many writes at one time can overflow hardware caches, so things tend to get much worse beyond a certain point. Also, rate limiting writes means we rate limit WAL rate also which is very important. I'd like this to apply to large DDL, not just VACUUMs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] automating CF submissions (was xlog location arithmetic)
On 01/19/2012 01:41 PM, Alex Shulgin wrote: With the proposed approach it would only take me to include @fest comment Patch applies cleanly and possibly @fest status Needs Review to update the patch status and that'd be it. It will be easy if you get it right. My point was that it's way too easy to get it wrong. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
Excerpts from Simon Riggs's message of jue ene 19 16:05:36 -0300 2012: On Sun, Jan 15, 2012 at 9:17 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think it makes more sense to use the max read rate as the main knob, rather than write rate. That's because the max read rate is higher than the write rate, when you don't need to dirty pages. Or do you think saturating the I/O system with writes is so much bigger a problem than read I/O that it makes more sense to emphasize the writes? Yes, the writes are more important of the two. Too many writes at one time can overflow hardware caches, so things tend to get much worse beyond a certain point. Also, rate limiting writes means we rate limit WAL rate also which is very important. I'd like this to apply to large DDL, not just VACUUMs. More generally, this can sometimes be useful in general queries as well. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WAL Restore process during recovery
On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 Fixed and better documented. When I started an archive recovery without setting restore_command, it successfully finished. Not sure exactly what you mean, but I fixed a bug that might be something you're seeing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index ce659ec..469e6d6 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -40,6 +40,7 @@ #include pgstat.h #include postmaster/bgwriter.h #include postmaster/startup.h +#include postmaster/walrestore.h #include replication/walreceiver.h #include replication/walsender.h #include storage/bufmgr.h @@ -187,7 +188,6 @@ static bool InArchiveRecovery = false; static bool restoredFromArchive = false; /* options taken from recovery.conf for archive recovery */ -static char *recoveryRestoreCommand = NULL; static char *recoveryEndCommand = NULL; static char *archiveCleanupCommand = NULL; static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET; @@ -575,8 +575,8 @@ bool reachedConsistency = false; static bool InRedo = false; -/* Have we launched bgwriter during recovery? */ -static bool bgwriterLaunched = false; +/* Have we launched background procs during archive recovery yet? */ +static bool ArchRecoveryBgProcsActive = false; /* * Information logged when we detect a change in one of the parameters @@ -632,8 +632,6 @@ static bool XLogPageRead(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt, bool randAccess); static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); -static bool RestoreArchivedFile(char *path, const char *xlogfname, - const char *recovername, off_t expectedSize); static void ExecuteRecoveryCommand(char *command, char *commandName, bool failOnerror); static void PreallocXlogFiles(XLogRecPtr endptr); @@ -2706,19 +2704,47 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, XLogFileName(xlogfname, tli, log, seg); +#define TMPRECOVERYXLOG RECOVERYXLOG + switch (source) { case XLOG_FROM_ARCHIVE: + /* + * Check to see if the WALRestore process has already put the + * next file in place while we were working. If so, use that. + * If not, get it ourselves. This makes it easier to handle + * initial state before the WALRestore is active, and also + * handles the stop/start logic correctly when we have both + * streaming and file based replication active. + * + * We queue up the next task for WALRestore after we've begun to + * use this file later in XLogFileRead(). + * + * If the WALRestore process is still active, the lock wait makes + * us wait, which is just like we were executing the command + * ourselves and so doesn't alter the logic elsewhere. + */ + if (XLogFileIsNowFullyRestored(tli, log, seg)) + { +snprintf(path, MAXPGPATH, XLOGDIR /%s, TMPRECOVERYXLOG); +restoredFromArchive = true; +break; + } + /* Report recovery progress in PS display */ snprintf(activitymsg, sizeof(activitymsg), waiting for %s, xlogfname); set_ps_display(activitymsg, false); restoredFromArchive = RestoreArchivedFile(path, xlogfname, - RECOVERYXLOG, + TMPRECOVERYXLOG, XLogSegSize); + if (!restoredFromArchive) + { +LWLockRelease(WALRestoreCommandLock); return -1; + } break; case XLOG_FROM_PG_XLOG: @@ -2748,18 +2774,42 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli, if (stat(xlogfpath, statbuf) == 0) { if (unlink(xlogfpath) != 0) + { +LWLockRelease(WALRestoreCommandLock); ereport(FATAL, (errcode_for_file_access(), errmsg(could not remove file \%s\: %m, xlogfpath))); + } reload = true; } if (rename(path, xlogfpath) 0) + { + LWLockRelease(WALRestoreCommandLock); ereport(ERROR, (errcode_for_file_access(), errmsg(could not rename file \%s\ to \%s\: %m, path, xlogfpath))); + } + + /* + * Make sure we recover from the new filename, so we can reuse the + * temporary
Re: [HACKERS] logging in high performance systems.
On Mon, Jan 16, 2012 at 3:51 AM, Greg Smith g...@2ndquadrant.com wrote: There is an important distinction to make here. Statement logging is one of the largest producers of logging data you want to worry about optimizing for on a high performance system. The decision about whether to log or not may need to be made by the hook. Something that hooks EmitErrorReport has already lost an important opportunity to make a decision about whether the system is perhaps too busy to worry about logging right now at all; you've already paid a chunk of the logging overhead getting the log message to it. I think both areas are going to be important to hook eventually. I would dismissed this out of hand at this if you said it a year ago, but I'm older and wiser now. At some point this cycle, I did some benchmarking of the subtransaction abort path, since the slowness of things like EXCEPTION blocks in PL/pgsql is a known sore point. I don't remember the exact numbers anymore, but I do remember the general picture, which is that constructing the error message is shockingly expensive compared to anything else that we do in that path. I dropped it at that point for lack of good ideas: it would be awfully nice to postpone the error message construction until we know that it's actually needed, but I don't see any clean (or even messy) way of doing that. I'm not sure if the effect is quite as significant for toplevel transaction abort, because sending the message to the client is going to cost something, so the relative cost of generating the error message on a busy system will be less. But I still wouldn't like to bet against it being significant if you're really hammering the server. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Arithmetic operators for macaddr type
On Tue, Jan 17, 2012 at 12:38 AM, Fujii Masao masao.fu...@gmail.com wrote: Here is a patch for $SUBJECT. I merely added support for ~, and | operators for the macaddr type. The patch itself is rather trivial, and includes regression tests and a doc update. The patch looks fine except that it uses the OIDs already used in pg_proc.h. Attached is the updated version of the patch, which fixes the above problem. That same problem came back into existence, so I fixed it again, added a catversion bump, and committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On 22.11.2011 21:38, Alexander Korotkov wrote: WIP patch with index support for regexp search for pg_trgm contrib is attached. In spite of techniques which extracts continuous text parts from regexp, this patch presents technique of automatum transformation. That allows more comprehensive trigrams extraction. Nice! Current version of patch have some limitations: 1) Algorithm of logical expression extraction on trigrams have high computational complexity. So, it can become really slow on regexp with many branches. Probably, improvements of this algorithm is possible. 2) Surely, no perfomance benefit if no trigrams can be extracted from regexp. It's inevitably. 3) Currently, only GIN index is supported. There are no serious problems, GiST code for it just not written yet. 4) It appear to be some kind of problem to extract multibyte encoded character from pg_wchar. I've posted question about it here: http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php While I've hardcoded some dirty solution. So PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, PG_EUC_JIS_2004 are not supported yet. This is pretty far from being in committable state, so I'm going to mark this as returned with feedback in the commitfest app. The feedback: The code badly needs comments. There is no explanation of how the trigram extraction code in trgm_regexp.c works. Guessing from the variable names, it seems to be some sort of a coloring algorithm that works on a graph, but that all needs to be explained. Can this algorithm be found somewhere in literature, perhaps? A link to a paper would be nice. Apart from that, the multibyte issue seems like the big one. Any way around that? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
Tom Lane t...@sss.pgh.pa.us writes: I'm with Heikki on not believing that this is a good idea. If you are trying to do careful versioning of a set of object definitions, you want to stick the things in a file, you don't want them just flying by in submitted SQL. I'm trying to open the extension facilities (versions being the first of them, think \dx) to application PL code, and to hosted environments where you're not granted access to the server's file system. I think I would agree that the use case is not existing if the target is traditional in-house deployments where the sys admins are your colleagues. I've been told that's a smaller and smaller part of the database world though. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Warning in views.c
On Mon, Jan 16, 2012 at 3:47 PM, Magnus Hagander mag...@hagander.net wrote: Seem 1575fbcb caused this warning: view.c: In function ‘DefineVirtualRelation’: view.c:105:6: warning: variable ‘namespaceId’ set but not used [-Wunused-but-set-variable] Attached seems to be the easy fix - or am I missing something obvious? No, I think you nailed it. There is some more refactoring that should be done there, I think, but until it is, we don't need that return value for anything... Thanks for fixing it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] controlling the location of server-side SSL files
On Sat, Jan 14, 2012 at 8:40 AM, Peter Eisentraut pete...@gmx.net wrote: On mån, 2012-01-02 at 06:32 +0200, Peter Eisentraut wrote: I think I would like to have a set of GUC parameters to control the location of the server-side SSL files. Here is the patch for this. One thing that is perhaps worth thinking about: Currently, we just ignore missing root.crt and root.crl files. With this patch, we still do this, even if the user has given a specific nondefault location. That seems a bit odd, but I can't think of a simple way to do it better. There's a review in the CF app for this finding only minor issues, so I'm marking this patch therein as Ready for Committer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan and...@dunslane.net wrote: Second, what should be do when the database encoding isn't UTF8? I'm inclined to emit a \u escape for any non-ASCII character (assuming it has a unicode code point - are there any code points in the non-unicode encodings that don't have unicode equivalents?). The alternative would be to fail on non-ASCII characters, which might be ugly. Of course, anyone wanting to deal with JSON should be using UTF8 anyway, but we still have to deal with these things. What about SQL_ASCII? If there's a non-ASCII sequence there we really have no way of telling what it should be. There at least I think we should probably error out. I don't see any reason to escape anything more than the minimum required by the spec, which only requires it for control characters. If somebody's got a non-ASCII character in there, we can simply allow it to be represented by itself. That's almost certainly more compact (and very possibly more readable) than emitting \u for each such instance, and it also matches what the current EXPLAIN (FORMAT JSON) output does. In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On Fri, Jan 20, 2012 at 12:30 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: The code badly needs comments. There is no explanation of how the trigram extraction code in trgm_regexp.c works. Sure. I hoped to find a time for comments before commitfest starts. Unfortunately I didn't, sorry. Guessing from the variable names, it seems to be some sort of a coloring algorithm that works on a graph, but that all needs to be explained. Can this algorithm be found somewhere in literature, perhaps? A link to a paper would be nice. I hope it's truly novel. At least application to regular expressions. I'm going to write a paper about it. Apart from that, the multibyte issue seems like the big one. Any way around that? Conversion of pg_wchar to multibyte character is the only way I found to avoid serious hacking of existing regexp code. Do you think additional function in pg_wchar_tbl which converts pg_wchar back to multibyte character is possible solution? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Inline Extension
On Thu, Jan 19, 2012 at 3:42 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Tom Lane t...@sss.pgh.pa.us writes: I'm with Heikki on not believing that this is a good idea. If you are trying to do careful versioning of a set of object definitions, you want to stick the things in a file, you don't want them just flying by in submitted SQL. I'm trying to open the extension facilities (versions being the first of them, think \dx) to application PL code, and to hosted environments where you're not granted access to the server's file system. I guess the question is: for what purpose? As you recognized in your original email, if the extension is inline, then the objects will need to be dumped, because a simple CREATE EXTENSION command is bound to fail. But my understanding was that a major part of the reason - if not the entire reason - was to get pg_dump to emit CREATE EXTENSION bob instead of the component SQL commands. If we take that away, what's the remaining benefit of packaging those objects inside an extension instead of just dumping them loose into the database? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and relkind filtering
On Sat, Dec 31, 2011 at 07:41:00PM -0500, Noah Misch wrote: On Mon, Dec 05, 2011 at 05:06:37PM -0500, Bruce Momjian wrote: Pg_upgrade has the following check to make sure the cluster is safe for upgrading: res = executeQueryOrDie(conn, SELECT n.nspname, c.relname, a.attname FROM pg_catalog.pg_class c, pg_catalog.pg_namespace n, pg_catalog.pg_attribute a WHERE c.oid = a.attrelid AND NOT a.attisdropped AND a.atttypid IN ( 'pg_catalog.regproc'::pg_catalog.regtype, 'pg_catalog.regprocedure'::pg_catalog.regtype, 'pg_catalog.regoper'::pg_catalog.regtype, 'pg_catalog.regoperator'::pg_catalog.regtype, /* regclass.oid is preserved, so 'regclass' is OK */ /* regtype.oid is preserved, so 'regtype' is OK */ 'pg_catalog.regconfig'::pg_catalog.regtype, 'pg_catalog.regdictionary'::pg_catalog.regtype) AND c.relnamespace = n.oid AND n.nspname != 'pg_catalog' AND n.nspname != 'information_schema'); Based on a report from EnterpriseDB, I noticed that we check all pg_class entries, while there are cases where this is unnecessary because there is no data behind the entry, e.g. views. Here are the relkinds supported: #define RELKIND_RELATION'r' /* ordinary table */ #define RELKIND_INDEX 'i' /* secondary index */ #define RELKIND_SEQUENCE'S' /* sequence object */ #define RELKIND_TOASTVALUE 't' /* for out-of-line values */ #define RELKIND_VIEW'v' /* view */ #define RELKIND_COMPOSITE_TYPE 'c' /* composite type */ #define RELKIND_FOREIGN_TABLE 'f' /* foreign table */ #define RELKIND_UNCATALOGED 'u' /* not yet cataloged */ What types, other than views, can we skip in this query? RELKIND_UNCATALOGED should never appear on disk, and RELKIND_SEQUENCE and RELKIND_TOASTVALUE do not allow adding columns or changing column types. We might as well keep validating them. RELKIND_RELATION and RELKIND_INDEX have storage, so we must check those. The remaining three relkinds (RELKIND_VIEW, RELKIND_COMPOSITE_TYPE, RELKIND_FOREIGN_TABLE) have no storage, but all are usable as column types in other relations that do have storage. You could skip them iff they're unused that way, per a check like find_composite_type_dependencies(). Good point. I have applied the attached comment patch to document why we check all relkinds for regtypes. Thanks. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c new file mode 100644 index 8594d26..891eb9a *** a/contrib/pg_upgrade/check.c --- b/contrib/pg_upgrade/check.c *** check_for_reg_data_type_usage(ClusterInf *** 644,649 --- 644,654 DbInfo *active_db = cluster-dbarr.dbs[dbnum]; PGconn *conn = connectToServer(cluster, active_db-db_name); + /* + * While several relkinds don't store any data, e.g. views, they + * can be used to define data types of other columns, so we + * check all relkinds. + */ res = executeQueryOrDie(conn, SELECT n.nspname, c.relname, a.attname FROM pg_catalog.pg_class c, -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
I also have a question about pg_wchar. /* *--- * encoding info table * XXX must be sorted by the same order as enum pg_enc (in mb/pg_wchar.h) *--- */ pg_wchar_tbl pg_wchar_table[] = { {pg_ascii2wchar_with_len, pg_ascii_mblen, pg_ascii_dsplen, pg_ascii_verifier, 1}, /* PG_SQL_ASCII */ {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3}, /* PG_EUC_JP */ {pg_euccn2wchar_with_len, pg_euccn_mblen, pg_euccn_dsplen, pg_euccn_verifier, 2}, /* PG_EUC_CN */ {pg_euckr2wchar_with_len, pg_euckr_mblen, pg_euckr_dsplen, pg_euckr_verifier, 3}, /* PG_EUC_KR */ {pg_euctw2wchar_with_len, pg_euctw_mblen, pg_euctw_dsplen, pg_euctw_verifier, 4}, /* PG_EUC_TW */ {pg_eucjp2wchar_with_len, pg_eucjp_mblen, pg_eucjp_dsplen, pg_eucjp_verifier, 3}, /* PG_EUC_JIS_2004 */ {pg_utf2wchar_with_len, pg_utf_mblen, pg_utf_dsplen, pg_utf8_verifier, 4}, /* PG_UTF8 */ {pg_mule2wchar_with_len, pg_mule_mblen, pg_mule_dsplen, pg_mule_verifier, 4}, /* PG_MULE_INTERNAL */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN1 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN2 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN3 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN4 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN5 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN6 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN7 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN8 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN9 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_LATIN10 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1256 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1258 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN866 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN874 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8R */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1251 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1252 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-5 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-6 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-7 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* ISO-8859-8 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1250 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1253 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1254 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1255 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_WIN1257 */ {pg_latin12wchar_with_len, pg_latin1_mblen, pg_latin1_dsplen, pg_latin1_verifier, 1}, /* PG_KOI8U */ {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2}, /* PG_SJIS */ {0, pg_big5_mblen, pg_big5_dsplen, pg_big5_verifier, 2}, /* PG_BIG5 */ {0, pg_gbk_mblen, pg_gbk_dsplen, pg_gbk_verifier, 2}, /* PG_GBK */ {0, pg_uhc_mblen, pg_uhc_dsplen, pg_uhc_verifier, 2}, /* PG_UHC */ {0, pg_gb18030_mblen, pg_gb18030_dsplen, pg_gb18030_verifier, 4}, /* PG_GB18030 */ {0, pg_johab_mblen, pg_johab_dsplen, pg_johab_verifier, 3}, /* PG_JOHAB */ {0, pg_sjis_mblen, pg_sjis_dsplen, pg_sjis_verifier, 2} /* PG_SHIFT_JIS_2004 */ }; What does last 7 zeros in the first column means? No conversion to pg_wchar is possible from these encodings? -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP -- renaming implicit sequences
On Sat, Jan 14, 2012 at 5:51 PM, Thomas Munro mu...@ip9.org wrote: On 12 January 2012 00:58, Tom Lane t...@sss.pgh.pa.us wrote: Hmm ... this seems a bit inconsistent with the fact that we got rid of automatic renaming of indexes a year or three back. Won't renaming of serials have all the same problems that caused us to give up on renaming indexes? I was sort of planning to do something similar for constraints (once the patch to support renaming constraints lands) and indexes (I didn't know they'd previously been automatically renamed and that had been dropped). Would you say that I should abandon this, no chance of being accepted? Is there a technical problem I'm missing, other than the gap between unique name generation and execution of the implicit ALTERs? Maybe I should look into writing a 'tidy rename' procedure for tables and columns instead, rather than modifying the behaviour of core ALTER TABLE. +1 for that approach. I think this is the kind of thing that seems cooler on paper than it is in real life. For example, consider this: rhaas=# create table foo (a serial); NOTICE: CREATE TABLE will create implicit sequence foo_a_seq for serial column foo.a CREATE TABLE rhaas=# alter sequence foo_a_seq rename to bob; ALTER SEQUENCE If somebody renames the table or the column at this point, it's a good bet that they *don't* want bob renamed. Also, some application code I've had in the past had sequence names hardcoded into it - it did things like SELECT nextval(...) followed by INSERT ..., for lack of INSERT .. RETURNING, which didn't exist at the time. So it's not implausible that renaming a sequence could be unwanted, though in practice the likelihood is fairly low: had I renamed a table, I probably would have renamed the sequence as well. Another, admittedly minor consideration is that this can introduce some subtle concurrency bugs that we don't have today. For example, suppose we choose a new name for the sequence which isn't in use, but then between the time when we pick the name and the time we do the insert the name becomes used, due to some concurrent transaction. Now we'll fail with a rather baffling error message. That isn't necessarily a huge problem - we have lots of code that is prone to such race conditions - but it's not wonderful either. Someday it would be nice to figure out a cleaner solution to these kinds of issues... maybe allowing the btree AM to return normally with an indication that there's a unique constraint violation, rather than throwing an ERROR. I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On Fri, Jan 20, 2012 at 1:07 AM, Alexander Korotkov aekorot...@gmail.comwrote: What does last 7 zeros in the first column means? No conversion to pg_wchar is possible from these encodings? Uh, I see. These encodings is not supported as server encodings. -- With best regards, Alexander Korotkov.
Re: [HACKERS] JSON for PG 9.2
On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. And specifically, the \u syntax won't work, and there might be some non-Unicode characters in there. If that's not the behavior you want, then use UTF-8. It seems pretty clear that we're going to have to make some trade-off to handle non-UTF8 encodings, and I think what I'm suggesting is a lot less painful than disabling high-bit characters altogether. If we do that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his column label has a non-Unicode character in there? Should we say, oh, sorry, you can't explain that in JSON format? That is mighty unfriendly, and probably mighty complicated and expensive to figure out, too. We *do not support* mixing encodings in the same database, and if we make it the job of this patch to fix that problem, we're going to be in the same place for 9.2 that we have been for the last several releases: nowhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade with plpython is broken
On Thu, Dec 22, 2011 at 11:42:23AM -0500, Robert Haas wrote: On Mon, Dec 19, 2011 at 10:16 AM, Peter Eisentraut pete...@gmx.net wrote: Upgrading an instance containing plpython from =8.4 to =9.0 is broken because the module plpython.so was renamed to plpython2.so, and so the pg_upgrade check for loadable libraries fails thus: Your installation references loadable libraries that are missing from the new installation. etc. Installing a symlink fixes the issue. Should we teach pg_upgrade about this renaming, or should we install the symlink as part of the standard installation? I feel like this is a pg_upgrade bug, not so much a PL/python problem. I looked into this and the problem is coming from the checking of pg_proc library functions (not explicitly _language_ functions): plpython_call_handler| $libdir/plpython2 plpython_inline_handler | $libdir/plpython2 plpython_validator | $libdir/plpython2 All three of these entries relate to plpython, and obviously you can see the library name change. The list of required libraries is generated in the old cluster. One interesting solution would be to lookup the matching function names from the new cluster's pg_pltemplate, and use that library name. That would fix the problem of language library files being renamed, but not address non-language library file names being renamed --- there is no _template_ to look for these new values. I hate to add a complex fix for languages and leave the non-language cases unfixed. For that reason, I wonder if I should just hard-code the plpython rename into the pg_upgrade test in check_loadable_libraries(). -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On 1/18/12 4:18 PM, Jim Nasby wrote: What about doing away with all the arbitrary numbers completely, and just state data rate limits for hit/miss/dirty? Since many workloads will have a mix of all three, it still seems like there's some need for weighing these individually, even if they each got their own rates. If someone says read=8MB/s and write=4MB/s (the current effective defaults), I doubt they would be happy with seeing 12MB/s happen. BTW, this is a case where it would be damn handy to know if the miss was really a miss or not... in the case where we're already rate limiting vacuum, could we afford the cost of get_time_of_day() to see if a miss actually did have to come from disk? We certainly might if it's a system where timing information is reasonably cheap, and measuring that exact area will be easy if the timing test contrib module submitted into this CF gets committed. I could see using that to re-classify some misses as hits if the read returns fast enough. There's not an obvious way to draw that line though. The fast=hit vs. slow=miss transition happens at very different place on SSD vs. regular disks, as the simplest example. I don't see any way to wander down this path that doesn't end up introducing multiple new GUCs, which is the opposite of what I'd hoped to do--which was at worst to keep the same number, but reduce how many were likely to be touched. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On 1/19/12 1:10 PM, Robert Haas wrote: I have to say that I find that intensely counterintuitive. The current settings are not entirely easy to tune correctly, but at least they're easy to explain. I attempt to explain those settings to people in training classes about once a month. It's never been anything but a complete disaster. I am barely concerned about preserving the current UI because, as far as I've been able to tell, there are only a handful of PostgreSQL installatinos on the planet that have managed to use it happily. Even the ones that do have a non-default setup that works usually flailed about for some time until they get something that works, over a few frustrating months. And the result are settings few dare touch for fear of breaking it. It's also worth pointing out that VACUUM problems are very close to the top of the list of problems larger sites run into. So right now we have an inscrutable UI around an often essential part of the database to tune, one that any production site that gets over a few hundred GB of data in it will run into problems with. I wouldn't care about this area if it weren't for people screaming about how bad it is every time the topic comes up. If there's anyone out there who has run a larger PostgreSQL database and not at some point been extremely frustrated with how the current VACUUM settings are controlled, please speak up and say I'm wrong about this. I thought it was well understood the UI was near unusably bad, it just wasn't obvious what to do about it. What does that 8MB mean and how does it relate to vacuum_cost_page_miss? If I double vacuum_rate_page_miss, does that effectively also double the cost limit, so that dirty pages and hits become relatively cheaper? If so, then I think what that really means is that the limit is 8MB only if there are no hits and no dirtied pages - otherwise it's less, and the amount by which it is less is the result of some arcane calculation. Ugh! Saying what I suggested is an arcane calculation strikes me as pretty weird--we'd be hard pressed to design a more arcane calculation than the one that's already happening. The feedback here so far seems to lead toward making independent read and write knobs. I'm going to chew on the scenarios Robert described and the ones Jim has been commenting on and see if I can refactor this into something friendlier that addresses them. As for the suggestion that I'm bringing this up a bit late in the release cycle, I've been trying. My first submission pushing in this direction--improving the logging first, which is needed before you can usefully measure a behavior change--happened back in September. I've been moving this area as fast as I can get it to budge. I'm concerned now that much will be made of improved performance in 9.2, leading to people converting even larger systems than they used to. And it's not hard at all to find a large system where inability to tune vacuum easily is the top limiting factor on overall performance. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstanand...@dunslane.net wrote: On 01/19/2012 03:49 PM, Robert Haas wrote: In other words, let's decree that when the database encoding isn't UTF-8, *escaping* of non-ASCII characters doesn't work. But *unescaped* non-ASCII characters should still work just fine. The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. See http://www.ietf.org/rfc/rfc4627.txt?number=4627 section 3. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. And specifically, the \u syntax won't work, and there might be some non-Unicode characters in there. If that's not the behavior you want, then use UTF-8. It seems pretty clear that we're going to have to make some trade-off to handle non-UTF8 encodings, and I think what I'm suggesting is a lot less painful than disabling high-bit characters altogether. If we do that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his column label has a non-Unicode character in there? Should we say, oh, sorry, you can't explain that in JSON format? That is mighty unfriendly, and probably mighty complicated and expensive to figure out, too. We *do not support* mixing encodings in the same database, and if we make it the job of this patch to fix that problem, we're going to be in the same place for 9.2 that we have been for the last several releases: nowhere. OK, then we need to say that very clearly and up front (including in the EXPLAIN docs.) Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP -- renaming implicit sequences
Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012: I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. Maybe not remove it, but instead add a link to this discussion. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: index support for regexp search
On Thu, January 19, 2012 21:30, Heikki Linnakangas wrote: On 22.11.2011 21:38, Alexander Korotkov wrote: WIP patch with index support for regexp search for pg_trgm contrib is attached. In spite of techniques which extracts continuous text parts from regexp, this patch presents technique of automatum transformation. That allows more comprehensive trigrams extraction. Nice! Yes, wonderful stuff; I tested quite a bit with this patch; FWIW, here's what I found. The patch yields spectacular speedups with small, simple-enough regexen. But it does not do a good enough job when guessing where to use the index and where fall back to Seq Scan. This can lead to (also spectacular) slow-downs, compared to Seq Scan. I used the following to generate 3 test tables with lines of 80 random chars (just in case it's handy for others to use): $ cat create_data.sh #!/bin/sh for power in 4 5 6 do table=azjunk${power} index=${table}_trgmrgx_txt_01_idx echo -- generating table $table with index $index; time perl -E' sub ss{ join,@_[ map{rand @_} 1 .. shift ] }; say(ss(80,a..g, ,h..m, ,n..s, ,t..z)) for 1 .. 1e'${power}; \ | psql -aqXc drop table if exists $table; create table $table(txt text); copy $table from stdin; -- set session maintenance_work_mem='20GB'; create index $index on $table using gin(txt gin_trgm_ops); analyze $table;; done \dt+ public.azjunk* List of relations Schema | Name | Type | Owner | Size | Description +-+---+---+-+- public | azjunk4 | table | breinbaas | 1152 kB | public | azjunk5 | table | breinbaas | 11 MB | public | azjunk6 | table | breinbaas | 112 MB | (3 rows) I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in your patch), is that true? I can understand you want that value to be low to limit the above risk, but now it reduces the usability of the feature a bit: one has to split up larger char-classes into several smaller ones to make a statement use the index: i.e.: txt ~ 'f[aeio]n' OR txt ~ 'f[uy]n' instead of txt ~ 'f[aeiouy]n' I made compiled instances with larger values for MAX_COLOR_CHARS (6 and 9), and sure enough they used the index for larger classes such as the above, but of course also got into problems easier when quantifiers are added (*, +, {n,m}). A better calculation to decide index-use seems necessary, and ideally one that allows for a larger MAX_COLOR_CHARS than 4. Especially quantifiers could perhaps be inspected wrt that decision. IMHO, the functionality would still be very useful when only very simple regexen were considered. Btw, it seems impossible to Ctrl-C out of a search once it is submitted; I suppose this is normally necessary for perfomance reasons, but it would be useful te be able to compile a test version that allows it. I don't know how hard that would be. There is also a minor bug, I think, when running with 'set enable_seqscan=off' in combination with a too-large regex: $ cat fail.sql: set enable_bitmapscan=on; set enable_seqscan=on; explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n'; -- OK explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n'; -- OK set enable_bitmapscan=on; set enable_seqscan=off; explain analyze select txt from azjunk4 where txt ~ 'f[aeio]n'; -- OK explain analyze select txt from azjunk4 where txt ~ 'f[aeiou]n'; -- crashes $ psql -f fail.sql QUERY PLAN -- Bitmap Heap Scan on azjunk4 (cost=52.01..56.02 rows=1 width=81) (actual time=1.011..5.291 rows=131 loops=1) Recheck Cond: (txt ~ 'f[aeio]n'::text) - Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..52.01 rows=1 width=0) (actual time=0.880..0.880 rows=131 loops=1) Index Cond: (txt ~ 'f[aeio]n'::text) Total runtime: 5.700 ms (5 rows) QUERY PLAN --- Seq Scan on azjunk4 (cost=0.00..268.00 rows=1 width=81) (actual time=1.491..36.049 rows=164 loops=1) Filter: (txt ~ 'f[aeiou]n'::text) Rows Removed by Filter: 9836 Total runtime: 36.112 ms (4 rows) QUERY PLAN -- Bitmap Heap Scan on azjunk4 (cost=52.01..56.02 rows=1 width=81) (actual time=0.346..0.927 rows=131 loops=1) Recheck Cond: (txt ~ 'f[aeio]n'::text) - Bitmap Index Scan on azjunk4_trgmrgx_txt_01_idx (cost=0.00..52.01 rows=1 width=0) (actual time=0.316..0.316 rows=131 loops=1) Index Cond: (txt ~ 'f[aeio]n'::text)
Re: [HACKERS] WIP -- renaming implicit sequences
On Thu, Jan 19, 2012 at 6:30 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012: I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. Maybe not remove it, but instead add a link to this discussion. I don't see what that accomplishes, unless someone's arguing that we really do want this behavior...? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP -- renaming implicit sequences
Excerpts from Robert Haas's message of jue ene 19 20:53:42 -0300 2012: On Thu, Jan 19, 2012 at 6:30 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of jue ene 19 18:07:33 -0300 2012: I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. Maybe not remove it, but instead add a link to this discussion. I don't see what that accomplishes, unless someone's arguing that we really do want this behavior...? Well, it documents what happened when we tried to do it. Sort of like features we do not want. But then, maybe it's just TODO bloat. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Thu, Jan 19, 2012 at 5:59 PM, Andrew Dunstan and...@dunslane.net wrote: OK, then we need to say that very clearly and up front (including in the EXPLAIN docs.) Can do. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Well, that would be nice, but I don't think it's practical. It will certainly be the case, under the scheme I'm proposing, or probably any other sensible scheme also, that if a client whose encoding is UTF-8 gets a value of type json back fro the database, it's strictly valid JSON. But it won't be possible to store every legal JSON value in the database if the database encoding is anything other than UTF-8, even if the client encoding is UTF-8. The backend will get the client's UTF-8 bytes and transcode them to the server encoding before calling the type-input function, so if there are characters in there that can't be represented in UTF-8, then we'll error out before the JSON data type ever gets control. In theory, it would be possible to accept such strings if the client chooses to represent them using a \u sequence, but I'm unexcited about doing the work required to make that happen, because it will still be a pretty half-baked: we'll be able to accept some representations of the same JSON constant but not others. I think the real fix for this problem is to introduce an infrastructure inside the database that allows us to have different columns stored in different encodings. People use bytea for that right now, but that's pretty unfriendly: it would be nice to have a better system. However, I expect that to take a lot of work and break a lot of things, and until we do it I don't feel that compelled to provide buggy and incomplete support for it under the guise of implementing a JSON datatype. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Jan 19, 2012, at 4:27 PM, Robert Haas wrote: I think the real fix for this problem is to introduce an infrastructure inside the database that allows us to have different columns stored in different encodings. People use bytea for that right now, but that's pretty unfriendly: it would be nice to have a better system. However, I expect that to take a lot of work and break a lot of things, and until we do it I don't feel that compelled to provide buggy and incomplete support for it under the guise of implementing a JSON datatype. +1 This seems like a reasonable compromise and course of action, especially if someone is interested in taking on column-level encodings at some point in the next year or two. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP -- renaming implicit sequences
On Thu, Jan 19, 2012 at 7:15 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: I think we should remove this from the TODO list, or at least document that there are a number of reasons why it might be a deeper hole than it appears to be at first glance. Maybe not remove it, but instead add a link to this discussion. I don't see what that accomplishes, unless someone's arguing that we really do want this behavior...? Well, it documents what happened when we tried to do it. Sort of like features we do not want. But then, maybe it's just TODO bloat. I share your urge to memorialize the conversation somewhere, but I fear it will just cause future people to spend time thinking about it that isn't really warranted. I guess we could move it to the features we do not want section, but that's mostly bigger picture stuff. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
On Thu, Jan 19, 2012 at 5:39 PM, Greg Smith g...@2ndquadrant.com wrote: On 1/19/12 1:10 PM, Robert Haas wrote: I have to say that I find that intensely counterintuitive. The current settings are not entirely easy to tune correctly, but at least they're easy to explain. I attempt to explain those settings to people in training classes about once a month. It's never been anything but a complete disaster. I am barely concerned about preserving the current UI because, as far as I've been able to tell, there are only a handful of PostgreSQL installatinos on the planet that have managed to use it happily. Even the ones that do have a non-default setup that works usually flailed about for some time until they get something that works, over a few frustrating months. And the result are settings few dare touch for fear of breaking it. It's also worth pointing out that VACUUM problems are very close to the top of the list of problems larger sites run into. So right now we have an inscrutable UI around an often essential part of the database to tune, one that any production site that gets over a few hundred GB of data in it will run into problems with. I wouldn't care about this area if it weren't for people screaming about how bad it is every time the topic comes up. If there's anyone out there who has run a larger PostgreSQL database and not at some point been extremely frustrated with how the current VACUUM settings are controlled, please speak up and say I'm wrong about this. I thought it was well understood the UI was near unusably bad, it just wasn't obvious what to do about it. What does that 8MB mean and how does it relate to vacuum_cost_page_miss? If I double vacuum_rate_page_miss, does that effectively also double the cost limit, so that dirty pages and hits become relatively cheaper? If so, then I think what that really means is that the limit is 8MB only if there are no hits and no dirtied pages - otherwise it's less, and the amount by which it is less is the result of some arcane calculation. Ugh! Saying what I suggested is an arcane calculation strikes me as pretty weird--we'd be hard pressed to design a more arcane calculation than the one that's already happening. Perhaps so, but I'm willing to bet that if we have a variable that looks like a pure read limit or a pure dirty limit and really is not, we'll have succeeded. :-) The feedback here so far seems to lead toward making independent read and write knobs. I'm going to chew on the scenarios Robert described and the ones Jim has been commenting on and see if I can refactor this into something friendlier that addresses them. As for the suggestion that I'm bringing this up a bit late in the release cycle, I've been trying. My first submission pushing in this direction--improving the logging first, which is needed before you can usefully measure a behavior change--happened back in September. I've been moving this area as fast as I can get it to budge. I'm concerned now that much will be made of improved performance in 9.2, leading to people converting even larger systems than they used to. And it's not hard at all to find a large system where inability to tune vacuum easily is the top limiting factor on overall performance. I certainly didn't intend to come across as disparaging your work on this topic. I understand that there are big problems with the way things work now; I'm just cautious about trying to replace them too hastily with something that may not turn out to be any better. Of course, if we can replace it with something that we're sure is actually an improvement, I'm all in favor of that. But, IMHO, the problems in this area are too serious to be solved by renaming the knobs. At most, we're going to buy ourselves a little time to come up with a better solution. IMHO, and at the risk of repeating myself, one of the big problems in this area is that we're making the user guess something that we really ought to be figuring out for them. Just as users want checkpoints to run as slowly as possible while still not bumping into the next checkpoint, they'd presumably like vacuum to run as slowly as possible without bumping into the next vacuum. Instead, we make them tell us how fast they'd like it tor run, which requires them to guess a value high enough to finish soon enough but low enough to minimize the impact on the rest of the system. Another problem is that the vacuum algorithm itself could, I think, be made much smarter. We could teach HOT to prune pages that contain no HOT chains but do contain dead tuples. That would leave dead line pointers behind, but that's not nearly as bad as leaving the entire tuple behind. We could, as Simon and others have suggested, have one threshold for vacuuming the heap (i.e. reclaiming dead tuples) and another for vacuuming the indexes (i.e. reclaiming dead line pointers). That would open the door to partial vacuuming: just
Re: [HACKERS] Re: Add minor version to v3 protocol to allow changes without breaking backwards compatibility
On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote: On Thu, Jan 19, 2012 at 10:37 AM, Noah Misch n...@leadboat.com wrote: I agree with Merlin; the frontend/backend protocol is logically distinct from the binary send/recv formats of data types. ?For one key point, the latter is not exclusively core-defined; third-party extensions change their send/recv formats on a different schedule. ?They can add myext.binary_format_version GUCs of their own to cope in a similar way. I agree. It occurs to me that we recently changed the default *text* output format for bytea for reasons not dissimilar to those contemplated here. Presumably, that's a much more disruptive change, and yet we've had minimal complaints because anyone who gets bitten can easily set bytea_output='escape' and the problem goes away. The same thing seems like it would work here, only the number of people needing to change the parameter will probably be even smaller, because fewer people use binary than text. Having said that, if we're to follow the precedent set by bytea_format, maybe we ought to just add binary_array_format={huge,ittybitty} and be done with it, rather than inventing a minor protocol version GUC for something that isn't really a protocol version change at all. We could introduce a differently-named general mechanism, but I guess I'm not seeing the point of that either. Just because someone has a backward-compatibility issue with one change of this type doesn't mean they have a similar issue with all of them. So I think adding a special-purpose GUC is more logical and more parallel to what we've done in the past, and it doesn't saddle us with having to be certain that we've designed the mechanism generally enough to handle all the cases that may come later. That makes sense. An attraction of a single binary format version was avoiding the Is this worth a GUC? conversation for each change. However, adding a GUC should be no more notable than bumping a binary format version. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Wed, Jan 18, 2012 at 4:23 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Huh, isn't it simpler to just pass the triggers the parse tree *after* parse analysis? I don't understand what you're doing here. I didn't realize that the parse analysis is in fact done from within standard_ProcessUtility() directly, which means your suggestion is indeed workable. Tom Lane t...@sss.pgh.pa.us writes: It's not the costs I'm worried about so much as the side effects --- Ok, so I'm now calling the command trigger procedures once the parse analysis is done, and guess what, I'm back to the same problem as before: https://github.com/dimitri/postgres/commit/4bfab6344a554c09f7322e861f9d09468f641bd9 CREATE TABLE public.ab_foo-bar ( id serial NOT NULL, foo integer default 1, PRIMARY KEY(id) ); NOTICE: CREATE TABLE will create implicit sequence ab_foo-bar_id_seq for serial column ab_foo-bar.id NOTICE: snitch: CREATE SEQUENCE ERROR: unrecognized node type: 904 I'm not sure about the next step, and I'm quite sure I need to stop here for tonight. Any advice welcome, I'll be working on that again as soon as tomorrow. My advice is to forget about trying to provide the command string to the user for the first version of this patch. As you're finding out, there's no simple, easy, obvious way of doing it, and there are N0 useful things that can be done without that functionality. I continue to think that for a first version of this, we should be satisfied to pass just the OID. I know that's not really what you want, but there's much to be said for picking a goal that is achievable in the limited time available, and I fear that setting your sights too high will lead to something that either (a) doesn't get committed, or (b) gets committed, but turns out not to work very well, either of which would be less than ideal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of patch renaming constraints
Make check passed. Patch has tests for rename constraint. Most normal uses of alter table ... rename constraint ... worked normally. However, the patch does not deal correctly with constraints which are not inherited, such as primary key constraints: That appears to be because creating a primary key constraint does not set pg_constraint.conisonly correctly. This was introduced recently with noninherited check constraints. Umm, conisonly is set as false from primary key entries in pg_constraint. And primary keys are anyways not inherited. So why is the conisonly field interfering in rename? Seems quite orthogonal to me. Regards, Nikhils
Re: [HACKERS] WAL Restore process during recovery
On Fri, Jan 20, 2012 at 4:17 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Jan 17, 2012 at 6:52 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Jan 16, 2012 at 2:06 AM, Simon Riggs si...@2ndquadrant.com wrote: WALRestore process asynchronously executes restore_command while recovery continues working. Overlaps downloading of next WAL file to reduce time delays in file based archive recovery. Handles cases of file-only and streaming/file correctly. Though I've not reviewed the patch deeply yet, I observed the following two problems when I tested the patch. When I set up streaming replication + archive (i.e., restore_command is set) and started the standby, I got the following error: FATAL: all AuxiliaryProcs are in use LOG: walrestore process (PID 18839) exited with exit code 1 Fixed and better documented. When I started an archive recovery without setting restore_command, it successfully finished. Not sure exactly what you mean, but I fixed a bug that might be something you're seeing. Thanks! But you forgot to include walrestore.c and .h in the patch. Can you submit the updated version of the patch? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group commit, revised
On 19 January 2012 17:40, Robert Haas robertmh...@gmail.com wrote: I don't know what you mean by this. I think removing wal_writer_delay is premature, because I think it still may have some utility, and the patch removes it. That's a separate change that should be factored out of this patch and discussed separately. FWIW, I don't really care too much if we keep wal_writer_delay, provided it is only used in place of the patch's WALWRITER_NORMAL_TIMEOUT constant. I will note in passing that I doubt that the effect with asynchronous commits and hint bits is pronounced enough to have ever transferred through to someone making a practical recommendation to reduce wal_writer_delay to ameliorate clog contention. When we did sync rep it made sense to have the WALSender do the work and for others to just wait. It would be quite strange to require a different design for essentially the same thing for normal commits and WAL flushes to local disk. I should mention the original proposal for streaming replication had each backend send data to standby independently and that was recognised as a bad idea after some time. Same for sync rep also. I don't think those cases are directly comparable. SR is talking to another machine, and I can't imagine that there is a terribly convenient or portable way for every backend that needs one to get a hold of the file descriptor for the socket. Even if it could, the data is sent as a stream, so if multiple backends sent to the same file descriptor you'd have to have some locking to prevent messages from getting interleaved. Or else you could have multiple connections, or use UDP, but that gets rather complex. Anyway, none of this is an issue for file I/O: anybody can open the file, and if two backends write data at different offsets at the same time - or the same data at the same offset at the same time - there's no problem. So the fact that it wasn't a good idea for SR doesn't convince me that it's also a bad idea here. On the other hand, I'm not saying we *should* do it that way, either - i.e. I am not trying to require a different design just because it's fun to make people change things. Rather, I am trying to figure out whether the design you've chosen is in fact the best one, and part of that involves reasoning about why it might or might not be. There are obvious reasons to think that having process A kick process B and go to sleep, then have process B do some work and wake up process A might be less efficient than having process A just do the work itself, in the uncontended case. The fact that that isn't happening seems awfully strange to me; it's hard to understand why 3 system calls are faster than one. That suggests that either the current system is badly broken in some way that this patch fixes (in which case, it would be nice to know what the broken thing is) or that there's an opportunity for further optimization of the new patch (either now or later, depending on how much work we're talking about). Just to be clear, it makes perfect sense that the new system is faster in the contended case, and the benchmark numbers are very impressive. It's a lot harder to understand why it's not slower in the uncontended case. Not sure why its odd to have backends do one thing and auxiliaries do another. The whole point of auxiliary processes is that they do a specific thing different to normal backends. Anyway, the important thing is to have auxiliary processes be independent of each other as much as possible, which simplifies error handling and state logic in the postmaster. Yeah, I guess the shutdown sequence could get a bit complex if we try to make everyone go through the WAL writer all the time. But I wonder if we could rejiggering things somehow so that everything goes through WAL writer if its dead. I'm not sure what you mean by this last bit. It sounds a bit hazardous. My problem with nominating a backend to the status of an auxiliary is that no matter what way you cut it, it increases the failure surface area, so to speak. I'm not sure why Heikki thinks that it follows that having a dependency on some backend is simpler than having one on an auxiliary process. As to the question of IPC and context switch overhead, I'd speculate that protecting access to a data structure with book keeping information regarding which backend is currently the driver and so on might imply considerably more overhead than IPC and context switching. It might also be that having WAL Writer almost solely responsible for this might facilitate more effective use of CPU cache. On most modern architectures, system calls don't actually cause a full context switch. The kernel can just enter a mode switch (go from user mode to kernel mode, and then back to user mode). You can observe this effect with vmstat. That's how 3 system calls might not look much more expensive than 1. + if (delta XLOG_SEG_SIZE * CheckPointSegments
Re: [HACKERS] Online base backup from the hot-standby
On 12-01-17 05:38 AM, Fujii Masao wrote: On Fri, Jan 13, 2012 at 5:02 PM, Fujii Masaomasao.fu...@gmail.com wrote: The amount of code changes to allow pg_basebackup to make a backup from the standby seems to be small. So I ended up merging that changes and the infrastructure patch. WIP patch attached. But I'd happy to split the patch again if you want. Attached is the updated version of the patch. I wrote the limitations of standby-only backup in the document and changed the error messages. Here is my review of this verison of the patch. I think this patch has been in every CF for 9.2 and I feel it is getting close to being committed. The only issue of significants is a crash I encountered while testing, see below. I am fine with including the pg_basebackup changes in the patch it also makes testing some of the other changes possible. The documentation updates you have are good I don't see any issues looking at the code. Testing Review I encountered this on my first replica (the one based on the master). I am not sure if it is related to this patch, it happened after the pg_basebackup against the replica finished. TRAP: FailedAssertion(!(((xid) != ((TransactionId) 0))), File: twophase.c, Line: 1238) LOG: startup process (PID 1) was terminated by signal 6: Aborted LOG: terminating any other active server processes A little earlier this postmaster had printed. LOG: restored log file 0001001F from archive LOG: restored log file 00010020 from archive cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory LOG: unexpected pageaddr 0/1900 in log file 0, segment 33, offset 0 cp: cannot stat `/usr/local/pgsql92git/archive/00010021': No such file or directory I have NOT been able to replicate this error and I am not sure exactly what I had done in my testing prior to that point. In another test run I had - set full page writes=off and did a checkpoint - Started the pg_basebackup - set full_page_writes=on and did a HUP + some database activity that might have forced a checkpoint. I got this message from pg_basebackup. ./pg_basebackup -D ../data3 -l foo -h localhost -p 5438 pg_basebackup: could not get WAL end position from server I point this out because the message is different than the normal could not initiate base backup: FATAL: WAL generated with full_page_writes=off thatI normally see.We might want to add a PQerrorMessage(conn)) to pg_basebackup to print the error details. Since this patch didn't actually change pg_basebackup I don't think your required to improve the error messages in it. I am just mentioning this because it came up in testing. The rest of the tests I did involving changing full_page_writes with/without checkpoints and sighups and promoting the replica seemed to work as expected. Regards,
Re: [HACKERS] triggered_change_notification v3
On Sun, Jan 15, 2012 at 11:05 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Attached is a version of a previously posted patch which has been modified based on on-list feedback from Álvaro. This is a generalized trigger function which can be used as an AFTER EACH ROW trigger on any table which has a primary key, and will send notifications of operations for which the trigger is attached, with a payload indicating the table, the operation, and the primary key. A channel can be specified in the CREATE TRIGGER command, but will default to tcn if omitted. Nice work, Kevin. I've committed this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Vacuum rate limit in KBps
I chewed a bit on Heikki's comment that similarity to the query planning parameters might be useful, and Robert's that being able to explain how the feature works more easily has value. I have an initial adjustment of my general idea that I think moves usefully in both those directions. The existing VACUUM cost constants look like this: vacuum_cost_page_hit = 1 vacuum_cost_page_miss = 10 vacuum_cost_page_dirty = 20 These could be adjusted to instead be ratios like the query planner ones (seq_page_cost, random_page_cost, etc.), referenced off a value of 1.0 for page miss ~= a read is expected: vacuum_cost_page_hit = 0.1 vacuum_cost_page_miss = 1.0 vacuum_cost_page_dirty = 2.0 Now add in the new setting, which is explicitly said to be the read value: vacuum_cost_read_limit = 8000 # maximum page miss read rate in kilobytes/second And I can shuffle the numbers around internally such that things still work exactly the same, at the default parameters. And then anyone who spends time learning how either the query planning or vacuum cost ratio constants work will find the learning curve to pick up the other set easier. An interesting fall-out of this refactoring is that old postgresql.conf settings moved forward for *all* these values will still work fine. The ratios are right and the internal computation won't care. The math is just more complicated to explain when vacuum_cost_page_miss is anything but 1.0, which is a problem the manual doesn't have to address. We don't worry about making every query planner parameter discussion consider what happens if someone moves seq_page_cost around, this will put vacuum_cost_page_miss into the same reference constant category. The only problem is for someone who changed one but not all of them in their old configuration; that's going to give an unexpected result. It might be a bit more straightforward yet if things were renamed so it was more obvious that page miss~=read, but I haven't seen a good way to do that yet. Renaming the reference cost value to vacuum_cost_page_read has two problems. It makes the backward compatibility issues larger, and it's not quite true. The way I think this should be explained, they really aren't the same; that's why I used ~= above. A page miss is not guaranteed to be a read, it just is expected to be one in the worst case. The read rate that vacuum page misses introduce will not be exactly the same as vacuum_cost_read_limit--but it will be below that limit, which is all it claims to be. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Inline Extension
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 19, 2012 at 3:42 PM, Dimitri Fontaine I'm trying to open the extension facilities (versions being the first of them, think \dx) to application PL code, and to hosted environments where you're not granted access to the server's file system. I guess the question is: for what purpose? As you recognized in your original email, if the extension is inline, then the objects will need to be dumped, because a simple CREATE EXTENSION command is bound to fail. But my understanding was that a major part of the reason - if not the entire reason - was to get pg_dump to emit CREATE EXTENSION bob instead of the component SQL commands. If we take that away, what's the remaining benefit of packaging those objects inside an extension instead of just dumping them loose into the database? Indeed, it seems like such a thing is not an extension at all anymore, or at least it gives up many of the useful properties of extensions. Given the entire lack of demand from the field for such a cut-down concept of extension, I think we should not be in a hurry to introduce it. Maybe in a year or two when we have a clearer idea of how people are actually using extensions, there will be a better argument for it. Right now I'm afraid that we might foreclose our options for other future extension features because these things would be incompatible with such ideas. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter
On 01/19/2012 10:52 AM, Robert Haas wrote: It's not quite clear from your email, but I gather that the way that this is intended to work is that these values increment every time we checkpoint? Right--they get updated in the same atomic bump that moves up things like buffers_checkpoint Also, forgive for asking this possibly-stupid question, but of what use is this information? I can't imagine why I'd care about a running total of the number of files fsync'd to disk. I also can't really imagine why I'd care about the length of the write phase, which surely will almost always be a function of checkpoint_completion_target and checkpoint_timeout unless I manage to overrun the number of checkpoint_segments I've allocated. The only number that really seems useful to me is the time spent syncing. I have a clear idea what to look for there: smaller numbers are better than bigger ones. For the rest I'm mystified. Priority #1 here is to reduce (but, admittedly, not always eliminate) the need for log file parsing of this particular area, so including all the major bits from the existing log message that can be published this way would include the write phase time. You mentioned one reason why the write phase time might be interesting; there could be others. One of the things expected here is that Munin will expand its graphing of values from pg_stat_bgwriter to include all these fields. Most of the time the graph of time spent in the write phase will be boring and useless. Making it easy for a look at a graph to spot those rare times when it isn't is one motivation for including it. As for why to include the number of files being sync'd, one reason is again simply wanting to include everything that can easily be published. A second is that it helps support ideas like my Checkpoint sync pause one; that's untunable in any reasonable way without some easy way of monitoring the number of files typically sync'd. Sometimes when I'm investigating checkpoint spikes during sync, I wonder whether they were because more files than usual were synced, or if it's instead just because of more churn on a smaller number. Making this easy to graph pulls that data out to where I can compare it with disk I/O trends. And there's precedent now proving that an always incrementing number in pg_stat_bgwriter can be turned into such a graph easily by monitoring tools. And, it doesn't seem like it's necessarily going to safe me a whole lot either, because if it turns out that my sync phases are long, the first question out of my mouth is going to be what percentage of my total sync time is accounted for by the longest sync?. And so right there I'm back to the logs. It's not clear how such information could be usefully exposed in pg_stat_bgwriter either, since you probably want to know only the last few values, not a total over all time. This isn't ideal yet. I mentioned how some future performance event logging history collector was really needed as a place to push longest sync times into, and we don't have it yet. This is the best thing to instrument that I'm sure is useful, and that I can stick onto with the existing infrastructure. The idea is that this change makes it possible to trigger a sync times are too long alert out of a tool that's based solely on database queries. When that goes off, yes you're possibly back to the logs again for more details about the longest individual sync time. But the rest of the time, what's hopefully the normal state of things, you can ignore the logs and just track the pg_stat_bgwriter numbers. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade with plpython is broken
On tor, 2012-01-19 at 17:04 -0500, Bruce Momjian wrote: For that reason, I wonder if I should just hard-code the plpython rename into the pg_upgrade test in check_loadable_libraries(). Yes, I haven't come up with a better solution either. If this becomes a general problem, we might need to add a command line option to ignore certain names or something. But not yet. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
Andrew Dunstan and...@dunslane.net writes: On 01/19/2012 04:12 PM, Robert Haas wrote: On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstanand...@dunslane.net wrote: The spec only allows unescaped Unicode chars (and for our purposes that means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will result in something that's not legal JSON. I understand. I'm proposing that we not care. In other words, if the server encoding is UTF-8, it'll really be JSON. But if the server encoding is something else, it'll be almost-JSON. Of course, for data going to the client, if the client encoding is UTF8, they should get legal JSON, regardless of what the database encoding is, and conversely too, no? Yes. I think this argument has been mostly theologizing, along the lines of how many JSON characters can dance on the head of a pin. From a user's perspective, the database encoding is only a constraint on which characters he can store. He does not know or care what the bit representation is inside the server. As such, if we store a non-ASCII character in a JSON string, it's valid JSON as far as the user is concerned, so long as that character exists in the Unicode standard. If his client encoding is UTF8, the value will be letter-perfect JSON when it gets to him; and if his client encoding is not UTF8, then he's already pretty much decided that he doesn't give a fig about the Unicode-centricity of the JSON spec, no? So I'm with Robert: we should just plain not care. I would further suggest that maybe what we should do with incoming JSON escape sequences is convert them to Unicode code points and then to the equivalent character in the database encoding (or throw error if there is none). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of patch renaming constraints
On fre, 2012-01-20 at 09:08 +0530, Nikhil Sontakke wrote: Umm, conisonly is set as false from primary key entries in pg_constraint. And primary keys are anyways not inherited. So why is the conisonly field interfering in rename? Seems quite orthogonal to me. In the past, each kind of constraint was either always inherited or always not, implicitly. Now, for check constraints we can choose what we want, and in the future, perhaps we will want to choose for primary keys as well. So having conisonly is really a good step into that future, and we should use it uniformly. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of patch renaming constraints
And primary keys are anyways not inherited. So why is the conisonly field interfering in rename? Seems quite orthogonal to me. In the past, each kind of constraint was either always inherited or always not, implicitly. Now, for check constraints we can choose what we want, and in the future, perhaps we will want to choose for primary keys as well. So having conisonly is really a good step into that future, and we should use it uniformly. Agreed. And right now primary key constraints are not marked as only making them available for inheritance in the future. Or you prefer it otherwise? Anyways, fail to see the direct connection between this and renaming. Might have to look at this patch for that. Regards, Nikhils
Re: [HACKERS] Patch review for logging hooks (CF 2012-01)
On 01/18/2012 04:23 PM, Marti Raudsepp wrote: The updated patch looks good, marking as 'Ready for Committer' Patches without documentation are never ready for commit. For this one, I'm not sure if that should be in the form of a reference example in contrib, or just something that documents that the hook exists and what the ground rules are for grabbing it. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers