Re: [HACKERS] logical changeset generation v3
On Wed, Nov 21, 2012 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-21 14:57:08 +0900, Michael Paquier wrote: Ah, I see. Could you try the following diff?

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index df24b33..797a126 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder, Snapstate *snapstate,
 		 */
 		snapstate->transactions_after = buf->origptr;
+		snapstate->nrrunning = running->xcnt;
 		snapstate->xmin_running = InvalidTransactionId;
 		snapstate->xmax_running = InvalidTransactionId;

I am still getting the same assertion failure even with this diff included.

-- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] FDW for PostgreSQL
2012/11/21 Shigeru Hanada shigeru.han...@gmail.com: Thanks for the comment!

On Tue, Nov 20, 2012 at 10:23 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: I also think the new use_remote_explain option is good. It works fine when we try to use this FDW over a network with more or less latency. It seems to me its default is false; thus, GetForeignRelSize will return the estimated cost according to ANALYZE, instead of remote EXPLAIN. Even though you mention the default behavior was not changed, is it the expected one? My preference is the current one, as is.

Oops, I must have focused only on cost factors. I too think that using local statistics is better as default behavior, because foreign tables would be used for relatively stable tables. If target tables are updated often, it would cause problems with consistency, unless we provide full-fledged transaction mapping.

The deparseFuncExpr() still has case handling for whether it is an explicit cast, implicit cast or regular function. If my previous proposition has no flaw, could you fix it up using the regular function invocation manner? In case the remote node has an incompatible implicit-cast definition, this logic can cause a problem.

Sorry, I overlooked this issue. Fixed to use function call notation for all of explicit function calls, explicit casts, and implicit casts.

At InitPostgresFdwOptions(), the source comment says we don't use malloc() here for simplification of code. Hmm. I'm not sure why it is simpler. It seems to me we have no reason to avoid malloc here, even though libpq options are acquired using malloc().

I used "simple" because using palloc avoids null-checks and error handling. However, much backend code uses malloc to allocate memory which lives as long as the backend process itself, so I fixed it.

Regarding the regression test: [snip] I guess this test tries to check the case when a remote column has an incompatible data type with the local side. Please check timestamp_out(). Its output format follows the datestyle GUC setting, and that is affected by OS configuration at initdb time. (Please grep for datestyle in initdb.c!) I'd recommend using another data type for this regression test, to avoid false-positive detection.

Good catch. :) I fixed the test to use another data type, a user-defined enum.

One other thing I noticed. At execute_query(), it stores the retrieved rows onto the tuplestore of festate->tuples at once. Doesn't it cause problems when the remote table has a very large number of rows? IIRC, the previous code used the cursor feature to fetch a set of rows at a time, to avoid over-consumption of local memory. Do we have some restriction if we fetch a certain number of rows with FETCH? It seems to me we can fetch 1000 rows, for example, and tentatively store them onto the tuplestore within one PG_TRY() block (so no need to worry about PQclear() timing); then we can fetch remote rows again when IterateForeignScan reaches the end of the tuplestore. Please point out anything if I missed something. Anyway, I'll check this v4 patch simultaneously.

Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp
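A minimal sketch of the batched-cursor pattern described above, using only standard libpq calls; the cursor name, the fixed query, the batch size, and the omitted error handling are illustrative assumptions, not anything from the patch:

#include <libpq-fe.h>

/* Sketch: fetch remote rows 1000 at a time through a cursor instead of
 * retrieving the whole result set at once.  The FDW specifics (tuplestore,
 * PG_TRY/PG_CATCH around PQclear) are elided. */
static void
fetch_in_batches(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn, "BEGIN");
    PQclear(res);
    res = PQexec(conn, "DECLARE fdw_cur CURSOR FOR SELECT * FROM remote_tbl");
    PQclear(res);

    for (;;)
    {
        int         ntuples;

        res = PQexec(conn, "FETCH 1000 FROM fdw_cur");
        ntuples = PQntuples(res);

        /* ... copy the ntuples rows into the local tuplestore here ... */

        PQclear(res);
        if (ntuples < 1000)     /* a short batch means end of data */
            break;
    }

    res = PQexec(conn, "CLOSE fdw_cur");
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);
}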
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On 2012-11-20 20:32, Boszormenyi Zoltan wrote: On 2012-11-20 17:03, Boszormenyi Zoltan wrote: On 2012-11-18 17:20, Magnus Hagander wrote: Much of the tar stuff is very similar (I haven't looked to see if it's identical) to the stuff in backend/replication/basebackup.c. Perhaps we should have a src/port/tarutil.c?

I will implement it as a separate patch.

I implemented it; all 3 patches are attached now. Use this patchset: the previously sent 1st patch had a bug, it called conninit_storeval() with value == NULL arguments and it crashes on strdup(NULL).

escape_string() - already exists as escape_quotes() in initdb, AFAICT. We should either move it to src/port/, or at least copy it so it's exactly the same.

A copy of escape_quotes() is now in pg_basebackup.c and is used. I will also unify the copies of it in a separate patch. This one is not done yet, but a suggestion on which existing file it can fit into, or for a new src/port/ filename, is welcome.

I experimented briefly with unifying escape_quotes(). The problem is that it calls pg_malloc(), which is an executable-specific function. Some of the bin/* executables define it as calling exit(1) when malloc() returns NULL; some call it with exit(EXIT_FAILURE), which happens to be 1 but still can be different from the constant 1. Some executables only define pg_malloc0() but not pg_malloc(). pg_basebackup needs pg_malloc() to call disconnect_and_exit(1) instead, to quit cleanly and not leave an "unexpected EOF from client" message in the server log. That is a macro at the moment, but has to be turned into a real function for the reasons below.

To unify escape_quotes(), pg_malloc() needs to be unified as well. But the diverse requirements for pg_malloc() from different executables mean that both the escape_quotes() and the pg_malloc() interfaces need to be extended with the exit function they have to call and the argument to pass to the exit function. Unless we don't care about the bug reports about "unexpected EOF from client" messages.

Frankly, it isn't worth the effort. Let's just keep the two separate copies of escape_quotes(). BTW, it seems to me that this "unify even the least-used functions" mentality was not considered to be a great feature during the introduction of pg_malloc(), which is used in far more code than the TAR functions.

Best regards, Zoltán Böszörményi

-- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] Suggestion for --truncate-tables to pg_restore
Hi Josh,

On 11/20/2012 11:53:23 PM, Josh Kupershmidt wrote: Hi Karl, I signed on to review this patch for the current CF.

I noticed. Thanks very much.

On Fri, Sep 21, 2012 at 8:54 AM, Karl O. Pinc k...@meme.com wrote: On 09/20/2012 12:24:49 PM, Karl O. Pinc wrote: First, the problem: Begin with the following structure:

CREATE TABLE schemaA.foo (id INT PRIMARY KEY, data INT);
CREATE VIEW schemaB.bar AS SELECT * FROM schemaA.foo;

Then, by accident, somebody does:

UPDATE schemaA.foo SET data = data + (RANDOM() * 1000)::INT;

So, you want to restore the data into schemaA.foo. But schemaA.foo has (bad) data in it that must first be removed. It would seem that using

pg_restore --clean -n schemaA -t foo my_pg_dump_backup

would solve the problem: it would drop schemaA.foo, recreate it, and then restore the data. But this does not work. schemaA.foo does not drop because it's got a dependent database object, schemaB.bar.

TBH, I didn't find the example above particularly compelling for demonstrating the need for this feature. If you've just got one table with dependent views which needs to be restored, it's pretty easy to manually TRUNCATE and have pg_restore --data-only reload the table. (And easy enough to combine the truncate and restore into a single transaction in case anything goes wrong, if need be.)

I was not clear in stating the problem. (But see below regarding foreign keys.) The dependent view was an example, but there can also be REFERENCES constraints and trigger-related constraints involving other tables in other schemas. The easiest workaround I can think of here is destroying all the triggers and constraints (either everything in the whole db, or doing some work to be more selective), truncating all the schema's tables, doing a data-only restore of the schema with pg_restore --data-only, and then re-creating all the triggers and constraints.

But I'm willing to grant that this proposed feature is potentially as useful as existing restore-jiggering options like --disable-triggers. And I guess I could see that if you're really stuck having to perform a --data-only restore of many tables, this feature could come in handy.

I think so. See above.

So, the idea here is to be able to do a data-only restore, first truncating the data in the tables being restored to remove the existing corrupted data. The proposal is to add a --truncate-tables option to pg_restore. (I tested pg_restore with 9.1 and when --data-only is used --clean is ignored; it does not even produce a warning. This is arguably a bug.)

+1 for having pg_restore bail out with an error if the user specifies --data-only --clean. By the same token, --clean and --truncate-tables together should also raise a "not allowed" error.

OT: After looking at the code I found a number of conflicting option combinations are not tested for or rejected. I can't recall what they are now. The right way to fix these would be to send in a separate patch for each conflict, all attached to one email/commitfest item? Or one patch that just gets adjusted until it's right?

More serious objections were raised regarding semantics. What if, instead, the initial structure looked like:

CREATE TABLE schemaA.foo (id INT PRIMARY KEY, data INT);
CREATE TABLE schemaB.bar (id INT CONSTRAINT bar_on_foo REFERENCES foo, moredata INT);

With a case like this, in most real-world situations, you'd have to use pg_restore with --disable-triggers if you wanted to use --data-only and --truncate-tables. The possibility of foreign key referential integrity corruption is obvious.
Why would --disable-triggers be used in this example? I don't think you could use --truncate-tables to restore only table foo, because you would get:

ERROR: cannot truncate a table referenced in a foreign key constraint

(At least, I think so, not having tried with the actual patch.) You could use --disable-triggers when restoring bar.

I tried it and you're quite right. (I thought I'd tried this before, but clearly I did not -- proving the utility of the review process. :-) My assumption was that since triggers were turned off, constraints, being triggers, would be off as well, and therefore tables with foreign keys could be truncated. Obviously not, since I get the above error. I just tried it. --disable-triggers does not disable constraints.

Recovering some data and being left with referential integrity problems is better than having no data.

Well, this is really a judgment call, and I have a hunch that many admins would actually choose "none of the above". And I think this point gets to the crux of whether the --truncate-tables option will be useful as-is.

Well, yes. "None of the above" is best. :) In my case I had munged the data in the one important schema and wanted to restore. The setting is an academic one
Re: [HACKERS] [PATCH] binary heap implementation
On 2012-11-20 22:55:52 -0500, Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: While I'm thinking about it, why are the fields of a binaryheap_node called "key" and "value"? That implies a semantics not actually used here. Maybe "value1" and "value2" instead?

Yes, I discussed this with Andres earlier (and considered "ptr" and "value", or "p1" and "p2"), but decided to wait for others to comment on the naming. I have no problem with "value1" and "value2", though I'd very slightly prefer "d1" and "d2" ("d" for Datum) now.

d1/d2 would be fine with me.

BTW, I probably missed some context upthread, but why do we have two fields at all? At least for the purposes of nodeMergeAppend, it would make a lot more sense for the binaryheap elements to be just single Datums, without all the redundant pointers to the MergeAppend node. The need to get at the comparison support info could be handled much more elegantly than that by providing a "void *" passthrough arg. That is, the heap creation function becomes

binaryheap *binaryheap_allocate(size_t capacity, binaryheap_comparator compare, void *passthrough);

and the comparator signature becomes

typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);

For nodeMergeAppend, the Datums would be the slot indexes and the passthrough pointer could point to the MergeAppend node. This would make equal sense in a lot of other contexts, such as if you wanted a heap composed of tuples -- the info to compare tuples could be handled via the passthrough argument, rather than doubling the size of the heap in order to put that pointer into each heap element. If you have no use for the passthrough arg in a particular usage, just pass NULL.

The passthrough argument definitely makes sense; I am all for adding it. The reason I originally wanted to have two values was that it made it easy/possible for the comparison function to work without any pointer dereferences, which was somewhat beneficial to performance. Completely avoiding dereferences only worked on 64-bit systems anyway, though... With just one value I need an extra struct, but that's not too bad. So if you all prefer just having one value, I am OK with that.

Who is updating the patch?

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
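To make the proposal concrete, here is a rough sketch of a nodeMergeAppend call site under Tom's proposed signatures; everything beyond the two signatures he quotes (in particular the hypothetical compare_slots_using_sortkeys helper) is assumed for illustration:

/* Sketch: each heap element is a bare Datum holding a slot index; the
 * comparison support info arrives via the passthrough pointer. */
static int
heap_compare_slots(Datum a, Datum b, void *passthrough)
{
    MergeAppendState *node = (MergeAppendState *) passthrough;
    int         slot1 = DatumGetInt32(a);
    int         slot2 = DatumGetInt32(b);

    /* hypothetical helper comparing node->ms_slots[slot1] against
     * node->ms_slots[slot2] using the node's sort keys */
    return compare_slots_using_sortkeys(node, slot1, slot2);
}

/* creation would then become something like: */
/*   heap = binaryheap_allocate(nplans, heap_compare_slots, (void *) node); */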
[HACKERS] Fwd: ERROR: volatile EquivalenceClass has no sortref
Hi all,

When I try to query the table temp_table1 (sms_type varchar(20), sms_info varchar(100), sms_id integer) with:

select sms_type, count(*) from temp_table1 group by 1 order by 2 desc;

I get the following error, and I don't know what's wrong:

ERROR: volatile EquivalenceClass has no sortref

-- --Regards Ranjeet R. Dhumal
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
Boszormenyi Zoltan wrote: The problem is that it calls pg_malloc() which is an executable-specific function. Some of the bin/* executables define it as calling exit(1) when malloc() returns NULL, some call it with exit(EXIT_FAILURE) which happens to be 1 but still can be different from the constant 1. Some executables only define pg_malloc0() but not pg_malloc().

It seems simpler to have the escape_quotes utility function in port just not use pg_malloc at all, have it return NULL or a similar failure indicator when malloc() fails, and then the caller decides what to do.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
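A sketch of what this suggestion could look like, assuming the function moves to src/port/ with these (illustrative) semantics:

#include <stdlib.h>
#include <string.h>

/* Sketch: escape ' and \ by prefixing a backslash, returning NULL on
 * out-of-memory so each caller can choose its own exit path. */
char *
escape_quotes(const char *src)
{
    size_t      len = strlen(src);
    char       *result = malloc(len * 2 + 1);   /* worst case: all escaped */
    char       *p;

    if (result == NULL)
        return NULL;            /* caller decides how to bail out */

    for (p = result; *src; src++)
    {
        if (*src == '\'' || *src == '\\')
            *p++ = '\\';
        *p++ = *src;
    }
    *p = '\0';
    return result;
}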
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On 2012-11-21 14:19, Alvaro Herrera wrote: Boszormenyi Zoltan wrote: The problem is that it calls pg_malloc() which is an executable-specific function. Some of the bin/* executables define it as calling exit(1) when malloc() returns NULL, some call it with exit(EXIT_FAILURE) which happens to be 1 but still can be different from the constant 1. Some executables only define pg_malloc0() but not pg_malloc().

It seems simpler to have the escape_quotes utility function in port just not use pg_malloc at all, have it return NULL or a similar failure indicator when malloc() fails, and then the caller decides what to do.

$ find . -name *.c | xargs grep pg_malloc | grep -v ^pg_malloc | wc -l
277

Too much work for little gain. Also:
- pg_upgrade calls pg_log(PG_FATAL, ...)
- pgbench logs a line then calls exit(1)

What I can imagine is to introduce a new src/port/ function, InitPostgresFrontend() or something like that. Every executable then calls this function first inside its main() if it wants to use pg_malloc() from libpgport.a. This function would accept a pointer to a structure, and the struct contains the pointer to the exit function and the argument to call it with. Other data for different use cases can be added later if needed.

This way, pg_malloc() and friends can be unified and their call interfaces don't have to change. InitPostgresFrontend() needs to be added to only 9 places instead of changing 277 callers of pg_malloc() or pg_malloc0(). BTW, the unified pg_malloc() and friends must be inside #ifdef FRONTEND ... #endif to not leak into the backend code.

Opinions?

Best regards, Zoltán Böszörményi

-- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
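A sketch of the proposed hook; every name and field here is part of the hypothetical design above, not existing code:

#include <stdio.h>
#include <stdlib.h>

/* Sketch: each frontend registers its preferred exit behaviour once,
 * and the unified pg_malloc() calls it on allocation failure. */
typedef struct PostgresFrontendOptions
{
    void        (*exit_fn) (int code);  /* e.g. disconnect_and_exit */
    int         exit_arg;               /* e.g. 1 or EXIT_FAILURE */
} PostgresFrontendOptions;

static PostgresFrontendOptions *frontend_opts;

void
InitPostgresFrontend(PostgresFrontendOptions *opts)
{
    frontend_opts = opts;
}

#ifdef FRONTEND
void *
pg_malloc(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
    {
        fprintf(stderr, "out of memory\n");
        frontend_opts->exit_fn(frontend_opts->exit_arg);
    }
    return p;
}
#endif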
Re: [HACKERS] review: Reduce palloc's in numeric operations
On 20.11.2012 21:44, Tom Lane wrote: init_var_from_num's header comment could use some more work. The statement that one must not modify the returned var is false in some sense, since for instance numeric_ceil() does that. The key is that you have to replace the digit buffer, not modify it in-place, but most computational routines do that anyway. Also it might be worth pointing out explicitly that free_var() is not required unless the var is modified subsequently. (There are instances of both cases, and it might not be clear to the reader why.)

Added those points to the comment.

* get_str_from_var() no longer scribbles on its input. I noticed that it's always called with a dscale that comes from the input var itself. In other words, the rounding was unnecessary to begin with.

Hmm, it may have been necessary once upon a time, but I agree getting rid of the rounding step is a win now. Suggest adding a comment though that the var is displayed to the number of digits indicated by its dscale. Looks good otherwise.

Committed, thanks to everyone involved.

- Heikki
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
Boszormenyi Zoltan wrote: On 2012-11-21 14:19, Alvaro Herrera wrote: Boszormenyi Zoltan wrote: The problem is that it calls pg_malloc() which is an executable-specific function. Some of the bin/* executables define it as calling exit(1) when malloc() returns NULL, some call it with exit(EXIT_FAILURE) which happens to be 1 but still can be different from the constant 1. Some executables only define pg_malloc0() but not pg_malloc().

It seems simpler to have the escape_quotes utility function in port just not use pg_malloc at all, have it return NULL or a similar failure indicator when malloc() fails, and then the caller decides what to do.

$ find . -name *.c | xargs grep pg_malloc | grep -v ^pg_malloc | wc -l
277

Too much work for little gain.

I probably wrote the above in a confusing way. I am not suggesting that pg_malloc is changed in any way.

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
Boszormenyi Zoltan z...@cybertec.at writes: pg_basebackup needs pg_malloc() to call disconnect_and_exit(1) instead, to quit cleanly and not leave an "unexpected EOF from client" message in the server log. Which is a macro at the moment, but has to be turned into a real function for the reasons below.

man 3 atexit

regards, tom lane
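Tom's point, sketched: register the cleanup once with atexit(3) and pg_malloc can keep calling plain exit(); the handler name here is illustrative:

#include <stdlib.h>

/* Sketch: run the libpq disconnect on every exit() path, so pg_malloc
 * never needs to know about disconnect_and_exit at all. */
static void
disconnect_atexit(void)
{
    /* PQfinish(conn) here, guarding against a NULL connection */
}

int
main(int argc, char **argv)
{
    atexit(disconnect_atexit);
    /* ... rest of pg_basebackup ... */
    return 0;
}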
Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]
On 2012-11-21 15:29, Tom Lane wrote: Boszormenyi Zoltan z...@cybertec.at writes: pg_basebackup needs pg_malloc() to call disconnect_and_exit(1) instead, to quit cleanly and not leave an "unexpected EOF from client" message in the server log. Which is a macro at the moment, but has to be turned into a real function for the reasons below.

man 3 atexit

Aww, crap. I knew I forgot about something. :-) Thanks.

regards, tom lane

-- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut pete...@gmx.net wrote: On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: But, with the attached patch:

rhaas=# create function xyz(smallint) returns smallint as $$select $1$$ language sql;
CREATE FUNCTION
rhaas=# select xyz(5);
 xyz
-----
   5
(1 row)

rhaas=# create table abc (a int);
CREATE TABLE
rhaas=# select lpad(a, 5, '0') from abc;
 lpad
------
(0 rows)

I continue to be of the opinion that allowing this second case to work is not desirable.

1. Why?
2. What's your counter-proposal?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [PATCH] binary heap implementation
On Wed, Nov 21, 2012 at 6:08 AM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-20 22:55:52 -0500, Tom Lane wrote: Abhijit Menon-Sen a...@2ndquadrant.com writes: While I'm thinking about it, why are the fields of a binaryheap_node called "key" and "value"? That implies a semantics not actually used here. Maybe "value1" and "value2" instead?

Yes, I discussed this with Andres earlier (and considered "ptr" and "value", or "p1" and "p2"), but decided to wait for others to comment on the naming. I have no problem with "value1" and "value2", though I'd very slightly prefer "d1" and "d2" ("d" for Datum) now.

d1/d2 would be fine with me.

BTW, I probably missed some context upthread, but why do we have two fields at all? At least for the purposes of nodeMergeAppend, it would make a lot more sense for the binaryheap elements to be just single Datums, without all the redundant pointers to the MergeAppend node. The need to get at the comparison support info could be handled much more elegantly than that by providing a "void *" passthrough arg. That is, the heap creation function becomes

binaryheap *binaryheap_allocate(size_t capacity, binaryheap_comparator compare, void *passthrough);

and the comparator signature becomes

typedef int (*binaryheap_comparator) (Datum a, Datum b, void *passthrough);

For nodeMergeAppend, the Datums would be the slot indexes and the passthrough pointer could point to the MergeAppend node. This would make equal sense in a lot of other contexts, such as if you wanted a heap composed of tuples -- the info to compare tuples could be handled via the passthrough argument, rather than doubling the size of the heap in order to put that pointer into each heap element. If you have no use for the passthrough arg in a particular usage, just pass NULL.

The passthrough argument definitely makes sense; I am all for adding it. The reason I originally wanted to have two values was that it made it easy/possible for the comparison function to work without any pointer dereferences, which was somewhat beneficial to performance. Completely avoiding dereferences only worked on 64-bit systems anyway, though... With just one value I need an extra struct, but that's not too bad. So if you all prefer just having one value, I am OK with that.

Who is updating the patch?

I guess I'll take another whack at it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] C-function, don't load external dll file
Hi,

Przemek Lisowski prze...@lisnet.info writes: HOW LOAD DLL AND USE MY EXTERNAL FUNCTION ?

You need to declare it in SQL, maybe like this:

create function public.transform(text)
  returns text
  as '$libdir/fservice', 'transform'
  language C;

See also the LOAD command and the CREATE EXTENSION documentation for how to organise testing and shipping of your code.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
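For completeness, a minimal sketch of what the C side of such a function typically looks like under the version-1 calling convention; the body here is a placeholder, not the poster's actual code:

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(transform);

Datum
transform(PG_FUNCTION_ARGS)
{
    text       *in = PG_GETARG_TEXT_P(0);

    /* placeholder: a real implementation would build a new text value */
    PG_RETURN_TEXT_P(in);
}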
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On 29 August 2012 23:39, Tom Lane t...@sss.pgh.pa.us wrote: The main downside I can see is that code that used to work is likely to stop working as soon as someone creates a potential overloading situation. Worse, the error message could be pretty confusing, since if you had been successfully calling f(smallint) with f(42), you'd get "f(integer) does not exist", not something like "f() is ambiguous", after adding f(float8) to the mix. This seems related to the confusing changes in regression test cases that I got in my experiments yesterday. This may be sufficient reason to reject the idea, since the very last thing we need in this area is any degradation in the relevance of the error messages.

It would be useful if we issued a NOTICE when an ambiguity is introduced, rather than when using it. Like Bison's reporting of reduce conflicts.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows
Given we're calling to_lower() on a single byte in the code referred to, should we even be doing that when we have a multi-byte encoding and the high bit is set?

Nobody responded to this, but I'm rather inclined to say we should not. Here's a simple patch to avoid this case. Comments?

cheers

andrew

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index b8e2f71..1d76ce3 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -132,8 +132,10 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 {
 	char	   *result;
 	int			i;
+	bool		enc_is_single_byte;
 
 	result = palloc(len + 1);
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
 
 	/*
 	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
@@ -141,8 +143,8 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 	 * locale-aware translation. However, there are some locales where this
 	 * is not right either (eg, Turkish may do strange things with 'i' and
 	 * 'I'). Our current compromise is to use tolower() for characters with
-	 * the high bit set, and use an ASCII-only downcasing for 7-bit
-	 * characters.
+	 * the high bit set, as long as they aren't part of a multi-byte character,
+	 * and use an ASCII-only downcasing for 7-bit characters.
 	 */
 	for (i = 0; i < len; i++)
 	{
@@ -150,7 +152,7 @@ downcase_truncate_identifier(const char *ident, int len, bool warn)
 
 		if (ch >= 'A' && ch <= 'Z')
 			ch += 'a' - 'A';
-		else if (IS_HIGHBIT_SET(ch) && isupper(ch))
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
 			ch = tolower(ch);
 		result[i] = (char) ch;
 	}
Re: [HACKERS] MySQL search query is not executing in Postgres DB
Simon Riggs si...@2ndquadrant.com writes: It would be useful if we issued a NOTICE when an ambiguity is introduced, rather than when using it.

I think that's pie in the sky, since whether there is an ambiguity will depend not only on what set of functions exists, but what the caller's search_path is.

regards, tom lane
Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows
Andrew Dunstan and...@dunslane.net writes: Here's a simple patch to avoid this case. Comments?

I'm not sure that's the only place we're doing this ...

regards, tom lane
Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows
On 11/21/2012 11:11 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: Here's a simple patch to avoid this case. Comments?

I'm not sure that's the only place we're doing this ...

Oh, Hmm, darn. Where else do you think we might?

cheers

andrew
[HACKERS] PANIC: could not write to log file
After upgrading a pretty big database (several hundred gigabytes) from 8.2 to 9.2, I'm getting some "PANIC: could not write to log file" messages. Actually I got two of them, one yesterday and one today.

* postgres 9.2.1 is running on a Windows 2003 server (downloaded the zip archive binaries)
* Data is on an iSCSI SAN (the same postgres 8.2 had been running on it perfectly for several years)
* Full messages are:

PANIC: could not write to log file 118, segment 74 at offset 12189696, length 475136: Invalid argument
STATEMENT: COMMIT
PANIC: could not write to log file 117, segment 117 at offset 5660672, length 4096000: Invalid argument
STATEMENT: COMMIT

* I have no other messages from the SAN nor from Windows
* Here are the non-default parameters in postgresql.conf:

listen_addresses = '*'          # what IP address(es) to listen on;
max_connections = 200           # (change requires restart)
shared_buffers = 320MB          # min 128kB
work_mem = 50MB                 # min 64kB
maintenance_work_mem = 700MB    # min 1MB
checkpoint_segments = 15        # in logfile segments, min 1, 16MB each

Any idea on what might be going on?

Cyril VELTER
Re: [HACKERS] Database object names and libpq in UTF-8 locale on Windows
Andrew Dunstan and...@dunslane.net writes: On 11/21/2012 11:11 AM, Tom Lane wrote: I'm not sure that's the only place we're doing this ...

Oh, Hmm, darn. Where else do you think we might?

Dunno, but grepping for isupper and/or tolower should find any such places.

regards, tom lane
Re: [HACKERS] PANIC: could not write to log file
On Wed, Nov 21, 2012 at 8:51 AM, Cyril VELTER cyril.vel...@metadys.com wrote: After upgrading a pretty big database (several hundred gigabytes) from 8.2 to 9.2 I'm getting some "PANIC: could not write to log file" messages. Actually I got two of them. One yesterday and one today.

How was the upgrade done?

Cheers, Jeff
Re: [HACKERS] [PATCH] binary heap implementation
On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote: I guess I'll take another whack at it.

New version attached.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

binaryheap-rmh2.patch Description: Binary data
Re: [HACKERS] autovacuum stress-testing our system
On Sun, Nov 18, 2012 at 5:49 PM, Tomas Vondra t...@fuzzy.cz wrote: The two main changes are these:

(1) The stats file is split into a common "db" file, containing all the DB entries, and per-database files with tables/functions. The common file is still called pgstat.stat; the per-db files have the database OID appended, so for example pgstat.stat.12345 etc. This was a trivial hack of the pgstat_read_statsfile/pgstat_write_statsfile functions, introducing two new functions, pgstat_read_db_statsfile / pgstat_write_db_statsfile, that do the trick of reading/writing the stat file for one database.

(2) pgstat_read_statsfile has an additional parameter "onlydbs" that says that you don't need table/func stats - just the list of db entries. This is used for the autovacuum launcher, which does not need to read the table/func stats (if I'm reading the code in autovacuum.c correctly - it seems to be working as expected).

I'm not an expert on the stats system, but this seems like a promising approach to me.

(a) It does not solve the many-schema scenario at all - that'll need a completely new approach I guess :-(

We don't need to solve every problem in the first patch. I've got no problem kicking this one down the road.

(b) It does not solve the writing part at all - the current code uses a single timestamp (last_statwrite) to decide if a new file needs to be written. That clearly is not enough for multiple files - there should be one timestamp for each database/file. I'm thinking about how to solve this and how to integrate it with pgstat_send_inquiry etc.

Presumably you need a last_statwrite for each file, in a hash table or something, and requests need to specify which file is needed.

And yet another one I'm thinking about is using a fixed-length array of timestamps (e.g. 256), indexed by mod(dboid,256). That would mean stats for all databases with the same mod(oid,256) would be written at the same time. Seems like over-engineering, though.

That seems like an unnecessary kludge.

(c) I'm a bit worried about the number of files - right now there's one for each database and I'm thinking about splitting them by type (one for tables, one for functions) which might make it even faster for some apps with a lot of stored procedures etc. But is the large number of files actually a problem? After all, we're using one file per relation fork in the base directory, so this seems like a minor issue.

I don't see why one file per database would be a problem. After all, we already have one directory per database inside base/. If the user has so many databases that dirent lookups in a directory of that size are a problem, they're already hosed, and this will probably still work out to a net win.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
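A sketch of the two pieces discussed above, with every name assumed for illustration: the per-database file naming from (1), and one write timestamp per database, rather than the single last_statwrite, for (b):

#include "postgres.h"
#include "utils/timestamp.h"

/* Sketch: per-database stats file, e.g. "pg_stat_tmp/pgstat.stat.12345";
 * the directory argument stands in for whatever the collector uses. */
static void
get_dbstat_filename(const char *statdir, Oid dboid, char *path, size_t len)
{
    snprintf(path, len, "%s/pgstat.stat.%u", statdir, dboid);
}

/* Sketch: hash entry so inquiries can ask for, and the collector can
 * track, one database's write time at a time. */
typedef struct DBWriteState
{
    Oid         databaseid;     /* hash key */
    TimestampTz last_statwrite; /* when this database's file was last written */
} DBWriteState;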
Re: [HACKERS] [PATCH] binary heap implementation
On 2012-11-21 12:54:30 -0500, Robert Haas wrote: On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote: I guess I'll take another whack at it. New version attached.

I think the assert in replace_first should be

Assert(!binaryheap_empty(heap) && heap->has_heap_property);

instead of

Assert(heap->has_heap_property);

Looks good otherwise.

Thanks, Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Switching timeline over streaming replication
On 20.11.2012 15:33, Amit Kapila wrote: Defect-2:

1. start primary A
2. start standby B following A
3. start cascade standby C following B.
4. Start another standby D following C.
5. Execute the following commands in the primary A.
   create table tbl(f int);
   insert into tbl values(generate_series(1,1000));
6. Promote standby B.
7. Execute the following commands in the primary B.
   insert into tbl values(generate_series(1001,2000));
   insert into tbl values(generate_series(2001,3000));

The following logs are observed on standby C:

LOG: restarted WAL streaming at position 0/700 on tli 2
ERROR: requested WAL segment 00020007 has already been removed
LOG: record with zero length at 0/7028190
LOG: record with zero length at 0/7048540
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 00020007, offset 0

Hmm, this one is actually a pre-existing bug. There's a sanity check that the sequence of timeline IDs seen in the XLOG page headers doesn't go backwards. In other words, if the last XLOG page that was read had timeline id X, the next page must have a tli >= X. The startup process keeps track of the last seen timeline id in lastPageTLI.

In standby_mode, when the startup process is reading from a pre-existing file in pg_xlog (typically put there by streaming replication) and it reaches the end of valid WAL (marked by an error in decoding it, ie. "record with zero length" in your case), it sleeps for five seconds and retries. At retry, the WAL file is re-opened, and as part of sanity checking it, the first page header in the file is validated.

Now, if there was a timeline change in the current WAL segment, and we've already replayed past that point, lastPageTLI will already be set to the new TLI, but the first page in the file contains the old TLI. When the file is re-opened and the first page is validated, you get the error.

The fix is quite straightforward: we should refrain from checking the TLI when we re-open a WAL file. Or better yet, compare it against the TLI we saw at the beginning of the last WAL segment, not the last WAL page. I propose the attached patch (against 9.2) to fix that.

This should be backpatched to 9.0, where standby_mode was introduced. The code was the same in 8.4, too, but AFAICS there was no problem there because 8.4 never tried to re-open the same WAL segment after replaying some of it.
- Heikki

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 8614907..045d21d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -572,6 +572,7 @@ static uint32 readRecordBufSize = 0;
 static XLogRecPtr ReadRecPtr;	/* start of last record read */
 static XLogRecPtr EndRecPtr;	/* end+1 of last record read */
 static TimeLineID lastPageTLI = 0;
+static TimeLineID lastSegmentTLI = 0;
 
 static XLogRecPtr minRecoveryPoint; /* local copy of
 									 * ControlFile->minRecoveryPoint */
@@ -655,7 +656,7 @@ static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt);
 static void CheckRecoveryConsistency(void);
-static bool ValidXLOGHeader(XLogPageHeader hdr, int emode);
+static bool ValidXLOGHeader(XLogPageHeader hdr, int emode, bool segmentonly);
 static XLogRecord *ReadCheckpointRecord(XLogRecPtr RecPtr, int whichChkpt);
 static List *readTimeLineHistory(TimeLineID targetTLI);
 static bool existsTimeLineHistory(TimeLineID probeTLI);
@@ -3927,7 +3928,7 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 	 * to go backwards (but we can't reset that variable right here, since
 	 * we might not change files at all).
 	 */
-	lastPageTLI = 0;			/* see comment in ValidXLOGHeader */
+	lastPageTLI = lastSegmentTLI = 0;	/* see comment in ValidXLOGHeader */
 	randAccess = true;			/* allow curFileTLI to go backwards too */
 }
@@ -4190,7 +4191,7 @@ next_record_is_invalid:
 * ReadRecord. It's not intended for use from anywhere else.
 */
static bool
-ValidXLOGHeader(XLogPageHeader hdr, int emode)
+ValidXLOGHeader(XLogPageHeader hdr, int emode, bool segmentonly)
{
	XLogRecPtr	recaddr;
@@ -4285,18 +4286,31 @@ ValidXLOGHeader(XLogPageHeader hdr, int emode)
	 * successive pages of a consistent WAL sequence.
	 *
	 * Of course this check should only be applied when advancing sequentially
-	 * across pages; therefore ReadRecord resets lastPageTLI to zero when
-	 * going to a random page.
+	 * across pages; therefore ReadRecord resets lastPageTLI and
+	 * lastSegmentTLI to zero when going to a random page.
+	 *
+	 * Sometimes we re-open a segment that's already been partially replayed.
+	 * In that case we cannot perform the normal TLI check: if there is a
+	 * timeline switch within the segment, the first
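Reduced to its essentials, the relaxed check in the (truncated) patch above amounts to something like the following; the reporting code is simplified and the exact variable handling is an assumption:

/* Sketch: when re-validating the first page of a re-opened segment
 * (segmentonly), compare against the TLI seen at the start of the last
 * segment rather than the last page, since replay may already have
 * crossed a timeline switch inside this segment. */
TimeLineID	expectedTLI = segmentonly ? lastSegmentTLI : lastPageTLI;

if (hdr->xlp_tli < expectedTLI)
{
	/* report "out-of-sequence timeline ID" via ereport() and fail */
	return false;
}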
[HACKERS] PQconninfo function for libpq
I'm breaking this out into its own thread, for my own sanity if nothing else :) And it's an isolated feature after all.

I still agree with the previous review at http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net about keeping the data in more than one place. Based on that, I've created a different version of this patch, attached. This way we keep all the data in one struct.

At this point, the patch is untested beyond compiling and a trivial psql check, because I ran out of time :) But I figured I'd throw it out there for comments on which version people prefer. (And yes, it's quite possible I've made a big think-mistake in it somewhere, but again, better to get some eyes on it early.) My version also contains a fixed version of the docs that should be moved back into Zoltan's version if that's the one we end up preferring.

Also, a question was buried in the other review: are we OK to remove the requiressl parameter? Both these patches do so, because the code becomes much simpler if we can do that. It has been deprecated since 7.2. Is it OK to remove it, or do we need to put back in the more complex code to deal with both?

Attached are both Zoltan's latest patch (v16) and my slightly different approach. Comments on which approach is best? Test results from somebody who has the time to look at it? :)

-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

01-PQconninfo-v16.patch Description: Binary data
PQconninfo-mha.patch Description: Binary data
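Whichever version wins, client code would exercise it roughly like this. Since PQconninfo() is the function being added here, its exact shape is still under discussion; this sketch assumes it returns a PQconninfoOption array terminated by a NULL keyword, as PQconndefaults() does:

#include <stdio.h>
#include <libpq-fe.h>

/* Sketch: dump the effective settings of an open connection,
 * skipping unset options. */
static void
dump_conninfo(PGconn *conn)
{
    PQconninfoOption *opts = PQconninfo(conn);  /* proposed API */
    PQconninfoOption *opt;

    if (opts == NULL)
        return;

    for (opt = opts; opt->keyword != NULL; opt++)
        if (opt->val != NULL)
            printf("%s='%s'\n", opt->keyword, opt->val);

    PQconninfoFree(opts);
}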
Re: [HACKERS] [PATCH] binary heap implementation
Robert Haas wrote: On Wed, Nov 21, 2012 at 9:46 AM, Robert Haas robertmh...@gmail.com wrote: I guess I'll take another whack at it. New version attached.

The following comments still talk about key and value, thus need an update:

+ * binaryheap_add_unordered
+ *
+ * Adds the given key and value to the end of the heap's list of nodes

+/*
+ * binaryheap_add
+ *
+ * Adds the given key and value to the heap in O(log n), while

+/*
+ * binaryheap_replace_first
+ *
+ * Change the key and/or value of the first (root, topmost) node and

This comment needs an update (s/comparator/compare/, and also add has_heap_property and arg):

+/*
+ * binaryheap
+ *
+ *		size		how many nodes are currently in "nodes"
+ *		space		how many nodes can be stored in "nodes"
+ *		comparator	comparator to define the heap property
+ *		nodes		the first of a list of "space" nodes
+ */
+typedef struct binaryheap
+{
+	int			size;
+	int			space;
+	bool		has_heap_property;	/* debugging cross-check */
+	binaryheap_comparator compare;
+	void	   *arg;
+	Datum		nodes[FLEXIBLE_ARRAY_MEMBER];
+} binaryheap;

I would suggest adding prefixes to struct members, so bh_size, bh_space and so on. This makes it easier to grep for stuff later.

Do we really need the struct definition to be public? Couldn't it just live in binaryheap.c?

-- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] PANIC: could not write to log file
On 21/11/12 18:10:12, jeff.ja...@gmail.com wrote: On Wed, Nov 21, 2012 at 8:51 AM, Cyril VELTER cyril.vel...@metadys.com wrote: After upgrading a pretty big database (several hundred gigabytes) from 8.2 to 9.2 I'm getting some "PANIC: could not write to log file" messages. Actually I got two of them. One yesterday and one today.

How was the upgrade done?

The upgrade was done with the following steps:

* create a new cluster using the 9.2.1 initdb
* run pg_dump | psql using the 9.2.1 binaries on a Linux machine (both the old 8.2 server and the new 9.2.1 server run on Windows machines)

Thanks,

cyril
Re: [HACKERS] User control over psql error stream
On 11/15/12 3:53 PM, Karl O. Pinc wrote: This patch gives the end user control over psql's error stream. This allows a single psql session to use \o to send both errors and table output to multiple files. Useful when capturing test output, etc.

What does this do that cannot in practice be achieved using shell redirection operators?
Re: [HACKERS] add -Wlogical-op to standard compiler options?
On Thu, Nov 15, 2012 at 1:46 PM, Peter Eisentraut pete...@gmx.net wrote: On 11/15/12 9:40 AM, Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: I think it might be worth adding -Wlogical-op to the standard warning options (for supported compilers, determined by configure test).

Does that add any new warnings with the current source code, and if so what?

none

Using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), I get dozens of warnings, all apparently coming from somewhere in the MemSet macro. Example:

pl_handler.c:301: warning: logical '&&' with non-zero constant will always evaluate as true

Probably has something to do with:

	/* \
	 * If MEMSET_LOOP_LIMIT == 0, optimizer should find \
	 * the whole "if" false at compile time. \
	 */ \
	MEMSET_LOOP_LIMIT != 0) \

Cheers, Jeff
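A minimal reproduction of what Jeff is seeing, assuming gcc 4.4's -Wlogical-op behaviour; MemSet trips the warning because MEMSET_LOOP_LIMIT is a nonzero constant in its "... && MEMSET_LOOP_LIMIT != 0" condition:

/* gcc -Wlogical-op -c demo.c
 * demo.c: warning: logical '&&' with non-zero constant will always
 * evaluate as true */
#define MEMSET_LOOP_LIMIT 1024

int
covered(int n)
{
    return (n > 0 && MEMSET_LOOP_LIMIT != 0);   /* right operand is constant true */
}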
Re: [HACKERS] [PATCH] binary heap implementation
On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: The following comments still talk about key and value, thus need an update:

Oops.

This comment needs an update (s/comparator/compare/, and also add has_heap_property and arg):

Fixed.

I would suggest adding prefixes to struct members, so bh_size, bh_space and so on. This makes it easier to grep for stuff later.

OK.

Do we really need the struct definition to be public? Couldn't it just live in binaryheap.c?

Well, that would preclude defining any macros, like binaryheap_empty(). Since efficiency is among the reasons for inventing this in the first place, that doesn't seem prudent to me.

Another new version attached.

P.S. to Abhijit: Please, please tell me the secret to getting five people to review your patches. I'm lucky when I can get one!

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

binaryheap-rmh3.patch Description: Binary data
Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes
On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote: Could ALTER TABLE use an option to drop the primary key constraint? I needed to do that, found it was not obvious, and this led me to try to improve things.

That could be useful, I think. But it might open a can of worms.

Would the new option be syntactic sugar around ALTER TABLE ... DROP CONSTRAINT put_the_name_of_the_primary_key_here?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] WIP json generation enhancements
Here is a WIP patch for enhancements to json generation.

First, there is the much-requested json_agg, which will aggregate rows directly to json. So the following will now work:

select json_agg(my_table) from my_table;
select json_agg(q) from (my query here) q;

One open question regarding this feature is whether this should return NULL or '[]' for 0 rows. Currently it returns NULL, but I could be convinced to return '[]', and the change would be very small.

Next is to_json(), which will turn any value into json, so we're no longer restricted to rows and arrays.

Non-builtin types are now searched for a cast to json, and if it exists it is used instead of the type's text representation. I didn't add a special type to look for a cast to, as was discussed before, as it seemed a bit funky and unnecessary. It can easily be added, but I'm still not convinced it's a good idea. Note that this is only done for types that aren't builtin - we know how to turn all of those into json without needing to look for a cast.

Along with this there is an hstore_to_json() function added to the hstore module, and a cast from hstore to json that uses it. This function treats every value in the hstore as a string. There is also a function with the working title of hstore_to_json_loose() that does a heuristic conversion, treating values of 't' and 'f' as booleans, and strings that look like numbers as numbers unless they start with a leading 0 followed by another digit (could be zip codes, phone numbers etc.) The difference between these is illustrated here (notice that quoted "t" becomes unquoted true and quoted "1" becomes 1):

andrew=# select json_agg(q) from foo q;
                             json_agg
------------------------------------------------------------------
 [{"a":"a","b":1,"h":{"c": "t", "d": null, "q": "1", "x": "y"}}]
(1 row)

andrew=# select json_agg(q) from (select a, b, hstore_to_json_loose(h) as h from foo) q;
                            json_agg
----------------------------------------------------------------
 [{"a":"a","b":1,"h":{"c": true, "d": null, "q": 1, "x": "y"}}]
(1 row)

Note: this patch will need a change in the oids used for the new functions if applied against git tip, as they have been overtaken by time.

Comments welcome.

cheers

andrew

*** a/contrib/hstore/hstore--1.1.sql
--- b/contrib/hstore/hstore--1.1.sql
***************
*** 234,239 ****
--- 234,252 ----
  LANGUAGE C IMMUTABLE STRICT;
  
  CREATE CAST (text[] AS hstore)
    WITH FUNCTION hstore(text[]);
  
+ CREATE FUNCTION hstore_to_json(hstore)
+ RETURNS json
+ AS 'MODULE_PATHNAME', 'hstore_to_json'
+ LANGUAGE C IMMUTABLE STRICT;
+ 
+ CREATE CAST (hstore AS json)
+ WITH FUNCTION hstore_to_json(hstore);
+ 
+ CREATE FUNCTION hstore_to_json_loose(hstore)
+ RETURNS json
+ AS 'MODULE_PATHNAME', 'hstore_to_json_loose'
+ LANGUAGE C IMMUTABLE STRICT;
+ 
  CREATE FUNCTION hstore(record)
  RETURNS hstore
  AS 'MODULE_PATHNAME', 'hstore_from_record'

*** a/contrib/hstore/hstore_io.c
--- b/contrib/hstore/hstore_io.c
***************
*** 8,14 ****
--- 8,17 ----
  #include "access/htup_details.h"
  #include "catalog/pg_type.h"
  #include "funcapi.h"
+ #include "lib/stringinfo.h"
  #include "libpq/pqformat.h"
+ #include "utils/builtins.h"
+ #include "utils/json.h"
  #include "utils/lsyscache.h"
  #include "utils/typcache.h"
***************
*** 1209,1211 **** hstore_send(PG_FUNCTION_ARGS)
--- 1212,1425 ----
  	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
  }
+ 
+ 
+ /*
+  * hstore_to_json_loose
+  *
+  * This is a heuristic conversion to json which treats
+  * 't' and 'f' as booleans and strings that look like numbers as numbers,
+  * as long as they don't start with a leading zero followed by another digit
+  * (think zip codes or phone numbers starting with 0).
+  */
+ PG_FUNCTION_INFO_V1(hstore_to_json_loose);
+ Datum		hstore_to_json_loose(PG_FUNCTION_ARGS);
+ Datum
+ hstore_to_json_loose(PG_FUNCTION_ARGS)
+ {
+ 	HStore	   *in = PG_GETARG_HS(0);
+ 	int			buflen,
+ 				i;
+ 	int			count = HS_COUNT(in);
+ 	char	   *out,
+ 			   *ptr;
+ 	char	   *base = STRPTR(in);
+ 	HEntry	   *entries = ARRPTR(in);
+ 	bool		is_number;
+ 	StringInfo	src, dst;
+ 
+ 	if (count == 0)
+ 	{
+ 		out = palloc(1);
+ 		*out = '\0';
+ 		PG_RETURN_TEXT_P(cstring_to_text(out));
+ 	}
+ 
+ 	buflen = 3;
+ 
+ 	/*
+ 	 * Formula adjusted slightly from the logic in hstore_out.
+ 	 * We have to take account of our treatment of booleans
+ 	 * to be a bit more pessimistic about the length of values.
+ 	 */
+ 
+ 	for (i = 0; i < count; i++)
+ 	{
+ 		/* include "" and colon-space and comma-space */
+ 		buflen += 6 + 2 * HS_KEYLEN(entries, i);
+ 		/* include "" only if nonnull */
+ 		buflen += 3 + (HS_VALISNULL(entries, i)
+ 					   ? 1
+ 					   : 2 * HS_VALLEN(entries, i));
+ 	}
+ 
+ 	out = ptr = palloc(buflen);
+ 
+ 	src = makeStringInfo();
+ 	dst = makeStringInfo();
+ 
+ 	*ptr++ = '{';
+ 
+ 	for (i = 0; i < count; i++)
+ 	{
+ 		resetStringInfo(src);
+
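The number heuristic described above, isolated into a standalone sketch (the actual patch does this inline while scanning each value, and its numeric check is stricter):

#include <ctype.h>
#include <stdbool.h>

/* Sketch: treat a value as a JSON number only if it looks numeric and
 * does not start with '0' followed by another digit (zip codes, phone
 * numbers); the 't'/'f' boolean handling is elided. */
static bool
looks_like_json_number(const char *s, int len)
{
    int         i;

    if (len == 0)
        return false;
    if (len >= 2 && s[0] == '0' && isdigit((unsigned char) s[1]))
        return false;           /* leading zero: keep it a string */

    for (i = 0; i < len; i++)
    {
        if (!isdigit((unsigned char) s[i]) && s[i] != '.' && s[i] != '-')
            return false;       /* crude; the real code is stricter */
    }
    return true;
}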
Re: [HACKERS] [PATCH] binary heap implementation
On 2012-11-21 15:09:12 -0500, Robert Haas wrote: On Wed, Nov 21, 2012 at 1:30 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: The following comments still talk about key and value, thus need an update:

Oops.

This comment needs an update (s/comparator/compare/, and also add has_heap_property and arg):

Fixed.

I would suggest adding prefixes to struct members, so bh_size, bh_space and so on. This makes it easier to grep for stuff later.

OK.

Do we really need the struct definition to be public? Couldn't it just live in binaryheap.c?

Well, that would preclude defining any macros, like binaryheap_empty(). Since efficiency is among the reasons for inventing this in the first place, that doesn't seem prudent to me.

Another new version attached.

One very minor nitpick I unfortunately just found now, not sure when that changed: binaryheap_replace_first() hardcodes the indices for the left/right node of the root node. I would rather have it use (left|right)_offset(0).

P.S. to Abhijit: Please, please tell me the secret to getting five people to review your patches. I'm lucky when I can get one!

Heh. This was rather surprising, yes. It's a rather easy patch to review in a way; you don't need much pg internals knowledge for it...

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] array exclusion constraint
On Sat, Nov 17, 2012 at 1:05 PM, Philip Taylor philiptaylo...@yahoo.com wrote:

CREATE TABLE foo (
  x CHAR(32) PRIMARY KEY,
  y CHAR(32) NOT NULL,
  EXCLUDE USING gist ((ARRAY[x, y]) WITH &&)
);

My first thought was you were going to have better luck with text rather than char(n), but a little bit of experimentation suggests to me that that doesn't work either. It seems that GIN doesn't support exclusion constraints, and there's no gist opclass for text[], varchar[], or anyarray. Bummer.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PQconninfo function for libpq
Hi,

On 2012-11-21 19:19, Magnus Hagander wrote: I'm breaking this out into its own thread, for my own sanity if nothing else :) And it's an isolated feature after all. I still agree with the previous review at http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net about keeping the data in more than one place.

OK, it seems I completely missed that comment. (Or forgot about it if I happened to answer it.)

Based on that, I've created a different version of this patch, attached. This way we keep all the data in one struct.

I like this single structure, but not the way you handle the options' classes. In your patch, each option belongs to only one class. These classes are:

PG_CONNINFO_REPLICATION = replication only
PG_CONNINFO_PASSWORD = password only
PG_CONNINFO_NORMAL = everything else

How does it help pg_basebackup to filter out e.g. dbname and replication? These are added by the walreceiver module anyway, and adding them to the primary_conninfo line should even be discouraged by the documentation.

In my view, the classes should be inclusive:

PG_CONNINFO_NORMAL = everything that's usable for a regular client connection. This means everything, maybe including password but excluding replication.
PG_CONNINFO_PASSWORD = password only.
PG_CONNINFO_REPLICATION = everything usable for a replication client that is not added by walreceiver. Maybe including/excluding password.

Maybe there should be two flags for replication usage:

PG_CONNINFO_WALRECEIVER = everything except the options added by walreceiver itself (and maybe password too)
PG_CONNINFO_REPLICATION = replication only

And every option can belong to more than one class, just as in my patch (see the sketch below).

At this point, the patch is untested beyond compiling and a trivial psql check, because I ran out of time :) But I figured I'd throw it out there for comments on which version people prefer. (And yes, it's quite possible I've made a big think-mistake in it somewhere, but again, better to get some eyes on it early.) My version also contains a fixed version of the docs that should be moved back into Zoltan's version if that's the one we end up preferring.

I also liked your version of the documentation better; I am not too good at writing docs.

Also, a question was buried in the other review: are we OK to remove the requiressl parameter? Both these patches do so, because the code becomes much simpler if we can do that. It has been deprecated since 7.2. Is it OK to remove it, or do we need to put back in the more complex code to deal with both? Attached are both Zoltan's latest patch (v16) and my slightly different approach. Comments on which approach is best? Test results from somebody who has the time to look at it? :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/

Thanks for your work on this.

Best regards, Zoltán Böszörményi

-- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/
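A sketch of how such inclusive classes could look as flag bits on the options struct; the values and the example rows are assumptions for illustration, not taken from either patch:

/* Sketch: each option carries a bitmask, so one option can belong to
 * several classes and filtering is a simple mask test. */
#define PG_CONNINFO_NORMAL		0x01	/* usable for a regular client */
#define PG_CONNINFO_PASSWORD	0x02	/* the password option itself */
#define PG_CONNINFO_REPLICATION	0x04	/* usable for replication clients */

/* e.g. in the big options array:
 *   {"host", ..., PG_CONNINFO_NORMAL | PG_CONNINFO_REPLICATION},
 *   {"password", ..., PG_CONNINFO_PASSWORD},
 *   {"replication", ..., PG_CONNINFO_REPLICATION},
 * and filtering in PQconninfo() reduces to:
 *   if ((option->flags & requested_flags) != 0)  ... include it ...
 */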
Re: [HACKERS] PQconninfo function for libpq
On Wed, Nov 21, 2012 at 10:01 PM, Boszormenyi Zoltan z...@cybertec.at wrote: Hi, 2012-11-21 19:19 keltezéssel, Magnus Hagander írta: I'm breaking this out into its own thread, for my own sanity if nothing else :) And it's an isolated feature after all. I still agree with the previous review at http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net about keeping the data in more than one place. OK, it seems I completely missed that comment. (Or forgot about it if I happened to answer it.) Based on that, I've created a different version of this patch, attached. This way we keep all the data in one struct. I like this single structure but not the way you handle the options' classes. In your patch, each option belongs to only one class. These classes are:

PG_CONNINFO_REPLICATION = replication only
PG_CONNINFO_PASSWORD = password only
PG_CONNINFO_NORMAL = everything else

How does it help pg_basebackup to filter out e.g. dbname and replication? PG_CONNINFO_ALL should give pg_basebackup all it needs, no? Or actually, it shouldn't have the replication=1 part, right? So it should just use PG_CONNINFO_NORMAL|PG_CONNINFO_PASSWORD? These are added by the walreceiver module anyway and adding them to the primary_conninfo line should even be discouraged by the documentation. Hmm. I wasn't actually thinking about the dbname part here, I admit that. In my view, the classes should be inclusive:

PG_CONNINFO_NORMAL = Everything that's usable for a regular client connection. This means everything, maybe including password but excluding replication.
PG_CONNINFO_PASSWORD = password only.
PG_CONNINFO_REPLICATION = Everything usable for a replication client not added by walreceiver. Maybe including/excluding password.

Maybe there should be two flags for replication usage:

PG_CONNINFO_WALRECEIVER = everything except those not added by walreceiver (and maybe password too)
PG_CONNINFO_REPLICATION = replication only

And every option can belong to more than one class, just as in my patch. Hmm. I kind of liked having each option in just one class, but I see the problem. Looking at the ones you suggest, all the non-password ones *have* to be without password, otherwise there is no way to get the conninfo without password - which is the whole reason for that parameter to exist. So the PASSWORD one has to be additional - which means that not making the other ones additional makes them inconsistent. But maybe we don't really have a choice there. At this point, the patch is untested beyond compiling and a trivial psql check, because I ran out of time :) But I figured I'd throw it out there for comments on which version people prefer. (And yes, it's quite possible I've made a big think-mistake in it somewhere, but again, better to get some eyes on it early) My version also contains a fixed version of the docs that should be moved back into Zoltan's version if that's the one we end up preferring. I also liked your version of the documentation better, I am not too good at writing docs. np. Also, a question was buried in the other review which is - are we OK to remove the requiressl parameter. Both these patches do so, because the code becomes much simpler if we can do that. It has been deprecated since 7.2. Is it OK to remove it, or do we need to put back in the more complex code to deal with both? Just going to highlight that we're looking for at least one third party to comment on this :) Attached are both Zoltan's latest patch (v16) and my slightly different approach. 
Comments on which approach is best? Test results from somebody who has the time to look at it? :) Do you happen to have a set of tests you've been running on your patches? Can you try them against this one? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
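In code terms, the inclusive-class scheme being debated could look roughly like the sketch below. The flag values, struct, and helper are illustrative assumptions, not the actual patch:

/*
 * Hypothetical inclusive conninfo classes.  Only the class names quoted
 * in the thread are real; all values and types here are invented.
 */
#define PG_CONNINFO_NORMAL      0x01    /* usable for a regular client */
#define PG_CONNINFO_PASSWORD    0x02    /* password only, always additive */
#define PG_CONNINFO_REPLICATION 0x04    /* usable for a replication client */

typedef struct ConnInfoOption
{
    const char *keyword;
    int         classes;        /* bitwise OR: may belong to several classes */
} ConnInfoOption;

static const ConnInfoOption example_options[] = {
    {"host",        PG_CONNINFO_NORMAL | PG_CONNINFO_REPLICATION},
    {"dbname",      PG_CONNINFO_NORMAL},    /* walreceiver adds its own */
    {"password",    PG_CONNINFO_PASSWORD},
    {"replication", PG_CONNINFO_REPLICATION},
    {NULL, 0}
};

/* An option is returned when it matches any requested class. */
static int
option_matches(const ConnInfoOption *opt, int classes)
{
    return (opt->classes & classes) != 0;
}

Under such a scheme, pg_basebackup would ask for PG_CONNINFO_NORMAL | PG_CONNINFO_PASSWORD, and replication-only options would fall out naturally.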
Re: [HACKERS] WIP patch for hint bit i/o mitigation
On 11/16/12 9:03 AM, Merlin Moncure wrote: Atri ran some quick n dirty tests to see if there were any regressions. He benched a large scan followed by vacuum. So far, results are inconclusive so better testing methodologies will definitely be greatly appreciated. One of the challenges with working in this part of the code is that it's quite difficult to make changes without impacting at least some workloads. I'm adding something to pgbench-tools to test for some types of vacuum regressions that I've seen before. And the checksum benchmarking I've already signed up for overlaps with this one quite a bit. I'd suggest reviewers here focus on code quality, correctness, and targeted optimization testing. I'm working heavily on a larger benchmarking framework that both this and checksums will fit into right now. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] StrategyGetBuffer questions
On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure mmonc...@gmail.com wrote: In this sprawling thread on scaling issues [1], the topic meandered into StrategyGetBuffer() -- in particular the clock sweep loop. I'm wondering: *) If there shouldn't be a bound in terms of how many candidate buffers you're allowed to skip for having a non-zero usage count. Whenever an unpinned usage_count > 0 buffer is found, trycounter is reset (!) so that the code operates from the point of view as if it had just entered the loop. There is an implicit assumption that this is rare, but how rare is it? How often is it that the trycounter would hit zero if it were not reset? I've instrumented something like that in the past, and could only get it to fire under pathologically small shared_buffers and workloads that caused most of them to be pinned simultaneously. well, it's basically impossible -- and that's what I find odd. *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds itself examining too many unpinned buffers per sweep? It is a self-correcting problem. If it is examining a lot of unpinned buffers, it is also decrementing a lot of them. sure. but it's entirely plausible that some backends are marking up usage_count rapidly and not allocating buffers while others are doing a lot of allocations. point being: all it takes is one backend to get scheduled out while holding the freelist lock to effectively freeze the database for many operations. it's been documented [1] that particular buffers can become spinlock contention hot spots due to reference counting of the pins. if a lot of allocation is happening concurrently it's only a matter of time before the clock sweep rolls around to one of them, hits the spinlock, and (in the worst case) schedules out. this could in turn shut down the clock sweep for some time and non-allocating backends might then beat on established buffers and pump up usage counts. The reference counting problem might be alleviated in some fashion for example via Robert's idea to disable reference counting under contention [2]. Even if you do that, you're still in for a world of hurt if you get scheduled out of a buffer allocation. Your patch fixes that AFAICT. The buffer pin check is outside the wider lock, making my suggestion to be less rigorous about usage_count a lot less useful (but perhaps not completely useless!). Another innovation might be to implement a 'trylock' variant of LockBufHdr that does a TAS but doesn't spin -- if someone else has the header locked, why bother waiting for it? just skip to the next and move on. merlin [1] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01557.php [2] http://archives.postgresql.org/pgsql-hackers/2012-05/msg01571.php -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
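For concreteness, the trylock variant Merlin describes might look like the sketch below. TryLockBufHdr is not an existing backend function, and calling TAS() directly on the buffer header spinlock is illustrative only:

/*
 * Hypothetical "trylock" variant of LockBufHdr: one test-and-set
 * attempt, no spin loop.  TAS() returns zero when the lock is acquired.
 */
static inline bool
TryLockBufHdr(volatile BufferDesc *buf)
{
    return TAS((slock_t *) &buf->buf_hdr_lock) == 0;
}

The clock sweep would then simply skip contended headers:

    if (!TryLockBufHdr(buf))
        continue;   /* someone else holds the header; try the next buffer */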
Re: [HACKERS] MySQL search query is not executing in Postgres DB
On 11/21/12 9:42 AM, Robert Haas wrote: On Mon, Nov 19, 2012 at 6:19 PM, Peter Eisentraut pete...@gmx.net wrote: On Tue, 2012-11-06 at 10:57 -0500, Robert Haas wrote: But, with the attached patch:

rhaas=# create function xyz(smallint) returns smallint as $$select $1$$ language sql;
CREATE FUNCTION
rhaas=# select xyz(5);
 xyz
-----
   5
(1 row)

rhaas=# create table abc (a int);
CREATE TABLE
rhaas=# select lpad(a, 5, '0') from abc;
 lpad
------
(0 rows)

I continue to be of the opinion that allowing this second case to work is not desirable. 1. Why? Because a strongly-typed system should not cast numbers to strings implicitly. Does the equivalent of the lpad case work in any other strongly-typed programming language? 2. What's your counter-proposal? Leave things as they are. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Do we need so many hint bits?
On Mon, 2012-11-19 at 14:46 -0500, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: As I said elsewhere in the thread, I'm not planning to introduce any additional locking. There is already precedent in IndexOnlyNext. Of course, that just begs the question of whether the code in IndexOnlyNext is correct. And after looking at it, it seems pretty broken, or at least the argument for its correctness is broken. That comment seems to consider only the case where the target tuple has just been inserted --- but what of the case where the target tuple is in process of being deleted? The deleter will not make a new index entry for that. Of course, there's a pretty long code path from marking a tuple deleted to committing the deletion, and it seems safe to assume that the deleter will hit a write barrier in that path. But I don't believe the argument here that the reader must hit a read barrier in between. After digging in a little, this is bothering me now, too. I think it's correct, and I agree with your analysis, but it seems a little complex to explain why it works. It also seems like a fortunate coincidence that we can detect clearing the bit due to an insert, which we need to know about immediately; but not detect a delete, which we don't care about until it commits. I will try to update the comment along with my submission. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort memory usage: grow_memtuples
On Thu, Nov 15, 2012 at 11:16 AM, Robert Haas robertmh...@gmail.com wrote: So what's next here? Do you want to work on these issue some more? Or does Jeff? This has been rewritten enough that I no longer feel much ownership of it. I'd prefer to leave it to Peter or Greg S., if they are willing to do it. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On Thu, Nov 15, 2012 at 9:13 AM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-15 09:06:23 -0800, Jeff Janes wrote: On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com wrote:

 src/bin/Makefile            |   2 +-
 src/bin/xlogdump/Makefile   |  25 +++
 src/bin/xlogdump/xlogdump.c | 468
 3 files changed, 494 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/xlogdump/Makefile
 create mode 100644 src/bin/xlogdump/xlogdump.c

Is this intended to be the successor of https://github.com/snaga/xlogdump which will then be deprecated? As-is this is just a development tool which was sorely needed for the development of this patchset. But yes, I think that once ready (xlogreader infrastructure, *_desc routines split) it should definitely be able to do most of what the above xlogdump can do and it should live in bin/. I think mostly some filtering is missing. That doesn't really deprecate the above though. Does that answer your question? Yes, I think so. Thanks. (I've just recently gotten the original xlogdump to work for me in 9.2, and I had been wondering if back-porting yours to 9.2 would have been an easier way to go.) Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 03/14] Add simple xlogdump tool
On 2012-11-21 14:57:14 -0800, Jeff Janes wrote: On Thu, Nov 15, 2012 at 9:13 AM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-15 09:06:23 -0800, Jeff Janes wrote: On Wed, Nov 14, 2012 at 5:17 PM, Andres Freund and...@2ndquadrant.com wrote:

 src/bin/Makefile            |   2 +-
 src/bin/xlogdump/Makefile   |  25 +++
 src/bin/xlogdump/xlogdump.c | 468
 3 files changed, 494 insertions(+), 1 deletion(-)
 create mode 100644 src/bin/xlogdump/Makefile
 create mode 100644 src/bin/xlogdump/xlogdump.c

Is this intended to be the successor of https://github.com/snaga/xlogdump which will then be deprecated? As-is this is just a development tool which was sorely needed for the development of this patchset. But yes, I think that once ready (xlogreader infrastructure, *_desc routines split) it should definitely be able to do most of what the above xlogdump can do and it should live in bin/. I think mostly some filtering is missing. That doesn't really deprecate the above though. Does that answer your question? Yes, I think so. Thanks. (I've just recently gotten the original xlogdump to work for me in 9.2, and I had been wondering if back-porting yours to 9.2 would have been an easier way to go.) I don't think you would have much fun doing so - the WAL format changes between 9.2 and 9.3 make this larger than one might think. I had a version that worked with the previous format but there have been some interface changes since then... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] auto_explain WAS: RFC: Timing Events
On 11/8/12 2:16 PM, Josh Berkus wrote: Also, logging only the long-running queries is less useful than people on this list seem to think. When I'm doing real performance analysis, I need to see *everything* which was run, not just the slow stuff. Often the real problem is a query which used to take 1.1ms, now takes 1.8ms, and gets run 400 times/second. Looking just at the slow queries won't tell you that. No argument here. I've tried to be clear that some of these high-speed but lossy things I'm hacking on are not going to be useful for everyone. A solution that found most of these 1.8ms queries, but dropped some percentage under the highest peak load conditions, would still be very useful to me. An anecdote on this topic seems relevant. I have a troublesome production server that has moved log_min_duration_statement from 100ms to 500ms to 2000ms as the system grew. Even the original setting wasn't short enough to catch everything we would like to watch *now*, but seeing sub-second data is a dream at this point. The increases have been forced by logging contention becoming unmanageable when every client has to fight for the log to write to disk. I can see the buggers stack up as waiting for locks if I try to log shorter statements, stalling enough that it drags the whole server down under peak load. If I could turn off logging just during those periods--basically, throwing them away only when some output rate throttled component hit its limit--I could still find them in the data collected the rest of the time. There are some types of problems that also only occur under peak load that this idea would miss, but you'd still be likely to get *some* of them, statistically. There's a really hard trade-off here:

-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical

The sort of ideas you threw out for making things like auto-explain logging per-session can work. The main limitation there though is that it presumes you even know what area the problem is in the first place. I am not optimistic that covers a whole lot of ground either. Upthread I talked about a background process that persists shared memory queues as a future consumer of timing events--one that might consume slow query data too. That is effectively acting as the ideal component I described above, one that only loses things when it exceeds the system's write capacity for saving them. I wouldn't want to try and retrofit the existing PostgreSQL logging facility for such a thing though. Log parsing as the way to collect data is filled with headaches anyway. I don't see any other good way to resolve this trade-off. To help with the real-world situation you describe, ideally you dump all the query data somewhere, fast, and have the client move on. You can't make queries wait for storage, something else (with a buffer!) needs to take over that job. I can't find the URL right now, but at PG.EU someone was showing me a module that grabbed the new 9.2 logging hook and shipped the result to another server. Clients burn a little CPU and network time and they move on, and magically disk I/O goes out of their concerns. How much overhead persisting the data takes isn't the database server's job at all then. That's the sort of flexibility people need to have with logging eventually. Something so efficient that every client can afford to do it; it is capable of saving all events under ideal conditions; but under adverse ones, you have to keep going and accept the loss. 
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
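The "accept the loss" behavior Greg describes can be made concrete with a bounded queue that drops on overflow. The following is a minimal single-producer sketch with invented names; real shared-memory use would need locking or memory barriers that are omitted here:

#include <stdbool.h>
#include <string.h>

#define RING_SLOTS 4096                 /* illustrative capacity */
#define SLOT_SIZE  256

typedef struct EventRing
{
    volatile unsigned head;             /* next slot to write */
    volatile unsigned tail;             /* next slot the consumer reads */
    char        slots[RING_SLOTS][SLOT_SIZE];
} EventRing;

/*
 * Enqueue an event, or drop it when the consumer has fallen behind.
 * The client never waits on disk I/O; under overload, events are lost.
 */
static bool
ring_put(EventRing *ring, const char *event)
{
    unsigned    next = (ring->head + 1) % RING_SLOTS;

    if (next == ring->tail)
        return false;                   /* full: accept the loss */

    strncpy(ring->slots[ring->head], event, SLOT_SIZE - 1);
    ring->slots[ring->head][SLOT_SIZE - 1] = '\0';
    ring->head = next;
    return true;
}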
Re: [HACKERS] logical changeset generation v3
On 2012-11-21 18:35:34 +0900, Michael Paquier wrote: On Wed, Nov 21, 2012 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote: On 2012-11-21 14:57:08 +0900, Michael Paquier wrote: Ah, I see. Could you try the following diff?

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index df24b33..797a126 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder, Snapstate *snapstate,
 	 */
 	snapstate->transactions_after = buf->origptr;
+	snapstate->nrrunning = running->xcnt;
 	snapstate->xmin_running = InvalidTransactionId;
 	snapstate->xmax_running = InvalidTransactionId;

I am still getting the same assertion failure even with this diff included. I really don't understand what's going on here then. You've said you made sure that there is a catalog snapshot. Which means you would need something like:

WARNING: connecting to postgres
WARNING: Initiating logical rep
LOG: computed new xmin: 16566894
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1

in the log *and* it means that snapbuild->state has to be CONSISTENT. But the backtrace you've posted:

#3 0x0070c409 in SnapBuildTxnIsRunning (snapstate=0x19e4f10, xid=0) at snapbuild.c:877
#4 0x0070b8e4 in SnapBuildProcessChange (reorder=0x19e4e80, snapstate=0x19e4f10, xid=0, buf=0x1a0a368, relfilenode=0x1a0a450) at snapbuild.c:388
#5 0x0070c088 in SnapBuildDecodeCallback (reorder=0x19e4e80, snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732

shows pretty clearly that snapstate *can't* be consistent because line 387ff is:

else if (snapstate->state < SNAPBUILD_CONSISTENT && SnapBuildTxnIsRunning(snapstate, xid)) ;

so #3 and #4 can't happen at those line numbers with state == CONSISTENT. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] auto_explain WAS: RFC: Timing Events
On 22/11/12 12:15, Greg Smith wrote: On 11/8/12 2:16 PM, Josh Berkus wrote: Also, logging only the long-running queries is less useful than people on this list seem to think. When I'm doing real performance analysis, I need to see *everything* which was run, not just the slow stuff. Often the real problem is a query which used to take 1.1ms, now takes 1.8ms, and gets run 400 times/second. Looking just at the slow queries won't tell you that. No argument here. I've tried to be clear that some of these high-speed but lossy things I'm hacking on are not going to be useful for everyone. A solution that found most of these 1.8ms queries, but dropped some percentage under the highest peak load conditions, would still be very useful to me. An anecdote on this topic seems relevant. I have a troublesome production server that has moved log_min_duration_statement from 100ms to 500ms to 2000ms as the system grew. Even the original setting wasn't short enough to catch everything we would like to watch *now*, but seeing sub-second data is a dream at this point. The increases have been forced by logging contention becoming unmanageable when every client has to fight for the log to write to disk. I can see the buggers stack up as waiting for locks if I try to log shorter statements, stalling enough that it drags the whole server down under peak load. If I could turn off logging just during those periods--basically, throwing them away only when some output rate throttled component hit its limit--I could still find them in the data collected the rest of the time. There are some types of problems that also only occur under peak load that this idea would miss, but you'd still be likely to get *some* of them, statistically. There's a really hard trade-off here:

-Sometimes you must save data on every query to capture fine details
-Making every query wait for disk I/O is impractical

The sort of ideas you threw out for making things like auto-explain logging per-session can work. The main limitation there though is that it presumes you even know what area the problem is in the first place. I am not optimistic that covers a whole lot of ground either. Upthread I talked about a background process that persists shared memory queues as a future consumer of timing events--one that might consume slow query data too. That is effectively acting as the ideal component I described above, one that only loses things when it exceeds the system's write capacity for saving them. I wouldn't want to try and retrofit the existing PostgreSQL logging facility for such a thing though. Log parsing as the way to collect data is filled with headaches anyway. I don't see any other good way to resolve this trade-off. To help with the real-world situation you describe, ideally you dump all the query data somewhere, fast, and have the client move on. You can't make queries wait for storage, something else (with a buffer!) needs to take over that job. I can't find the URL right now, but at PG.EU someone was showing me a module that grabbed the new 9.2 logging hook and shipped the result to another server. Clients burn a little CPU and network time and they move on, and magically disk I/O goes out of their concerns. How much overhead persisting the data takes isn't the database server's job at all then. That's the sort of flexibility people need to have with logging eventually. 
Something so efficient that every client can afford to do it; it is capable of saving all events under ideal conditions; but under adverse ones, you have to keep going and accept the loss. Would it be useful to be able to specify a fixed sampling rate, say 1 in 100? That way you could get a well-defined statistical sample, yet not cause excessive I/O? Cheers, Gavin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
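Gavin's fixed-rate idea needs only a cheap, runtime-independent decision per statement. A trivial sketch (the function name is invented):

#include <stdbool.h>
#include <stdlib.h>

/*
 * Log roughly one statement in every sample_rate executions,
 * e.g. sample_rate = 100 for a 1-in-100 sample.
 */
static bool
should_log_statement(int sample_rate)
{
    return (random() % sample_rate) == 0;
}

Because the decision does not depend on the statement's runtime, the retained subset stays a well-defined random sample of the workload.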
Re: [HACKERS] logical changeset generation v3
On Thu, Nov 22, 2012 at 8:25 AM, Andres Freund and...@2ndquadrant.com wrote: I really don't understand what's going on here then. You've said you made sure that there is a catalog snapshot. Which means you would need something like:

WARNING: connecting to postgres
WARNING: Initiating logical rep
LOG: computed new xmin: 16566894
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1

in the log *and* it means that snapbuild->state has to be CONSISTENT. But the backtrace you've posted:

#3 0x0070c409 in SnapBuildTxnIsRunning (snapstate=0x19e4f10, xid=0) at snapbuild.c:877
#4 0x0070b8e4 in SnapBuildProcessChange (reorder=0x19e4e80, snapstate=0x19e4f10, xid=0, buf=0x1a0a368, relfilenode=0x1a0a450) at snapbuild.c:388
#5 0x0070c088 in SnapBuildDecodeCallback (reorder=0x19e4e80, snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732

shows pretty clearly that snapstate *can't* be consistent because line 387ff is:

else if (snapstate->state < SNAPBUILD_CONSISTENT && SnapBuildTxnIsRunning(snapstate, xid)) ;

so #3 and #4 can't happen at those line numbers with state == CONSISTENT. Still this *impossible* thing happens. Here is some more information on the logs I get on the server side. Yes, I have the logical replication correctly initialized:

[629 0] LOG: database system was shut down at 2012-11-22 09:02:42 JST
[628 0] LOG: database system is ready to accept connections
[633 0] LOG: autovacuum launcher started
[648 0] WARNING: connecting to postgres
[648 0] WARNING: Initiating logical rep
[648 0] LOG: computed new xmin: 684
[648 0] LOG: start reading from 0/178C1B8, scrolled back to 0/178C000

And I am also getting logs of this type with pg_receivellog:

BEGIN 698
table pgbench_accounts: UPDATE: aid[int4]:759559 bid[int4]:8 abalance[int4]:-3641 filler[bpchar]:
table pgbench_tellers: UPDATE: tid[int4]:93 bid[int4]:10 tbalance[int4]:-3641 filler[bpchar]:(null)
table pgbench_branches: UPDATE: bid[int4]:10 bbalance[int4]:-3641 filler[bpchar]:(null)
table pgbench_history: INSERT: tid[int4]:93 bid[int4]:10 aid[int4]:759559 delta[int4]:-3641 mtime[timestamp]:2012-11-22 09:05:34.535651 filler[bpchar]:(null)
COMMIT 698

Until the assertion failure:

TRAP: FailedAssertion(!(((xid) >= ((TransactionId) 3)) && ((snapstate->xmin_running) >= ((TransactionId) 3))), File: "snapbuild.c", Line: 878)

I still have the core file and its binary at hand if you want, so can send them at will. I have not been able to read your code yet, but there must be something you are missing. Thanks, -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] logical changeset generation v3
On 2012-11-22 09:13:30 +0900, Michael Paquier wrote: On Thu, Nov 22, 2012 at 8:25 AM, Andres Freund and...@2ndquadrant.com wrote: I really don't understand what's going on here then. You've said you made sure that there is a catalog snapshot. Which means you would need something like:

WARNING: connecting to postgres
WARNING: Initiating logical rep
LOG: computed new xmin: 16566894
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1
WARNING: reached consistent point, stopping!
WARNING: Starting logical replication
LOG: start reading from 3/E62457C0, scrolled back to 3/E6244000
LOG: found initial snapshot (via running xacts). Done: 1

in the log *and* it means that snapbuild->state has to be CONSISTENT. But the backtrace you've posted:

#3 0x0070c409 in SnapBuildTxnIsRunning (snapstate=0x19e4f10, xid=0) at snapbuild.c:877
#4 0x0070b8e4 in SnapBuildProcessChange (reorder=0x19e4e80, snapstate=0x19e4f10, xid=0, buf=0x1a0a368, relfilenode=0x1a0a450) at snapbuild.c:388
#5 0x0070c088 in SnapBuildDecodeCallback (reorder=0x19e4e80, snapstate=0x19e4f10, buf=0x1a0a368) at snapbuild.c:732

shows pretty clearly that snapstate *can't* be consistent because line 387ff is:

else if (snapstate->state < SNAPBUILD_CONSISTENT && SnapBuildTxnIsRunning(snapstate, xid)) ;

so #3 and #4 can't happen at those line numbers with state == CONSISTENT. Still this *impossible* thing happens. Here is some more information on the logs I get on the server side. Yes, I have the logical replication correctly initialized:

[629 0] LOG: database system was shut down at 2012-11-22 09:02:42 JST
[628 0] LOG: database system is ready to accept connections
[633 0] LOG: autovacuum launcher started
[648 0] WARNING: connecting to postgres
[648 0] WARNING: Initiating logical rep
[648 0] LOG: computed new xmin: 684
[648 0] LOG: start reading from 0/178C1B8, scrolled back to 0/178C000

Ok, so you've not yet reached a consistent point. Which means this shouldn't yet be written out: And I am also getting logs of this type with pg_receivellog:

BEGIN 698
table pgbench_accounts: UPDATE: aid[int4]:759559 bid[int4]:8 abalance[int4]:-3641 filler[bpchar]:
table pgbench_tellers: UPDATE: tid[int4]:93 bid[int4]:10 tbalance[int4]:-3641 filler[bpchar]:(null)
table pgbench_branches: UPDATE: bid[int4]:10 bbalance[int4]:-3641 filler[bpchar]:(null)
table pgbench_history: INSERT: tid[int4]:93 bid[int4]:10 aid[int4]:759559 delta[int4]:-3641 mtime[timestamp]:2012-11-22 09:05:34.535651 filler[bpchar]:(null)
COMMIT 698

that could already be good enough of a hint, let me check tomorrow. I still have the core file and its binary at hand if you want, so can send them at will. If those aren't too big, it's worth a try... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Removing PD_ALL_VISIBLE
Follow up to discussion: http://archives.postgresql.org/pgsql-hackers/2012-11/msg00817.php I worked out a patch that replaces PD_ALL_VISIBLE with calls to visibilitymap_test. It rips out a lot of complexity, with a net drop of about 300 lines (not a lot, but some of that code is pretty complex). The patch is quite rough, and I haven't yet added the code to keep the VM buffer pins around longer, but I still don't see any major problems. I'm fairly certain that scans will not be adversely affected, and I think that inserts/updates/deletes should be fine as well. The main worry I have with inserts/updates/deletes is that there will be less spatial locality for allocating new buffers or for modifying existing buffers. So keeping a pinned VM buffer might not help as much, because it might need to pin a different one, anyway. Adding to January commitfest, as it's past 11/15. Regards, Jeff Davis rm-pd-all-visible-20121121.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
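The substitution at the heart of the patch can be pictured as below. visibilitymap_test() is the existing function; the surrounding loop is a hypothetical caller, and the single long-lived vmbuffer pin is the optimization the patch description mentions:

/*
 * Sketch only: where a scan used to test the page-level hint,
 *     if (PageIsAllVisible(page)) ...
 * it instead asks the visibility map, reusing one pinned VM buffer
 * across the whole scan.  rel and nblocks are assumed to be set up.
 */
Buffer      vmbuffer = InvalidBuffer;
BlockNumber blkno;

for (blkno = 0; blkno < nblocks; blkno++)
{
    if (visibilitymap_test(rel, blkno, &vmbuffer))
    {
        /* every tuple on this block is visible to all backends:
         * per-tuple visibility checks can be skipped */
    }
}

if (BufferIsValid(vmbuffer))
    ReleaseBuffer(vmbuffer);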
Re: [HACKERS] [PATCH] binary heap implementation
At 2012-11-21 15:09:12 -0500, robertmh...@gmail.com wrote: The following comments still talk about "key" and "value", thus need an update: Oops. In the same vein, "Returns NULL if the heap is empty" no longer applies to binaryheap_first and binaryheap_remove_first. Plus there is an extra asterisk in the middle of the comment about binaryheap_replace_first. But it looks good otherwise. I agree with Tom that we should support resizing the heap, but let's not bother with that now. I'll write the few extra lines of code the first time something actually needs to add more elements to a heap. -- Abhijit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
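For reference, those "few extra lines" could plausibly take the shape below. The struct layout and field names are guesses at the API under review, not the actual code:

/*
 * Hypothetical growth path: if nodes are stored inline in the heap
 * struct, growing means reallocating the whole struct, so callers
 * must adopt the returned pointer.
 */
static binaryheap *
binaryheap_grow(binaryheap *heap)
{
    int         newspace = heap->bh_space * 2;

    heap = (binaryheap *) repalloc(heap,
                                   offsetof(binaryheap, bh_nodes) +
                                   sizeof(Datum) * newspace);
    heap->bh_space = newspace;
    return heap;
}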
Re: [HACKERS] StrategyGetBuffer questions
On Thursday, November 22, 2012 3:26 AM Merlin Moncure wrote: On Tue, Nov 20, 2012 at 4:50 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Tue, Nov 20, 2012 at 1:26 PM, Merlin Moncure mmonc...@gmail.com wrote: In this sprawling thread on scaling issues [1], the topic meandered into StrategyGetBuffer() -- in particular the clock sweep loop. I'm wondering: *) If there shouldn't be a bound in terms of how many candidate buffers you're allowed to skip for having a non-zero usage count. Whenever an unpinned usage_count > 0 buffer is found, trycounter is reset (!) so that the code operates from the point of view as if it had just entered the loop. There is an implicit assumption that this is rare, but how rare is it? How often is it that the trycounter would hit zero if it were not reset? I've instrumented something like that in the past, and could only get it to fire under pathologically small shared_buffers and workloads that caused most of them to be pinned simultaneously. well, it's basically impossible -- and that's what I find odd. *) Shouldn't StrategyGetBuffer() bias down usage_count if it finds itself examining too many unpinned buffers per sweep? It is a self-correcting problem. If it is examining a lot of unpinned buffers, it is also decrementing a lot of them. sure. but it's entirely plausible that some backends are marking up usage_count rapidly and not allocating buffers while others are doing a lot of allocations. point being: all it takes is one backend to get scheduled out while holding the freelist lock to effectively freeze the database for many operations. True, even this is observed by me; the case I have tried for this was when 50% of buffers are always used for some hot table (a table in high demand) and the rest 50% of buffers are used for normal ops. In such cases contention can be observed for BufFreelistLock. The same can be seen in the Report I have attached in the below mail. http://archives.postgresql.org/pgsql-hackers/2012-11/msg01147.php it's been documented [1] that particular buffers can become spinlock contention hot spots due to reference counting of the pins. if a lot of allocation is happening concurrently it's only a matter of time before the clock sweep rolls around to one of them, hits the spinlock, and (in the worst case) schedules out. this could in turn shut down the clock sweep for some time and non-allocating backends might then beat on established buffers and pump up usage counts. The reference counting problem might be alleviated in some fashion for example via Robert's idea to disable reference counting under contention [2]. Even if you do that, you're still in for a world of hurt if you get scheduled out of a buffer allocation. Your patch fixes that AFAICT. The buffer pin check is outside the wider lock, making my suggestion to be less rigorous about usage_count a lot less useful (but perhaps not completely useless!). Another innovation might be to implement a 'trylock' variant of LockBufHdr that does a TAS but doesn't spin -- if someone else has the header locked, why bother waiting for it? just skip to the next and move on. I think this is a reasonable idea. How about having Hot and Cold ends in the buffer list, so that if some of the buffers are heavily used, then other backends might not need to pay a penalty for traversing Hot buffers. With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] User control over psql error stream
On 11/21/2012 01:41:56 PM, Peter Eisentraut wrote: On 11/15/12 3:53 PM, Karl O. Pinc wrote: This patch gives the end user control over psql's error stream. This allows a single psql session to use \o to send both errors and table output to multiple files. Useful when capturing test output, etc. What does this do that cannot in practice be achieved using shell redirection operators? I've 3 answers to this. The short one, the long one, and the practical one. Sorry to go on about it.

The brief generic answer is: Easily capture error output associated with disparate input streams, interspersed with the data output associated with the disparate input streams, in separate files correlated with the input, in the case where transactions span multiple input streams. That latter bit about transactions being the important part. Of course I could be wrong/not clever enough.

The long generic answer is: Shell redirection works fine if the transactions do not span input streams; a separate connection/psql instance can be used to redirect stderr intermixed with stdout as desired. However when transactions span input streams then a single psql instance/db connection is required to maintain transaction state. In such a case it is difficult to re-redirect stderr so that its output appears intermixed with table data output (stdout) in time-wise order while at the same time redirecting all output into files that correlate with the various input streams. It's doable with something like: psql 2>&1 <myfifo | mydemultiplexer where myfifo is a fifo fed the various input streams and mydemultiplexer is some process that splits stdin into various output files as various bits of input are sent to psql. But there must be a way to communicate with mydemultiplexer, and you need to worry about synchronization of input with output. And it's not clear to me how to write mydemultiplexer in shell either. Note that I've not actually tried the example above.

The practical answer is that I want my dependency-based test framework to work: http://gombemi.ccas.gwu.edu/cgi-bin/darcsweb.cgi?r=gombemi;a=tree;f=/db/test For an intro see: http://gombemi.ccas.gwu.edu/cgi-bin/darcsweb.cgi?r=gombemi;a=headblob;f=/db/test/test.mk I'm using this patch in my test framework. If I had to I think I could get away without it, but it would involve a bit of work, and more than a bit of additional complication. Although I've got (much) further work to do if I'm to follow through on my notion to release this as a generic test framework, what's documented in test.mk works. Probably. (Not bad for under 300 lines of actual code, comments excluded. Perhaps under 200 if you eliminate blank lines and configuration for my real-world use.) Comments/suggestions are welcome. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes
On 11/21/2012 02:12:26 PM, Robert Haas wrote: On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote: Could ALTER TABLE use an option to drop the primary key constraint? I needed to do that, found it was not obvious, and this led me to try to improve things. That could be useful, I think. But it might open a can of worms. Would the new option be syntactic sugar around ALTER TABLE ... DROP CONSTRAINT put_the_name_of_the_primary_key_here? This sounds nice to me, but there are worms left over because the unique index created when PRIMARY KEY is specified would then remain. This is the right behavior IMHO, and if everything is spelled out in the documentation no problems should arise. But the user deserves to know how to get rid of the unique index too, so the index's name would need to be documented. Since this is something of an internal matter (?) there might be another worm here. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP json generation enhancements
On 11/21/2012 03:16 PM, Andrew Dunstan wrote: Here is a WIP patch for enhancements to json generation. First, there is the much-requested json_agg, which will aggregate rows directly to json. So the following will now work:

select json_agg(my_table) from mytable;
select json_agg(q) from (myquery here) q;

One open question regarding this feature is whether this should return NULL or '[]' for 0 rows. Currently it returns NULL but I could be convinced to return '[]', and the change would be very small. Next is to_json(), which will turn any value into json, so we're no longer restricted to rows and arrays. Non-builtin types are now searched for a cast to json, and if it exists it is used instead of the type's text representation. I didn't add a special type to look for a cast to, as was discussed before, as it seemed a bit funky and unnecessary. It can easily be added, but I'm still not convinced it's a good idea. Note that this is only done for types that aren't builtin - we know how to turn all of those into json without needing to look for a cast. Along with this there is an hstore_to_json() function added to the hstore module, and a cast from hstore to json that uses it. This function treats every value in the hstore as a string. There is also a function with the working title of hstore_to_json_loose() that does a heuristic conversion that treats values of 't' and 'f' as booleans, and strings that look like numbers as numbers unless they start with a leading 0 followed by another digit (could be zip codes, phone numbers etc.) The difference between these is illustrated here (notice that quoted "t" becomes unquoted true and quoted "1" becomes unquoted 1):

andrew=# select json_agg(q) from foo q;
                            json_agg
-----------------------------------------------------------------
 [{"a":"a","b":1,"h":{"c": "t", "d": null, "q": "1", "x": "y"}}]
(1 row)

andrew=# select json_agg(q) from (select a, b, hstore_to_json_loose(h) as h from foo) q;
                            json_agg
-----------------------------------------------------------------
 [{"a":"a","b":1,"h":{"c": true, "d": null, "q": 1, "x": "y"}}]
(1 row)

Note: this patch will need a change in the oids used for the new functions if applied against git tip, as they have been overtaken by time. Comments welcome. Updated patch that works with git tip and has regression tests. 
cheers

andrew

*** a/contrib/hstore/expected/hstore.out
--- b/contrib/hstore/expected/hstore.out
***************
*** 1453,1455 ****
  select count(*) from testhstore where h = 'pos=>98, line=>371, node=>CBA, indexe
--- 1453,1491 ----
  1
 (1 row)
 
+ -- json
+ select hstore_to_json('"a key" =>1, b => t, c => null, d=> 12345, e => 012345, f=> 1.234, g=> 2.345e+4');
+                                          hstore_to_json
+ -------------------------------------------------------------------------------------------------
+  {"b": "t", "c": null, "d": "12345", "e": "012345", "f": "1.234", "g": "2.345e+4", "a key": "1"}
+ (1 row)
+ 
+ select cast( hstore '"a key" =>1, b => t, c => null, d=> 12345, e => 012345, f=> 1.234, g=> 2.345e+4' as json);
+                                               json
+ -------------------------------------------------------------------------------------------------
+  {"b": "t", "c": null, "d": "12345", "e": "012345", "f": "1.234", "g": "2.345e+4", "a key": "1"}
+ (1 row)
+ 
+ select hstore_to_json_loose('"a key" =>1, b => t, c => null, d=> 12345, e => 012345, f=> 1.234, g=> 2.345e+4');
+                                    hstore_to_json_loose
+ ------------------------------------------------------------------------------------------
+  {"b": true, "c": null, "d": 12345, "e": "012345", "f": 1.234, "g": 2.345e+4, "a key": 1}
+ (1 row)
+ 
+ create table test_json_agg (f1 text, f2 hstore);
+ insert into test_json_agg values ('rec1','"a key" =>1, b => t, c => null, d=> 12345, e => 012345, f=> 1.234, g=> 2.345e+4'),
+        ('rec2','"a key" =>2, b => f, c => null, d=> -12345, e => 012345.6, f=> -1.234, g=> 0.345e-4');
+ select json_agg(q) from test_json_agg q;
+                                                            json_agg
+ ----------------------------------------------------------------------------------------------------------------------------
+  [{"f1":"rec1","f2":{"b": "t", "c": null, "d": "12345", "e": "012345", "f": "1.234", "g": "2.345e+4", "a key": "1"}},      +
+   {"f1":"rec2","f2":{"b": "f", "c": null, "d": "-12345", "e": "012345.6", "f": "-1.234", "g": "0.345e-4", "a key": "2"}}]
+ (1 row)
+ 
+ select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_json_agg) q;
+                                                json_agg
+ ----------------------------------------------------------------------------------------------------------
+  [{"f1":"rec1","f2":{"b": true, "c": null, "d": 12345, "e": "012345",
Re: [HACKERS] User control over psql error stream
On 11/21/2012 01:41:56 PM, Peter Eisentraut wrote: On 11/15/12 3:53 PM, Karl O. Pinc wrote: This patch gives the end user control over psql's error stream. This allows a single psql session to use \o to send both errors and table output to multiple files. Useful when capturing test output, etc. What does this do that cannot in practice be achieved using shell redirection operators? To look at it from another angle, you need it for the same reason you need \o -- to redirect output to multiple places from within a single psql process. \o redirects stdout, but I'm interested in redirecting stderr. Which makes me think that instead of a \pset option, a command, say \e, is called for. This would do the same thing \o does, only for stderr instead of stdout. The problem comes when you want \e to send to the same place \o sends to, or vice versa. You can't just compare textual paths because there's more than one way to write a path to an object in the filesystem. You need either a special argument to \e and \o -- very ungood as it breaks backwards compatibility with \o -- or you need, perhaps, yet another \ command to send stderr to where \o is going or, conversely, send \o to where stderr is going. Perhaps \oe could send \o output to where stderr is going at the moment and \eo could send stderr output to where \o output is going at the moment? Sorry to ramble on. I'm tired. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW for PostgreSQL
On Wed, Nov 21, 2012 at 7:31 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote: At execute_query(), it stores the retrieved rows onto tuplestore of festate->tuples at once. Doesn't it make problems when the remote table has a very big number of rows? No. postgres_fdw uses single-row processing mode of libpq when retrieving query results in execute_query, so memory usage will be stable at a certain level. IIRC, the previous code used the cursor feature to fetch a set of rows to avoid over-consumption of local memory. Do we have some restriction if we fetch a certain number of rows with FETCH? It seems to me, we can fetch 1000 rows for example, and tentatively store them onto the tuplestore within one PG_TRY() block (so, no need to worry about PQclear() timing), then we can fetch remote rows again when IterateForeignScan reaches the end of the tuplestore. As you say, postgres_fdw had used a cursor to avoid possible memory exhaustion on large result sets. I switched to single-row processing mode (it could be called a protocol-level cursor), which was added in 9.2, because it accomplishes the same task with fewer SQL calls than a cursor. Regards, -- Shigeru HANADA
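For readers who haven't used it, single-row mode works roughly as below. PQsendQuery, PQsetSingleRowMode, PQgetResult, and PGRES_SINGLE_TUPLE are the real libpq pieces; the surrounding function is only a sketch with error handling trimmed:

#include <libpq-fe.h>

/*
 * Retrieve a result set one row at a time so local memory stays
 * bounded regardless of how many rows the remote table returns.
 */
static void
fetch_with_single_row_mode(PGconn *conn, const char *sql)
{
    PGresult   *res;

    if (!PQsendQuery(conn, sql))
        return;
    /* must be requested immediately after PQsendQuery succeeds */
    if (!PQsetSingleRowMode(conn))
        return;

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            /* exactly one row is available here: copy it out, e.g.
             * into a tuplestore, before clearing the result */
        }
        /* a final PGRES_TUPLES_OK with zero rows ends the set */
        PQclear(res);
    }
}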
Re: [HACKERS] Doc patch: Document names of automatically created constraints and indexes
On 11/21/2012 10:00:11 PM, Karl O. Pinc wrote: On 11/21/2012 02:12:26 PM, Robert Haas wrote: On Sat, Nov 17, 2012 at 1:22 AM, Peter Eisentraut pete...@gmx.net wrote: On Mon, 2012-11-12 at 11:42 -0600, Karl O. Pinc wrote: Could ALTER TABLE use an option to drop the primary key constraint? I needed to do that, found it was not obvious, and this led me to try to improve things. That could be useful, I think. But it might open a can of worms. Would the new option be syntactic sugar around ALTER TABLE ... DROP CONSTRAINT put_the_name_of_the_primary_key_here? This sounds nice to me, No, wait. If the constraint name_of_primary_key is internal and liable to change over time, how do you deal with dropping, now, a primary key constraint that was created before some change to the internal naming? And you wouldn't want to accidentally remove a user-created constraint that just happened to have the same name as the internal primary key constraint name, especially in the case where there's a real primary key constraint created under an old naming convention. Changes to the primary key constraint naming convention would have to require a db dump/restore on pg upgrade to the new version, or something else that changes the primary key constraint names. (And then I'm not sure what would happen if a user was, before upgrading, using a constraint name that was the new default.) And the changing of the internal constraint name would have had to have always previously caused a name change in existing pg dbs, or else dbs created long ago could today have primary key constraint names following old conventions, raising the concern above. I'm sorry to waste your time if these are obvious issues. Regards, Karl k...@meme.com Free Software: You don't pay back, you pay forward. -- Robert A. Heinlein -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management
On Mon, Nov 19, 2012 at 8:52 PM, Amit kapila amit.kap...@huawei.com wrote: On Monday, November 19, 2012 5:53 AM Jeff Janes wrote: On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote: Run the modes in reciprocating order? Sorry, I didn't understand this. What do you mean by modes in reciprocating order? Sorry for the long delay. In your scripts, it looks like you always run the unpatched first, and then the patched second. Yes, that's true. By reciprocating, I mean to run them in the reverse order, or in random order. Today for some configurations, I have run by reciprocating the order. Below are the readings: Configuration: 16GB (Database) - 7GB (Shared Buffers) Here I had run in the following order:

1. Run perf report with patch for 32 clients
2. Run perf report without patch for 32 clients
3. Run perf report with patch for 16 clients
4. Run perf report without patch for 16 clients

Each execution is 5 minutes.

16 client /16 thread | 32 client /32 thread
@mv-free-lst @9.3devl | @mv-free-lst @9.3devl
----------------------------------------------
 3669   4056 |  5356   5258
 3987   4121 |  4625   5185
 4840   4574 |  4502   6796
 6465   6932 |  4558   8233
 6966   7222 |  4955   8237
 7551   7219 |  9115   8269
 8315   7168 | 43171   8340
 9102   7136 | 57920   8349
----------------------------------------------
 6362   6054 | 16775   7333

Sorry, I haven't followed this thread at all, but the numbers (43171 and 57920) in the last two runs of @mv-free-list for 32 clients look like aberrations, no? I wonder if that's skewing the average. I also looked at the Results.htm file down thread. There seems to be a steep degradation when the shared buffers are increased from 5GB to 10GB, both with and without the patch. Is that expected? If so, isn't that worth investigating and possibly even fixing before we do anything else? Thanks, Pavan
Re: [HACKERS] PQconninfo function for libpq
2012-11-21 22:15 keltezéssel, Magnus Hagander írta: On Wed, Nov 21, 2012 at 10:01 PM, Boszormenyi Zoltan z...@cybertec.at wrote: Hi, 2012-11-21 19:19 keltezéssel, Magnus Hagander írta: I'm breaking this out into its own thread, for my own sanity if nothing else :) And it's an isolated feature after all. I still agree with the previous review at http://archives.postgresql.org/message-id/1349321071.23971.0.ca...@vanquo.pezone.net about keeping the data in more than one place. OK, it seems I completely missed that comment. (Or forgot about it if I happened to answer it.) Based on that, I've created a different version of this patch, attached. This way we keep all the data in one struct. I like this single structure but not the way you handle the options' classes. In your patch, each option belongs to only one class. These classes are:

PG_CONNINFO_REPLICATION = replication only
PG_CONNINFO_PASSWORD = password only
PG_CONNINFO_NORMAL = everything else

How does it help pg_basebackup to filter out e.g. dbname and replication? PG_CONNINFO_ALL should give pg_basebackup all it needs, no? Or actually, it shouldn't have the replication=1 part, right? So it should just use PG_CONNINFO_NORMAL|PG_CONNINFO_PASSWORD? These are added by the walreceiver module anyway and adding them to the primary_conninfo line should even be discouraged by the documentation. Hmm. I wasn't actually thinking about the dbname part here, I admit that. And not only the dbname, libpqwalreceiver adds these three:

[zozo@localhost libpqwalreceiver]$ grep dbname *
libpqwalreceiver.c: "%s dbname=replication replication=true fallback_application_name=walreceiver",

I also excluded application_name from PG_CONNINFO_REPLICATION by this reasoning:

- for async replication or single standby, it doesn't matter, the connection will show up as walreceiver
- for sync replication, the administrator has to add the node name manually via application_name.

In my view, the classes should be inclusive:

PG_CONNINFO_NORMAL = Everything that's usable for a regular client connection. This means everything, maybe including password but excluding replication.
PG_CONNINFO_PASSWORD = password only.
PG_CONNINFO_REPLICATION = Everything usable for a replication client not added by walreceiver. Maybe including/excluding password.

Maybe there should be two flags for replication usage:

PG_CONNINFO_WALRECEIVER = everything except those not added by walreceiver (and maybe password too)
PG_CONNINFO_REPLICATION = replication only

And every option can belong to more than one class, just as in my patch. Hmm. I kind of liked having each option in just one class, but I see the problem. Looking at the ones you suggest, all the non-password ones *have* to be without password, otherwise there is no way to get the conninfo without password - which is the whole reason for that parameter to exist. So the PASSWORD one has to be additional - which means that not making the other ones additional makes them inconsistent. But maybe we don't really have a choice there. Yes, the PASSWORD part can be on its own; this is what I meant above, but I wanted a different opinion about whether having it completely separate is better or not. But the NORMAL and REPLICATION (or WALRECEIVER) classes need to overlap. At this point, the patch is untested beyond compiling and a trivial psql check, because I ran out of time :) But I figured I'd throw it out there for comments on which version people prefer. 
(And yes, it's quite possible I've made a big think-mistake in it somewhere, but again, better to get some eyes on it early) My version also contains a fixed version of the docs that should be moved back into Zoltan's version if that's the one we end up preferring. I also liked your version of the documentation better, I am not too good at writing docs. np. Also, a question was buried in the other review which is - are we OK to remove the requiressl parameter. Both these patches do so, because the code becomes much simpler if we can do that. It has been deprecated since 7.2. Is it OK to remove it, or do we need to put back in the more complex code to deal with both? Just going to highlight that we're looking for at least one third party to comment on this :) Yes, me too. A +1 for removing wouldn't count from me. ;-) Attached are both Zoltan's latest patch (v16) and my slightly different approach. Comments on which approach is best? Test results from somebody who has the time to look at it? :) Do you happen to have a set of tests you've been running on your patches? Can you try them against this one? My set of tests is:

1. initdb the master
2. pg_basebackup -R the first standby from the master
3. pg_basebackup -R the second standby from the master
4. pg_basebackup -R the third standby from the first standby

and diff -durpN the different data directories while there is no load on either. I will test it today after fixing the classes in your patch. ;-)