Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 1:14 PM, Pavel Stehule wrote:
> 2015-01-26 13:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>> I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The simplest option seems to be to just allow a GUC that changes PL/PgSQL's behavior to match what all the other PLs are doing.
> libpq has been changed more than once; there is still an open task about a protocol change. I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical

I have no idea what you're talking about. What kind of side effects?

> but on the other hand the current behavior is not critical either; we can wait.

I think the current behavior is almost unacceptable. It makes debugging in some cases really, really difficult.

.marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 1:44 PM, Pavel Stehule wrote:
> 2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>>> I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
>> I have no idea what you're talking about. What kind of side effects?
> What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?

I fail to see the point. How would that be different from what happens today? Remember, PL/PgSQL only suppresses the *topmost* stack frame, and only when using RAISE from within a PL/PgSQL function.

.m
Re: [HACKERS] fix typo in guc.c
On 01/26/2015 02:56 PM, Sawada Masahiko wrote:
> Hi,
> Attached patch fixes the typo in guc.c. It's a typo, I think.
>
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index f6df077..f4f1965 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -3880,7 +3880,7 @@ build_guc_variables(void)
>  }
>
>  /*
> - * Create table with 20% slack
> + * Create table with 25% slack
>   */
>  size_vars = num_vars + num_vars / 4;

No, I think that's intentional. After the creation, indeed 20% of the table is empty. For example, if num_vars is 100, size_vars is 125, and 100/125 = 0.80. In add_guc_variable, where the table is enlarged, it says:

/*
 * Increase the vector by 25%
 */
int size_vars = size_guc_variables + size_guc_variables / 4;

That's correct too. The table is enlarged by 25%, so after the operation 20% of it is again free. Subtle ;-). (Although I don't think "increase" is the correct term here. It should be "enlarge", or "increase the *size* of the vector by 25%".)

- Heikki
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>> 2015-01-26 13:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>>> I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The simplest option seems to be to just allow a GUC that changes PL/PgSQL's behavior to match what all the other PLs are doing.
>> libpq has been changed more than once; there is still an open task about a protocol change. I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
> I have no idea what you're talking about. What kind of side effects?

What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?

>> but on the other hand the current behavior is not critical either; we can wait.
> I think the current behavior is almost unacceptable. It makes debugging in some cases really, really difficult.

If it is necessary, then we can modify libpq.

Regards

Pavel

> .marko
[HACKERS] fix typo in guc.c
Hi,

Attached patch fixes a typo in guc.c. It's a typo, I think.

Regards,

---
Sawada Masahiko

fix_typo_guc_c.patch
Description: Binary data
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/23/2015 02:18 AM, Noah Misch wrote:
> On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote:
>> The following case has just been brought to my attention (look at the differing number of backslashes):
>>
>> andrew=# select jsonb '"\\u0000"';
>>   jsonb
>> ----------
>>  "\u0000"
>> (1 row)
>>
>> andrew=# select jsonb '"\u0000"';
>>   jsonb
>> ----------
>>  "\u0000"
>> (1 row)
> A mess indeed. The input is unambiguous, but the jsonb representation can't distinguish "\u0000" from "\\u0000". Some operations on the original json type have similar problems, since they use an in-memory binary representation with the same shortcoming:
>
> [local] test=# select json_array_element_text($$["\u0000"]$$, 0) =
> test-#        json_array_element_text($$["\\u0000"]$$, 0);
>  ?column?
> ----------
>  t
> (1 row)
>
>> Things get worse, though. On output, '\uabcd' for any four hex digits is recognized as a unicode escape, and thus the backslash is not escaped, so that we get:
>>
>> andrew=# select jsonb '"\\uabcd"';
>>   jsonb
>> ----------
>>  "\uabcd"
>> (1 row)
>>
>> We could probably fix this fairly easily for non-U+0000 cases by having jsonb_to_cstring use a different escape_json routine.
> Sounds reasonable. For 9.4.1, before more people upgrade?
>
>> But it's a mess, sadly, and I'm not sure what a good fix for the U+0000 case would look like.
> Agreed. When a string unescape algorithm removes some kinds of backslash escapes and not others, it's nigh inevitable that two semantically-distinct inputs can yield the same output. json_lex_string() fell into that trap by making an exception for \u0000. To fix this, the result needs to be fully unescaped (\u0000 converted to the NUL byte) or retain all backslash escapes. (Changing that either way is no fun now that an on-disk format is at stake.)
>
>> Maybe we should detect such input and emit a warning of ambiguity? It's likely to be rare enough, but clearly not as rare as we'd like, since this is a report from the field.
> Perhaps. Something like "WARNING: jsonb cannot represent \\u0000; reading as \u0000"? Alas, but I do prefer that to silent data corruption.

Maybe something like this patch. I have two concerns about it, though. The first is the possible performance impact of looking for the offending string in every jsonb input, and the second is that the test isn't quite right, since input of '\\\u0000' doesn't raise this issue - i.e. the problem arises when the \u0000 is preceded by an even number of backslashes.

For the moment, maybe I could commit the fix for the non-U+0000 case for escape_json, and we could think some more about detecting and warning about the problem strings.

cheers

andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 3c137ea..8d2b38f 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -94,6 +94,8 @@ static void datum_to_json(Datum val, bool is_null, StringInfo result,
 static void add_json(Datum val, bool is_null, StringInfo result,
 		 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
+static void escape_json_work(StringInfo buf, const char *str,
+		 bool jsonb_unicode);
 
 /* the null action object used for pure validation */
 static JsonSemAction nullSemAction =
@@ -2356,6 +2358,18 @@ json_object_two_arg(PG_FUNCTION_ARGS)
 void
 escape_json(StringInfo buf, const char *str)
 {
+	escape_json_work(buf, str, false);
+}
+
+void
+escape_jsonb(StringInfo buf, const char *str)
+{
+	escape_json_work(buf, str, true);
+}
+
+static void
+escape_json_work(StringInfo buf, const char *str, bool jsonb_unicode)
+{
 	const char *p;
 
 	appendStringInfoCharMacro(buf, '"');
@@ -2398,7 +2412,9 @@ escape_json(StringInfo buf, const char *str)
 				 * unicode escape that should be present is \u0000, all the
 				 * other unicode escapes will have been resolved.
 				 */
-				if (p[1] == 'u' &&
+				if (jsonb_unicode && strncmp(p + 1, "u0000", 5) == 0)
+					appendStringInfoCharMacro(buf, *p);
+				else if (!jsonb_unicode && p[1] == 'u' &&
 					isxdigit((unsigned char) p[2]) &&
 					isxdigit((unsigned char) p[3]) &&
 					isxdigit((unsigned char) p[4]) &&
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 644ea6d..ba1e7f0 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -218,6 +218,14 @@ jsonb_from_cstring(char *json, int len)
 	JsonbInState state;
 	JsonSemAction sem;
 
+	if (strstr(json, "\\u0000") != NULL)
+		ereport(WARNING,
+				(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
+				 errmsg("jsonb cannot represent \\u0000; reading as \\\\u0000"),
+				 errdetail("Due to an implementation restriction, jsonb cannot represent a unicode null immediately preceded by a backslash.")));
+
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
 	lex = makeJsonLexContextCstringLen(json, len, true);
@@ -305,7 +313,7 @@ jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal)
Re: [HACKERS] Unsafe coding in ReorderBufferCommit()
Andres Freund <and...@2ndquadrant.com> writes:
> On 2015-01-23 16:47:30 -0500, Tom Lane wrote:
>> There are at least two bugs in reorderbuffer.c's ReorderBufferCommit():
> Thanks for fixing these! Unfortunately there's more - we'll currently do bad things if transaction commit fails. At the very least the (sub-)transaction begin commands need to be moved out of the exception block, as they can fail... :(. E.g. because this is the 2^32-1 subxact or similar... I actually also want to strip the CATCH block of most of its contents - there's really no need anymore for most of what it does.

No objection here. I was just doing a mechanical transform of the function, not based on any deep understanding of what it does. The less you need to do in a CATCH block, the better.

			regards, tom lane
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Andres Freund <and...@2ndquadrant.com> writes:
> On 2015-01-25 14:02:47 -0500, Tom Lane wrote:
>> I've been looking for other instances of the problem Mark Wilding pointed out, about missing volatile markers on variables that are modified in PG_TRY blocks and then used in the PG_CATCH stanzas. There definitely are some. Current gcc versions do not warn about that.
> I think it's actually not a recent regression - in the past a lot of spurious instances of these warnings have been fixed by simply tacking on volatile on variables that didn't actually need it.

Yeah, it's not. For years and years I just automatically stuck a volatile on anything gcc 2.95.3 complained about, so that's why there are so many volatiles there now. But I've not done that lately, and comparing what 2.95.3 warns about now with what a modern version says with -Wclobbered, it's clear that it's pretty much the same broken (and perhaps slightly machine-dependent) algorithm :-(

>> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?
> Not really, except convincing gcc to fix the inaccurate detection. Given that there've been bugs open about this (IIRC one from you even) for years, I'm not holding my breath.

I've completed the audit, and there were a total of only five places that need fixes (including the two I already patched over the weekend). It's mostly pretty new code too, which probably explains why we don't already have field reports of problems. Interestingly, plpython seems heavily *over*-volatilized. Not sure whether to take some out there for consistency, or just leave it alone.

			regards, tom lane
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On Sun, Jan 25, 2015 at 2:02 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?

One idea I've been thinking about for a while is to create some new, safer notation. Suppose we did this:

PG_TRY_WITH_CLEANUP(cleanup_function, cleanup_argument);
{
    /* code requiring cleanup */
}
PG_END_TRY_WITH_CLEANUP();

Instead of doing anything with sigsetjmp(), this would just push a frame onto a cleanup stack. We would call all of those callbacks from innermost to outermost before doing siglongjmp(). With this design, we don't need any volatile-ization.

This doesn't work for PG_CATCH() blocks that do not PG_RE_THROW(), but there are not a ton of those. In a quick search, I found initTrie, do_autovacuum, xml_is_document, and a number of instances in various procedural languages. Most instances in the core code could be converted to the style above.

Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On 2015-01-26 10:52:07 -0500, Robert Haas wrote:
> On Sun, Jan 25, 2015 at 2:02 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?
> One idea I've been thinking about for a while is to create some new, safer notation. Suppose we did this:
>
> PG_TRY_WITH_CLEANUP(cleanup_function, cleanup_argument);
> {
>     /* code requiring cleanup */
> }
> PG_END_TRY_WITH_CLEANUP();

That's pretty similar to PG_ENSURE_ERROR_CLEANUP, except that that is also called after FATAL errors. If we do this, we probably should try to come up with an easier-to-understand naming scheme. PG_TRY_WITH_CLEANUP vs. PG_ENSURE_ERROR_CLEANUP isn't very clear to a reader.

> Instead of doing anything with sigsetjmp(), this would just push a frame onto a cleanup stack. We would call all of those callbacks from innermost to outermost before doing siglongjmp(). With this design, we don't need any volatile-ization.

On the other hand, most of the call sites will need some extra state somewhere to keep track of what to undo. That's a bit of restructuring work too. And if the cleanup function needs to reference anything done in the TRY block, that state will need to be volatile too.

> Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable.

Hm. How would you implement PG_TRY_WITH_CLEANUP without a sigsetjmp()?

Greetings,

Andres Freund

--
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PL/pgSQL, RAISE and error context
Pavel Stehule <pavel.steh...@gmail.com> writes:
> 2015-01-26 14:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> I am thinking that the solution
>
>     /* if we are doing RAISE, don't report its location */
>     if (estate->err_text == raise_skip_msg)
>         return;
>
> is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.

Doing this in libpq is utterly insane. It has not got sufficient context to do anything intelligent. The fact that it's not intelligent is exposed by the regression test changes that the proposed patch causes, most of which do not look like improvements.

Another problem is that past requests to change this behavior have generally been to the effect that people wanted *more* context suppressed, not less, ie they didn't want any CONTEXT lines at all on certain messages. So the proposed patch seems to me to be going in exactly the wrong direction.

The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines, not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option: all, none, or one line, defaulting to the last for backwards compatibility.

			regards, tom lane
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 16:14 GMT+01:00 Tom Lane <t...@sss.pgh.pa.us>:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> I am thinking that the solution
>>
>>     /* if we are doing RAISE, don't report its location */
>>     if (estate->err_text == raise_skip_msg)
>>         return;
>>
>> is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.
> Doing this in libpq is utterly insane. It has not got sufficient context to do anything intelligent. The fact that it's not intelligent is exposed by the regression test changes that the proposed patch causes, most of which do not look like improvements.

I don't understand. There can be some overhead due to useless transformation of data on the client side, but everything it needs - errcontext and errlevel - is available.

> Another problem is that past requests to change this behavior have generally been to the effect that people wanted *more* context suppressed, not less, ie they didn't want any CONTEXT lines at all on certain messages. So the proposed patch seems to me to be going in exactly the wrong direction. The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines, not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option: all, none, or one line, defaulting to the last for backwards compatibility.

I see a problem in what the default behavior should be. When I raise a NOTICE, I don't want to see CONTEXT lines; when I raise an EXCEPTION, I usually do want to see CONTEXT lines. Could this be a solution?

    estate->err_text = stmt->elog_level == ERROR ? estate->err_text : raise_skip_msg;

Regards

Pavel

> regards, tom lane
Re: [HACKERS] Additional role attributes superuser review
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2015-01-26 13:47:02 -0500, Stephen Frost wrote:
>> Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access.
> I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant.

I agree that it's a pretty long thread for what amounts to a few relatively straightforward role attributes (at least, in my view).

> I don't think the comparison with the SAN snapshot functionality is apt: The SAN solution itself will still run with full data access. Just pressing the button for the snapshot requires less. You're comparing that button to pg_start/stop_backup() - but that doesn't make sense, because it's only useful if somebody actually takes the backup during that time.

I'm not following your logic here. You're right that just pressing the button to take a snapshot can be granted to a lower-level user using the SAN solution. That snapshot's useless unless the user can first run pg_start_backup, though (and subsequently run pg_stop_backup afterwards). Clearly, XLOG archiving has to be set up already, but that would be set up when the system is initially brought online. This capability would be used in conjunction with the SAN snapshot capability; it's not intended as a comparison to what SANs offer.

Thanks!

Stephen
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Mon, Jan 26, 2015 at 2:10 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try.
> Right. We're only supporting the C locale on Windows. How could Windows possibly be broken if locale-related functions like strxfrm() and strcoll() are not even called?

That's what I hope to find out. :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] fix typo in guc.c
On Mon, Jan 26, 2015 at 10:06 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 01/26/2015 02:56 PM, Sawada Masahiko wrote:
>> Hi,
>> Attached patch fixes the typo in guc.c. It's a typo, I think.
> No, I think that's intentional. After the creation, indeed 20% of the table is empty. For example, if num_vars is 100, size_vars is 125, and 100/125 = 0.80. In add_guc_variable, where the table is enlarged, it says:
>
> /*
>  * Increase the vector by 25%
>  */
> int size_vars = size_guc_variables + size_guc_variables / 4;
>
> That's correct too. The table is enlarged by 25%, so after the operation 20% of it is again free. Subtle ;-). (Although I don't think "increase" is the correct term here. It should be "enlarge", or "increase the *size* of the vector by 25%".)

Oh, I see. Thank you for explaining! I think so too; 'enlarged' should be used here.

Regards,

---
Sawada Masahiko
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 14:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> On 1/26/15 1:44 PM, Pavel Stehule wrote:
>> 2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>>> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>>>> I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
>>> I have no idea what you're talking about. What kind of side effects?
>> What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?
> I fail to see the point. How would that be different from what happens today? Remember, PL/PgSQL only suppresses the *topmost* stack frame, and only when using RAISE from within a PL/PgSQL function.

I have thought about it a little more, and I think we should return to the start of this thread. Current state:

1. RAISE in plpgsql doesn't show a context - which is what we want for RAISE NOTICE and don't want for RAISE EXCEPTION. I am thinking that the solution

    /* if we are doing RAISE, don't report its location */
    if (estate->err_text == raise_skip_msg)
        return;

is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.

2. Personally, I prefer a slightly more conceptual solution that needs a libpq change, because I would like some mode between terse and verbose: I would not want to see context for NOTICEs, but I would want to see context for errors. This request is generic - independent of the PL used.

@2 is my feature request, and it is possibly independent of @1.

3. Your proposal, plpgsql.suppress_simple_error_context, fixes @2 partially - I just prefer a generic solution that will be available for all PLs. Exception processing is the same for all PLs, so filtering should be common too.

Regards

Pavel

> .m
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On Mon, Jan 26, 2015 at 1:34 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> I guess we'd need to tie it into PG_exception_stack levels, so it correctly handles nesting with sigsetjmp locations. In contrast to sigsetjmp() style handlers we can't rely on PG_CATCH cleaning up that state.

I was thinking that PG_TRY() would need to squirrel away the value of cleanup_stack and set it to null, and PG_CATCH would need to restore the squirreled-away value. That way we fire handlers in reverse order of registration regardless of which style of registration is used.

> I wonder how hard it is to make that reliable for errors thrown in a cleanup callback. Those certainly are possible in some of the PG_CATCHes we have right now.

I think what should happen is that pg_re_throw() should remove each frame from the stack and then call the associated callback. If another error occurs, we won't try that particular callback again, but we'll try the next one. In most cases this should be impossible anyway because the catch-block is just a variable assignment or something, but not in all cases, of course.

>> And before doing sigsetjmp to the active handler, we run all the functions on the stack. There shouldn't be any need for volatile; the compiler has to know that once we make it possible to get at a pointer to cb_arg via a global variable (cleanup_stack), any function we call in another translation unit could decide to call that function and it would need to see up-to-date values of everything cb_arg points to. So before calling such a function it had better store that data to memory, not just leave it lying around in a register somewhere.
> Given that we, at the moment at least, throw ERRORs from signal handlers, I don't think that necessarily holds true. I think we're not that far away from getting rid of all of those though.

Well, I think the theory behind that is we should only be throwing errors from signal handlers when ImmediateInterruptOK = true, which is only supposed to happen when the code thereby interrupted isn't doing anything interesting. So if you set up some state that can be touched by the error-handling path and then, in the same function, set ImmediateInterruptOK, yeah, that probably needs to be volatile. But if function A sets up the state and then calls function B in another translation unit, and B sets ImmediateInterruptOK, then A has no way of knowing that B wasn't going to just throw a garden-variety error, so it had better have left things in a tidy state.

We still need to scrutinize the actual functions that set ImmediateInterruptOK and, if they are static, their callers, but that isn't too many places to look at. It's certainly (IMHO) a lot better than trying to stick in volatile qualifiers every place we use a try-catch block, and if you succeed in getting rid of ImmediateInterruptOK, then it goes away altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Additional role attributes superuser review
* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jan 26, 2015 at 1:59 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2015-01-26 13:47:02 -0500, Stephen Frost wrote:
>>> Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access.
>> I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant.
> That's not necessarily true. You could be able to run a command like "san_snapshot $PGDATA" without necessarily having the permissions to inspect the contents of the resulting snapshot. Of course somebody should be doing that, but in accord with the principle of least privilege, there's no reason that the account running the unattended backup needs to have those rights.

Right! You explained it more clearly than I did.

Thanks!

Stephen
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 3:13 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth <and...@tao11.riddles.org.uk> wrote:
>> There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary.
> I think that the inability to put each message in context, with metadata comments associated with individual messages, is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year.

I agree. Starting a new email thread for each patch version is, IMHO, a complete non-starter. It's 100% contrary to what has generally been advocated as best practice up until now, and it is basically saying we should alter our workflow because the tool can't cope with the one we've got. The whole point of having home-grown tools for this stuff is that they're supposed to work with the way we already like to do things instead of forcing us to work in new ways.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 9:20 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 26, 2015 at 3:13 PM, Peter Geoghegan p...@heroku.com wrote: On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary. I think that the inability to put each message in context, with metadata comments associated with individual messages is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year. Yes, and the agreement after that feedback was to try it out and then figure out what changes were needed, as about half the feedback said it was better without and half said it was better with. I agree. Starting a new email thread for each patch version is, IMHO, a complete non-starter. It's 100% contrary to what has generally been advocated as best-practice up until now, and it is basically saying we should alter our workflow because the tool can't cope with the one we've got. The whole point of having home-grown tools for this stuff is that they're supposed to work with the way we already like to do things instead of forcing us to work in new ways. Why would you create a new thread for a new *version* of a patch? The whole *point* of the system is that you shouldn't do that, yes, so I'm not sure where you got the idea that you should. I thought the issue currently under discussion was somebody posting a *different* patch on the same thread, or the first patch arriving on a thread that used to be about something that was not a patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Additional role attributes superuser review
On 2015-01-26 13:47:02 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread is seems like EXCLUSIVEBACKUP (proposed by Magnus) seemed to be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server, that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant. I don't think the comparison with the SAN snapshot functionality is apt: The SAN solution itself will still run with full data access. Just pressing the button for the snapshot requires less. You're comparing that button to pg_start/stop_backup() - but that doesn't make sense, because it's only useful if somebody actually takes the backup during that time. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
On 2015-01-26 14:05:03 -0500, Stephen Frost wrote: This capability would be used in conjunction with the SAN snapshot capability, it's not intended to be a comparison to what SANs offer. Oh, on a reread that's now clear. Many of those actually allow hooks to be run when taking a snapshot, that'd probably be a better approach. But I can now see the point. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
On Mon, Jan 26, 2015 at 1:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 13:47:02 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread is seems like EXCLUSIVEBACKUP (proposed by Magnus) seemed to be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server, that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant. That's not necessarily true. You could be able to run a command like san_snapshot $PGDATA without necessarily having the permissions to inspect the contents of the resulting snapshot. Of course somebody should be doing that, but in accord with the principle of least privilege, there's no reason that the account running the unattended backup needs to have those rights. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Fri, Jan 23, 2015 at 12:34 PM, Robert Haas robertmh...@gmail.com wrote: In other words, even on systems that don't HAVE_LOCALE_T, we still have to support the default collation and the C collation, and they have to behave differently. There's no way to make that work using only strxfrm(), because nothing gets passed to that function to tell it which of those two things it is supposed to do. Now even if the above were not an issue, for example because we dropped support for systems where !HAVE_LOCALE_T, I think it would be poor form to depend on strxfrm_l() to behave like memcpy() where we could just as easily call memcpy() and be *sure* that it was going to do what we wanted. Much of writing good code is figuring out what could go wrong and then figuring out how to prevent it, and relying on strxfrm_l() would be an unnecessary risk regardless. Given the existence of !HAVE_LOCALE_T systems, it's just plain broken. Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try. Right. We're only supporting the C locale on Windows. How could Windows possibly be broken if locale-related functions like strxfrm() and strcoll() are not even called? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary. I think that the inability to put each message in context, with metadata comments associated with individual messages is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable. Hm. How would you implement PG_TRY_WITH_CLEANUP without a sigsetjmp()? Posit:

typedef struct cleanup_entry
{
    void        (*callback) (void *);
    void       *callback_arg;
    struct cleanup_entry *next;
} cleanup_entry;

cleanup_entry *cleanup_stack = NULL;

So PG_TRY_WITH_CLEANUP(_cb, _cb_arg) does (approximately) this:

{
    cleanup_entry e;
    cleanup_entry *orige;

    e.callback = (_cb);
    e.callback_arg = (_cb_arg);
    e.next = cleanup_stack;
    orige = cleanup_stack;
    cleanup_stack = &e;

And when you PG_END_TRY_WITH_CLEANUP, we just do this:

    cleanup_stack = orige;
}

Hm. Not bad. [ponder] I guess we'd need to tie it into PG_exception_stack levels, so it correctly handles nesting with sigsetjmp locations. In contrast to sigsetjmp()-style handlers we can't rely on PG_CATCH cleaning up that state. I wonder how hard it is to make that reliable for errors thrown in a cleanup callback. Those certainly are possible in some of the PG_CATCHes we have right now. And before doing the sigsetjmp to the active handler, we run all the functions on the stack. There shouldn't be any need for volatile; the compiler has to know that once we make it possible to get at a pointer to cb_arg via a global variable (cleanup_stack), any function we call in another translation unit could decide to call that function and it would need to see up-to-date values of everything cb_arg points to. So before calling such a function it had better store that data to memory, not just leave it lying around in a register somewhere. Given that we, at the moment at least, throw ERRORs from signal handlers, I don't think that necessarily holds true. I think we're not that far away from getting rid of all of those though.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
* Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread it seems like EXCLUSIVEBACKUP (proposed by Magnus) would be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. Even with simpler solutions, it means that the backup user doesn't have to be able to run some superuser-level script against the database to run the backup. As for pg_basebackup itself, I agree that it's not exactly intuitive that 'replication' is what grants you the right to run pg_basebackup. Perhaps we could rename it or make an alias for it, or something along those lines? I wasn't looking to do anything with it at this time, but it would probably be good to improve it somehow, if you (or anyone) have suggestions on how best to do so. Thanks! Stephen
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote: Well, if it's essentially unusable, then we've reached parity with the old app (yes, you deserved that). No, I didn't. What we had before I wrote that tool was a bunch of wiki pages you put together which were forever having problems with multiple people editing the page at the same time, and not always adhering to the formatting standards, and sometimes accidentally or purposefully deleting things other people had done. The tool was written to mimic that, but without the edit-war chaos. So, if it sucked, it sucked because it mimicked something designed by you. Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. The difference between the old and new apps is that it's actually *possible* to improve things on the new app, which was never going to happen on the old one. That is probably a fair critique. There's also a significant advantage in knowing that the entire email thread is available on a patch without someone having to remember to manually paste in each message ID, something which failed to happen at least 1/3 of the time for important messages. As far as I can see, the new app offers no real advantage in this regard. The thing that really made things better here is the work that was done to make threading in our archives work properly. In the old app, you could click on the last message for which someone put the message-ID and then go to the end of the thread. In the new app, you can... pretty much do the same thing. It's not like every message in the thread shows up in the app. The latest one with an attachment does, but that's not necessarily the latest patch version. (While I'm complaining, the links only go to the flat version of the thread, while I happen to prefer the version that shows one message at a time with a message-ID selector to switch messages.)
What would be cool is a way to flag individual messages in the email thread as being important. Will give it some thought and suggest something. You make that comment as if three people hadn't already +1'd that idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On 1/25/15 4:23 AM, Pavel Stehule wrote: I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too It is only a few lines more to current code, and this change doesn't break a compatibility. Do you think, so this patch is acceptable? Ideas, comments? Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do foreach a, b, c in array(array(1,2),array(3,4)) ? Or the opposite case of foreach a,b in array(array(1,2,3)) Also, what about: foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 9:46 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 26, 2015 at 3:24 PM, Magnus Hagander mag...@hagander.net wrote: Yes, and the agreement after that feedback was to try it out and then figure out what changes were needed, as about half the feedback said it was better without and half said it was better with. Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. I don't know how much more feedback than that you need to be convinced, but I'm going to shut up now before I say something I will later regret. According to my mailbox, you didn't even respond on that thread. But it may well be that your email ended up on some other thread and therefore was not included when I went back and looked over all the responses I got on it. If that was the case, then I apologize for losing track of the feedback. The most recent activity on the summary page is on my short-term todo list to fix. The past couple of days have been a bit too busy to get that done though, mainly due to the upcoming FOSDEM and pgday events. But rest assured that part is definitely on the list, as it doesn't actually change any functionality, it's just a view. Same as that quick stats numbers thing on the frontpage of each cf. As for being able to flag more things on individual emails/patches, I am definitely not against that in principle, if that's what people prefer. But I don't think it's unreasonable to give it a few days and then collect feedback on that (and other things).
Which of course also includes rolling back the whole thing if people prefer the older one - that has been an option on the table from the time we decided to give this one a try in the first place. (Though in that case, we really need to find a maintainer for that code, as we don't seem to have one now. But I'm sure that can get sorted.) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/23/15 2:15 PM, Stephen Frost wrote: I happen to like the idea specifically because it would allow regular roles to change the auditing settings (no need to be a superuser or to be able to modify postgresql.conf/postgresql.auto.conf) Is there really a use case for non-superusers to be able to change auditing config? That seems like a bad idea. What's a bad idea is having every auditor on the system running around as superuser.. When it comes to looking at auditing data, I agree. When it comes to changing auditing settings, I think that needs to be very restrictive. Really, it should be more (or differently) restrictive than SU, so that you can effectively audit your superusers with minimal worries about superusers tampering with auditing. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 2015-01-26 13:32:51 -0800, Josh Berkus wrote: On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? Oh, come on. Grow up. A missing feature that several people commented on *before* the tool was released, and that several people have commented upon since isn't a shitstorm of criticism. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Fri, Jan 23, 2015 at 1:16 PM, Stephen Frost sfr...@snowman.net wrote: Well, I'm still of the view that there's little to lose by having this remain out-of-core for a release or three. We don't really all agree on what we want, and non-core code can evolve a lot faster than core code, so what's the rush? Being out of core makes it unusable in production environments for a large number of organizations. :/ Tough. Once we accept something into core, we're stuck maintaining it forever. We shouldn't do that unless we're absolutely confident the design is solid. We are not close to having consensus on this. If somebody has a reason for accepting only core PostgreSQL and not add-ons, it's either a stupid reason, or it's because they believe that the core stuff will be better thought-out than the add-ons. If it's the latter, we shouldn't disappoint. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. I like some things about the new app. The general idea of having the mailing list traffic drive the commitfest app is a good one. However, it seems we've gone too far. I strongly feel that the two regressions called out here are big problems. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:07 PM, Magnus Hagander mag...@hagander.net wrote: I assume what was referred to was that the old cf app would show the last 3 (I think it was) comments/patches/whatnot on a patch on the summary page (and then clickthrough for more details). Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 01/26/2015 12:46 PM, Robert Haas wrote: I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Well, if it's essentially unusable, then we've reached parity with the old app (yes, you deserved that). The difference between the old and new apps is that it's actually *possible* to improve things on the new app, which was never going to happen on the old one. There's also a significant advantage in knowing that the entire email thread is available on a patch without someone having to remember to manually paste in each message ID, something which failed to happen at least 1/3 of the time for important messages. What would be cool is a way to flag individual messages in the email thread as being important. Will give it some thought and suggest something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:01 PM, Magnus Hagander mag...@hagander.net wrote: According to my mailbox, you didn't even respond on that thread. But it may well be that your email ended up on some other thread and therefor was not included when I went back and looked over all the responses I got on it. If that was the case, then I apologize for loosing track of the feedback. I remember bringing it up at PGCon, I think. I don't know whether I wrote it in an email. The most recent activity on the summary page is on my short-term todo list to fix. The past couple of days have been a bit too busy to get that done though, mainly due to the upcoming FOSDEM and pgday events. But rest assured that part is definitely on the list, as it doesn't actually change any functionality, it's just a view. Same as that quick stats numbers thing on the frontpage of each cf. OK. What I'd like to understand is why this new app had to be rolled out before these things were done. We've been using my app for 5 years and it, like, worked. So why the rush to roll it out with these known issues unfixed? I mean, it's not like you couldn't look at any time and see what the new app lacked that the old app had. The last time you presented this app for feedback, which I remember to be PGCon, it was so buggy that there was no point in trying to form any considered opinion of it. Now, it's rolled out, but with a bunch of stuff that people use and rely on missing. I grant that some of those things you may not have realized anyone cared about, but it feels to me like this got pushed into production without really getting it to feature parity. I could've spent more of my time complaining about that than I did, but why should I have to do that? As for being able to flag more things on individual emails/patches, I am definitely not against that in principle, if that's what people prefer. But I don't think it's unreasonable to give it a few days and then collect feedback on that (and other things). 
Suit yourself. Which of course also includes rolling back the whole thing if people prefer the older one - that has been an option on the table from the time we decided to give this one a try in the first place. (Though in that case, we really need to find a maintainer for that code, as it's we don't seem to have that now. But I'm sure that can get sorted) I understand that having someone who has the time to maintain the code is an important issue, and I admit I don't, and haven't for a while. There is a lot of sense in replacing the app with something that uses the same software framework as the rest of our infrastructure, which I understand that yours does. And it's not like what the new one does is horribly different from what the old one did. So I don't see that there's any reason we can't make the new one work. But I confess to having no patience with the idea that I have to build a consensus to get you to re-add features that you removed. You can take that position, and since you are the tool maintainer it's not exactly unfair, but since I was willing to put in the work to add those features in the first place, I probably think they were worth having. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
Hi,

dbase_redo does:

    if (InHotStandby)
    {
        /*
         * Lock database while we resolve conflicts to ensure that
         * InitPostgres() cannot fully re-execute concurrently. This
         * avoids backends re-connecting automatically to same database,
         * which can happen in some cases.
         */
        LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
        ResolveRecoveryConflictWithDatabase(xlrec->db_id);
    }

Unfortunately that Assert()s when there's a lock conflict because e.g. another backend is currently connecting. That's because ProcSleep() does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and there's no deadlock timeout (or lock timeout) handler registered. I'm not sure if this is broken since 8bfd1a884 introducing those session locks, or if it's caused by the new timeout infrastructure (f34c68f09). I guess the easiest way to fix this would be to make this a loop like ResolveRecoveryConflictWithLock():

    LOCKTAG tag;

    SET_LOCKTAG_OBJECT(tag, InvalidOid, DatabaseRelationId, xlrec->db_id, objsubid);

    while (!lock_acquired)
    {
        while (CountDBBackends(dbid) > 0)
        {
            CancelDBBackends(dbid, PROCSIG_RECOVERY_CONFLICT_DATABASE, true);

            /*
             * Wait awhile for them to die so that we avoid flooding an
             * unresponsive backend when system is heavily loaded.
             */
            pg_usleep(10000);
        }

        if (LockAcquireExtended(&tag, AccessExclusiveLock, true, true, false) != LOCKACQUIRE_NOT_AVAIL)
            lock_acquired = true;
    }

afaics, that should work? Not pretty, but probably easier than starting to reason about the deadlock detector in the startup process. We probably should also add an Assert(!InRecovery || sessionLock) to LockAcquireExtended() - these kinds of problems are otherwise hard to find in a developer setting.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Mon, Jan 26, 2015 at 8:43 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Another spinoff from the abbreviation discussion. Peter Geoghegan suggested on IRC that numeric would benefit from abbreviation, and indeed it does (in some cases by a factor of about 6-7x or more, because numeric comparison is no speed demon). Cool. What I find particularly interesting about this patch is that it makes sorting numerics significantly faster than even sorting float8 values, at least some of the time, even though the latter has generic SortSupport (for fmgr elision). Example: postgres=# create table foo as select x::float8 x, x::numeric y from (select random() * 1000 x from generate_series(1,100) a) b; SELECT 100 This query takes about 525ms after repeated executions: select * from (select * from foo order by x offset 10) i; However, this query takes about 412ms: select * from (select * from foo order by y offset 10) i; There is probably a good case to be made for float8 abbreviation support; just as well that your datum abbreviation patch doesn't imply that pass-by-value types cannot be abbreviated across the board (it only implies that abbreviation of pass-by-value types is not supported in the datum sort case). :-) Anyway, the second query above (the one with the numeric ORDER BY column) is enormously faster than the same query executed against master's tip. That takes about 1720ms following repeated executions. So at least that case is over 4x faster, suggesting that abbreviation support for numeric is well worthwhile. So I'm signed up to review this one too. -- Peter Geoghegan
Re: [HACKERS] Parallel Seq Scan
On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of separate patch. I don't think we even have to go that far. My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is the case because complex queries tend to interleave CPU intensive code in-between IO requests. So we see this pattern: Wait 5ms on IO Compute for a few ms Wait 5ms on IO Compute for a few ms ... We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work so great. It certainly falls apart on something like an index scan. If we could instead do this: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). Similarly, have IndexScans have one worker reading the index and another worker taking index tuples and reading heap tuples... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: searching in array function - array_position
On 1/24/15 2:48 AM, Pavel Stehule wrote: with array_offsets - returns a array of offsets + <entry>returns a offset of first occurrence of some element in a array. It uses should be + <entry>returns the offset of the first occurrence of some element in an array. It uses + <entry>returns a array of offset of all occurrences some element in a array. It uses should be + <entry>returns an array of the offsets of all occurrences of some element in an array. It uses Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? You should remove the array_length() from the last array_offsets test; I don't see that it buys anything. I think there should be tests for what happens when you feed these functions a multi-dimensional array. Other than that, looks good. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_upgrade and rsync
On 1/23/15 12:40 PM, Stephen Frost wrote: That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom command file that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work... Yeah, I had suggested that to Bruce also, but it's not clear why that would be any different from an rsync --size-only in the end, presuming everything went according to plan. Yeah, if everything is shut down maybe we're OK. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 1/25/15 7:42 PM, Amit Langote wrote: On 21-01-2015 PM 07:26, Amit Langote wrote: Ok, I will limit myself to focusing on following things at the moment: * Provide syntax in CREATE TABLE to declare partition key While working on this, I stumbled upon the question of how we deal with any index definitions following from constraints defined in a CREATE statement. I think we do not want to have a physical index created for a table that is partitioned (in other words, has no heap of itself). As the current mechanisms dictate, constraints like PRIMARY KEY, UNIQUE, EXCLUSION CONSTRAINT are enforced as indexes. It seems there are really two decisions to make here: 1) how do we deal with any index definitions (either explicit or implicit following from constraints defined on it) - do we allow them by marking them specially, say, in pg_index, as being mere placeholders/templates or invent some other mechanism? 2) As a short-term solution, do we simply reject creating any indexes (/any constraints that require them) on a table whose definition also includes PARTITION ON clause? Instead define them on its partitions (or any relations in hierarchy that are not further partitioned). Or maybe I'm missing something... Wasn't the idea that the parent table in a partitioned table wouldn't actually have a heap of its own? If there's no heap there can't be an index. That said, I think this is premature optimization that could be done later. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 9:46 AM, Pavel Stehule wrote: The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option all, none, or one line, defaulting to the last for backwards compatibility. I see a problem what should be default behave. When I raise NOTICE, then I don't need (don't would) to see CONTEXT lines, When I raise EXCEPTION, then I usually would to see CONTEXT lines. FWIW, that's the case I almost always run into: I turn on some debugging which means I know where the RAISE is coming from, but now I'm flooded with CONTEXT lines. You could do that with an option to RAISE, but that seems like a lot of extra coding work for little gain. Perhaps it'd be worth creating client_min_context and log_min_context GUCs... Another option that I think would work well is that you only provide context for the first call within a block of code. For plpgsql that would be a function, but maybe it'd be better to just do this per-subtransaction. I do agree that this needs to work across the board, not just for plpgsql. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Hi, On 2015-01-22 19:56:07 +0100, Andres Freund wrote: Hi, On 2015-01-20 16:28:19 +0100, Andres Freund wrote: I'm analyzing a problem in which a customer had a pg_basebackup (from standby) created 9.2 cluster that failed with WAL contains references to invalid pages. The failed record was a xlog redo visible i.e. XLOG_HEAP2_VISIBLE. First I thought there might be another bug along the line of 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't seem to have any problems. Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when the basebackup was started and finished *before* pg_basebackup finished. movedb() basically works in these steps: 1) lock out users of the database 2) RequestCheckpoint(IMMEDIATE|WAIT) 3) DropDatabaseBuffers() 4) copydir() 5) XLogInsert(XLOG_DBASE_CREATE) 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE) 7) rmtree(src_dbpath) 8) XLogInsert(XLOG_DBASE_DROP) 9) unlock database If a basebackup starts while 4) is in progress and continues until 7) happens I think a pretty wide race opens: The basebackup can end up with a partial copy of the database in the old tablespace because the rmtree(old_path) concurrently was in progress. Normally such races are fixed during replay. But in this case, the replay of the XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new); fixing nothing. Besides making AD .. ST use sane WAL logging, which doesn't seem backpatchable, I don't see what could be done against this except somehow making basebackups fail if an AD .. ST is in progress. Which doesn't look entirely trivial either. I basically have two ideas to fix this. 1) Make do_pg_start_backup() acquire a SHARE lock on pg_database. That'll prevent it from starting while a movedb() is still in progress. 
Then additionally add pg_backup_in_progress() function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup || XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and movedb() to error out if a backup is in progress. Attached is a patch trying to do this. Doesn't look too bad and led me to discover missing recovery conflicts during an AD ST. But: It doesn't actually work on standbys, because lock.c prevents any stronger lock than RowExclusive from being acquired. And we need a lock that can conflict with WAL replay of DBASE_CREATE, to handle base backups that are executed on the primary. Those obviously can't detect whether any standby is currently doing a base backup... I currently don't have a good idea how to mangle lock.c to allow this. I've played with doing it like in the second patch, but that doesn't actually work because of some asserts around ProcSleep - leading to locks on database objects not working in the startup process (despite already being used). The easiest thing would be to just use a lwlock instead of a heavyweight lock - but those aren't cancelable... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New CF app deployment
On 2015-01-26 12:54:04 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. Agreed. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. That one is there now though, isn't it? For specific CF: https://commitfest.postgresql.org/3/activity/ All CFs: https://commitfest.postgresql.org/activity/ Or do you mean something else? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 10:05 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 12:54:04 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. Agreed. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. That one is there now though, isn't it? For specific CF: https://commitfest.postgresql.org/3/activity/ All CFs: https://commitfest.postgresql.org/activity/ Or do you mean something else? I assume what was referred to was that the old cf app would show the last 3 (I think it was) comments/patches/whatnot on a patch on the summary page (and then clickthrough for more details). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] proposal: plpgsql - Assert statement
2015-01-26 22:34 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/22/15 2:01 PM, Pavel Stehule wrote: * I would to simplify a behave of evaluating of message expression - probably I disallow NULL there. Well, the only thing I could see you doing there is throwing a different error if the hint is null. I don't see that as an improvement. I'd just leave it as-is. I enabled a NULL - but enforced a WARNING before. I don't see the separate warning as being helpful. I'd just do something like +(err_hint != NULL) ? errhint("%s", err_hint) : errhint("Message attached to failed assertion is null") )); ok There should also be a test case for a NULL message. * GUC enable_asserts will be supported That would be good. Would that allow for enabling/disabling on a per-function basis too? sure - there is only question if we develop a #option enable|disable_asserts. I have no strong idea. The option would be nice, but I don't think it's strictly necessary. The big thing is being able to control this on a per-function basis (which I think you can do with ALTER FUNCTION SET?) you can do it without any change now -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:32 PM, Josh Berkus j...@agliodbs.com wrote: On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? Didn't stop me. And actually, I didn't face a shitstorm of criticism. The way I remember it, I got a pretty much unqualified positive reaction at the time. Only later, when you had conveniently forgotten how bad things were before we had that tool, did you start routinely dumping on it. And, AFAICS, not because of anything it did wrong, only because of things that it didn't do that were features you wanted to have. Most of those features were things that I could not possibly have implemented anyway because they would have required deeper integration with the authentication system than was possible with the access my app had. Automatically linking new emails? Not possible: I had no archives access. Automatic emails to users? Not possible: no access to email addresses. Place to put a link to a git branch? Yeah, I could have done that, and just didn't, but the new app doesn't do it either, yet anyway. I don't know how Magnus's app is connecting in to the rest of the PG infrastructure, but it's obviously got more access than mine had. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. I am less active than I used to be, but I still do a lot of reviewing and committing of patches and generally put a pretty significant amount of time into it. Despite whatever shortcomings that app I wrote has, so do a lot of other people. 
The CommitFest process has got its issues, particularly a lack of reviewers, but it is still better than having no process, and yet your only contributions seem to be to denigrate the people who are putting time into it. You're using the poor quality of my app, and an almost total misinterpretation of what was said about attracting new reviewers at PGCon, as excuses for your non-participation, and that is certainly your prerogative. You have as much right not to participate as anyone. But refusing to participate except to throw bricks is not going to move this project forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Mon, Jan 26, 2015 at 8:04 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 19:58:25 -0500, Bruce Momjian wrote: On Tue, Jan 27, 2015 at 01:43:41AM +0100, Andres Freund wrote: master + 32align.patch: -c max_connections=400 tps = 183791.872359 (high run vs. run variability) -c max_connections=401 tps = 185494.98244 (high run vs. run variability) master + 64align.patch: -c max_connections=400 tps = 489257.195570 -c max_connections=401 tps = 490496.520632 Pretty much as expected, right? Yes, I am convinced. Let's work on a patch now. Since I can reproduce some minor (1-3%) performance *regressions* at low client counts when aligning every shmem allocation, I'm inclined to just add special case code to BufferShmemSize()/InitBufferPool() to align descriptors to PG_CACHE_LINE_SIZE. That's really unlikely to regress anything as it basically can't be a bad idea to align buffer descriptors. Contrary opinions? Robert? I'm totally OK with further aligning just that one allocation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/01/27 9:15, Jim Nasby wrote: On 12/22/14 12:50 AM, Etsuro Fujita wrote: I think ExplainModifyTarget should show the parent of the inheritance tree in multi-target-table cases, as described there, but noticed that it doesn't always work like that. Here is an example. Anything ever happen with this? Nothing. I'll add this to the next CF. Best regards, Etsuro Fujita
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Jun 30, 2014 at 5:06 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: I think having two columns would work. The columns could be called database and database_list and user and user_list respectively. The database column may contain one of all, sameuser, samegroup, replication, but if it's empty, database_list will contain an array of database names. Then (all, {}) and (, {all}) are easily separated. Likewise for user and user_list. Thanks for the review. I corrected all the review comments except the one to add two columns as (database, database_list and user, user_list). I feel this may cause some confusion to the users. Here I attached the latest version of the patch. I will add this patch to the next commitfest. Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V3.patch Description: Binary data
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Mon, Jan 26, 2015 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote: Contrary opinions? Robert? I'm totally OK with further aligning just that one allocation. Of course, now that I think about it, aligning it probably works mostly because the size is almost exactly one cache line. If it were any bigger or smaller, aligning it more wouldn't help. So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: dynahash replacement for buffer table
This developed a slight merge conflict. I've rebased the attached version, and I also took the step of getting rid of buf_table.c, as I think I proposed somewhere upthread. This avoids the overhead of constructing a BufferTag only to copy it into a BufferLookupEnt, plus some function calls and so forth. A quick-and-dirty test suggests this might not have cut down on the 1-client overhead much, but I think it's worth doing anyway: it's certainly saving a few cycles, and I don't think it's complicating anything measurably. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/contrib/Makefile b/contrib/Makefile index 195d447..141ef0a 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -19,6 +19,7 @@ SUBDIRS = \ earthdistance \ file_fdw \ fuzzystrmatch \ + hashtest \ hstore \ intagg \ intarray \ diff --git a/contrib/hashtest/Makefile b/contrib/hashtest/Makefile new file mode 100644 index 000..3ee42f8 --- /dev/null +++ b/contrib/hashtest/Makefile @@ -0,0 +1,18 @@ +# contrib/hashtest/Makefile + +MODULE_big = hashtest +OBJS = hashtest.o + +EXTENSION = hashtest +DATA = hashtest--1.0.sql + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/hashtest +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/hashtest/hashtest--1.0.sql b/contrib/hashtest/hashtest--1.0.sql new file mode 100644 index 000..e271baf --- /dev/null +++ b/contrib/hashtest/hashtest--1.0.sql @@ -0,0 +1,52 @@ +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use CREATE EXTENSION hashtest to load this file. 
\quit + +CREATE FUNCTION chash_insert_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_insert_test' +LANGUAGE C; + +CREATE FUNCTION chash_search_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_search_test' +LANGUAGE C; + +CREATE FUNCTION chash_delete_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_delete_test' +LANGUAGE C; + +CREATE FUNCTION chash_concurrent_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_concurrent_test' +LANGUAGE C; + +CREATE FUNCTION chash_collision_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_collision_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_insert_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_insert_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_search_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_search_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_delete_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_delete_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_concurrent_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_concurrent_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_collision_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_collision_test' +LANGUAGE C; diff --git a/contrib/hashtest/hashtest.c b/contrib/hashtest/hashtest.c new file mode 100644 index 000..172a5bb --- /dev/null +++ b/contrib/hashtest/hashtest.c @@ -0,0 +1,527 @@ +/*- + * hashtest.c + *- + */ + +#include postgres.h + +#include funcapi.h +#include libpq/auth.h +#include lib/stringinfo.h +#include miscadmin.h +#include portability/instr_time.h +#include storage/ipc.h +#include utils/chash.h + +PG_MODULE_MAGIC; + +void _PG_init(void); +Datum chash_insert_test(PG_FUNCTION_ARGS); +Datum chash_search_test(PG_FUNCTION_ARGS); +Datum chash_delete_test(PG_FUNCTION_ARGS); +Datum chash_concurrent_test(PG_FUNCTION_ARGS); +Datum chash_collision_test(PG_FUNCTION_ARGS); +Datum dynahash_insert_test(PG_FUNCTION_ARGS); +Datum dynahash_search_test(PG_FUNCTION_ARGS); +Datum dynahash_delete_test(PG_FUNCTION_ARGS); +Datum 
dynahash_concurrent_test(PG_FUNCTION_ARGS); +Datum dynahash_collision_test(PG_FUNCTION_ARGS); +static void hashtest_shmem_startup(void); + +PG_FUNCTION_INFO_V1(chash_insert_test); +PG_FUNCTION_INFO_V1(chash_search_test); +PG_FUNCTION_INFO_V1(chash_delete_test); +PG_FUNCTION_INFO_V1(chash_concurrent_test); +PG_FUNCTION_INFO_V1(chash_collision_test); +PG_FUNCTION_INFO_V1(dynahash_insert_test); +PG_FUNCTION_INFO_V1(dynahash_search_test); +PG_FUNCTION_INFO_V1(dynahash_delete_test); +PG_FUNCTION_INFO_V1(dynahash_concurrent_test); +PG_FUNCTION_INFO_V1(dynahash_collision_test); + +typedef struct +{ + uint32 key; + uint32 val; +} hentry; + +static CHashDescriptor cdesc = { + hashtest-chash, /* name */ + 1048576, /* capacity */ + sizeof(hentry), /* element size */ + sizeof(uint32) /* key size */ +}; + +#define DYNAHASH_PARTITIONS 16 + +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; +static CHashTable chash; +static HTAB *dynahash; +static LWLockId dynahash_lock[DYNAHASH_PARTITIONS]; +static ClientAuthentication_hook_type original_client_auth_hook = NULL; + +static
Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund and...@2ndquadrant.com wrote: Unfortunately that Assert()s when there's a lock conflict because e.g. another backend is currently connecting. That's because ProcSleep() does an enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and there's no deadlock timeout (or lock timeout) handler registered. Yes, that could logically happen if there is a lock conflicting as RowExclusiveLock or lower lock can be taken in recovery. [...] afaics, that should work? Not pretty, but probably easier than starting to reason about the deadlock detector in the startup process. Wouldn't it be cleaner to simply register a dedicated handler in StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as it is reserved for backend operations? For back-branches, we may even consider using DEADLOCK_TIMEOUT.. We probably should also add an Assert(!InRecovery || sessionLock) to LockAcquireExtended() - these kinds of problems are otherwise hard to find in a developer setting. So this means that locks other than session ones cannot be taken while a node is in recovery, but RowExclusiveLock can be taken while in recovery. Don't we have a problem with this assertion then? -- Michael
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-26 22:03:03 +0100, Andres Freund wrote: Attached is a patch trying to this. Doesn't look too bad and lead me to discover missing recovery conflicts during a AD ST. But: It doesn't actually work on standbys, because lock.c prevents any stronger lock than RowExclusive from being acquired. And we need need a lock that can conflict with WAL replay of DBASE_CREATE, to handle base backups that are executed on the primary. Those obviously can't detect whether any standby is currently doing a base backup... I currently don't have a good idea how to mangle lock.c to allow this. I've played with doing it like in the second patch, but that doesn't actually work because of some asserts around ProcSleep - leading to locks on database objects not working in the startup process (despite already being used). The easiest thing would be to just use a lwlock instead of a heavyweight lock - but those aren't canceleable... As Stephen noticed on irc I forgot to attach those. Caused by the generic HS issue mentioned in 20150126212458.ga29...@awork2.anarazel.de. Attached now. As mentioned above, this isn't really isn't ready (e.g. it'll Assert() on a standby due to the HS issue). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From e5d46c73955e7647912c2625e31027a6fef7c880 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 26 Jan 2015 21:03:35 +0100 Subject: [PATCH] Prevent ALTER DATABASE SET TABLESPACE from running concurrently with a base backup. The combination can cause a corrupt base backup. 
--- src/backend/access/transam/xlog.c| 33 + src/backend/commands/dbcommands.c| 41 src/backend/replication/basebackup.c | 11 ++ src/include/access/xlog.h| 1 + 4 files changed, 86 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 629a457..1daa10d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -53,6 +53,7 @@ #include storage/ipc.h #include storage/large_object.h #include storage/latch.h +#include storage/lmgr.h #include storage/pmsignal.h #include storage/predicate.h #include storage/proc.h @@ -9329,6 +9330,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, else XLogCtl-Insert.nonExclusiveBackups++; XLogCtl-Insert.forcePageWrites = true; + + /* + * Acquire lock on pg datababase just before releasing the WAL insert lock + * and entering the ENSURE_ERROR_CLEANUP block. That way it's sufficient + * to release it in the error cleanup callback. + */ + LockRelationOid(DatabaseRelationId, ShareLock); + WALInsertLockRelease(); /* Ensure we release forcePageWrites if fail below */ @@ -9523,6 +9532,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive)); + UnlockRelationOid(DatabaseRelationId, ShareLock); + /* * We're done. As a convenience, return the starting WAL location. */ @@ -9537,6 +9548,8 @@ pg_start_backup_callback(int code, Datum arg) { bool exclusive = DatumGetBool(arg); + UnlockRelationOid(DatabaseRelationId, ShareLock); + /* Update backup counters and forcePageWrites on failure */ WALInsertLockAcquireExclusive(); if (exclusive) @@ -9937,6 +9950,26 @@ do_pg_abort_backup(void) } /* + * Is a (exclusive or nonexclusive) base backup running? 
+ * + * Note that this does not check whether any standby of this node has a + * basebackup running, or whether any upstream master (if this is a standby) + * has one in progress + */ +bool +LocalBaseBackupInProgress(void) +{ + bool ret; + + WALInsertLockAcquire(); + ret = XLogCtl->Insert.exclusiveBackup || + XLogCtl->Insert.nonExclusiveBackups > 0; + WALInsertLockRelease(); + + return ret; +} + +/* * Get latest redo apply position. * * Exported to allow WALReceiver to read the pointer directly. diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 5e66961..6184c83 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1087,6 +1087,30 @@ movedb(const char *dbname, const char *tblspcname) errmsg("cannot change the tablespace of the currently open database"))); /* + * Prevent SET TABLESPACE from running concurrently with a base + * backup. Without that check a base backup would potentially copy a + * partially removed source database; which WAL replay then would copy + * over the new database... + * + * Starting a base backup takes a SHARE lock on pg_database. In addition a + * streaming basebackup takes the same lock for the entirety of the copy + * of the data directory. That, combined with this check, prevents base + * backups from being taken at the same
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 21:10, Alvaro Herrera wrote: In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) Thanks for your effort, I will take it up for the next commitfest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas robertmh...@gmail.com wrote: That's what I hope to find out. :-) Buildfarm seems happy now. I just gave that a try on one of my small Windows VMs and compared the performance with 9.4 for this simple test case when building with MSVC 2010: create table aa as select random()::text as a, 'filler filler filler' as b from generate_series(1,100); create index aai on aa(a); On 9.4, the index creation took 26.5s, while on master it took 18s. That's nice, particularly for things like a restore from a dump. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 27, 2015 at 3:18 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of a separate patch. I don't think we even have to go that far. We'd be a lot less sensitive to IO latency. I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). Here, IIUC, you want to just get the read done by one parallel worker and then do all expression calculation (evaluation of qualification and target list) in the main backend; it seems to me that by doing it that way, the benefit of parallelisation will be lost due to tuple communication overhead (maybe the overhead is less if we just pass a pointer to a buffer, but that will have another kind of problem, like holding buffer pins for a longer period of time). I could see the advantage of testing along the lines suggested by Tom Lane, but that seems to be not directly related to what we want to achieve by this patch (parallel seq scan); if you think otherwise then let me know. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 23:55, Alvaro Herrera wrote: -j1 is now the same as not specifying anything, and vacuum_one_database uses more common code in the parallel and not-parallel cases: the not-parallel case is just the parallel case with a single connection, so the setup and shutdown is mostly the same in both cases. I pushed the result. Please test, particularly on Windows. If you can use configure --enable-tap-tests and run them (make check in the src/bin/scripts subdir) that would be good too .. not sure whether that's expected to work on Windows. I have tested on Windows and it's working fine. Not sure how to enable the TAP tests on Windows; I will check and run them if possible. Thanks, Dilip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 01/23/2015 02:58 AM, Petr Jelinek wrote: On 23/01/15 00:40, Andreas Karlsson wrote: - Renamed some things from int12 to int128, there are still some places with int16 which I am not sure what to do with. I'd vote for renaming them to int128 too, there are enough C functions that use int16 for a 16-bit integer that this is going to be confusing otherwise. Do you also think the SQL functions should be named numeric_int128_sum, numeric_int128_avg, etc? -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On Mon, Jan 26, 2015 at 09:20:54AM -0500, Andrew Dunstan wrote: On 01/23/2015 02:18 AM, Noah Misch wrote: On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote: We could probably fix this fairly easily for non- U+ cases by having jsonb_to_cstring use a different escape_json routine. Maybe we should detect such input and emit a warning of ambiguity? It's likely to be rare enough, but clearly not as rare as we'd like, since this is a report from the field. Perhaps. Something like WARNING: jsonb cannot represent \\u; reading as \u? Alas, but I do prefer that to silent data corruption. Maybe something like this patch. I have two concerns about it, though. The first is the possible performance impact of looking for the offending string in every jsonb input, and the second is that the test isn't quite right, since input of \\\u doesn't raise this issue - i.e. the problem arises when u is preceded by an even number of backslashes. I share your second concern. It is important that this warning not cry wolf; it should never fire for an input that is actually unaffected. For the moment, maybe I could commit the fix for the non U+ case for escape_json, and we could think some more about detecting and warning about the problem strings. +1 for splitting development that way. Fixing the use of escape_json() is objective, but the choices around the warning are more subtle. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
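As an aside, the backslash-parity test Andrew describes is easy to state precisely: a \u escape is real only when the backslash that introduces it is preceded by an even number of backslashes. A minimal standalone sketch of that check (the function name and signature are mine, not taken from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the backslash at json[pos] starts a real JSON escape,
 * i.e. it is preceded by an even number of backslashes (zero included).
 * In "\\u0000" the backslash before 'u' is itself escaped, so the 'u' is
 * a literal character and must not trigger the warning.
 */
static bool
is_real_unicode_escape(const char *json, size_t pos)
{
    size_t      nbackslash = 0;

    if (json[pos] != '\\')
        return false;

    /* count backslashes immediately preceding json[pos] */
    while (pos > nbackslash && json[pos - nbackslash - 1] == '\\')
        nbackslash++;

    return (nbackslash % 2) == 0;
}
```

The patch itself would of course track the parity incrementally while scanning left to right, rather than counting backwards at each candidate position.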
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/26/15 4:51 PM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? No. ResourceOwner is just a mechanism to ensure that we remember to call UnpinBuffer, it has no impact on what the semantics of the pin count are. The *instant* the pin count goes to zero, another backend is entitled to recycle that buffer for some other purpose. But one backend can effectively pin a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A. But as you say, this is all academic until the pin count hits 0, so it's probably not worth worrying about. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
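For anyone wanting to experiment with Tom's suggestion, Valgrind's client-request macros from valgrind/memcheck.h are the relevant hook. A rough sketch of the idea follows; the type and function names are mine, not PostgreSQL's, and the no-op stubs let the sketch build and run without Valgrind installed:

```c
#include <stdint.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* no-op stubs so the sketch also compiles without Valgrind headers */
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void) 0)
#define VALGRIND_MAKE_MEM_DEFINED(addr, len)   ((void) 0)
#endif

#define BLCKSZ 8192

typedef struct SketchBuffer
{
    int         pin_count;      /* this backend's local pin count */
    char        page[BLCKSZ];   /* buffer contents */
} SketchBuffer;

/* Pin: content may legitimately be dereferenced while we hold the pin. */
static void
sketch_pin(SketchBuffer *buf)
{
    if (buf->pin_count++ == 0)
        VALGRIND_MAKE_MEM_DEFINED(buf->page, BLCKSZ);
}

/*
 * Unpin: the instant our local pin count hits zero another backend may
 * recycle the buffer, so tell Valgrind that any later access through a
 * stale pointer is an error.
 */
static void
sketch_unpin(SketchBuffer *buf)
{
    if (--buf->pin_count == 0)
        VALGRIND_MAKE_MEM_NOACCESS(buf->page, BLCKSZ);
}
```

As Tom notes, this only catches the first-order problem (access after the local pin count reaches zero), not the pin-leak pairing Jim worries about.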
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 12/22/14 12:50 AM, Etsuro Fujita wrote: I think ExplainModifyTarget should show the parent of the inheritance tree in multi-target-table cases, as described there, but noticed that it doesn't always work like that. Here is an example. Anything ever happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 2015-01-26 18:30:13 -0600, Jim Nasby wrote: On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is a complex application and we don't know which part of the application code causes the problem. I'm proposing to model it after errhidestatement(), i.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The use case where I see this as being useful is high-volume debug logging, not normal messages... I think the use case is valid; it is really difficult to dig through such a log, especially when the statement size is big. Right, that was what triggered me to look into it. I'd seen cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have client_min_context and log_min_context that operate akin to *_min_messages but control context output. I've pushed it already IIRC. I don't think we can just apply it to that many cases without regard for possible users - I think the RAISE DEBUG case is better addressed by a plpgsql option to *optionally* suppress context, than do it unconditionally. That's discussed somewhere nearby. -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 12/26/14 1:38 AM, Abhijit Menon-Sen wrote: At 2014-09-25 15:40:11 +0530,a...@2ndquadrant.com wrote: All right, then I'll post a version that addresses Amit's other points, adds a new file/function to pgstattuple, acquires content locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all. Sorry, I forgot to post this patch. It does what I listed above, and seems to work fine (it's still quite a lot faster than pgstattuple in many cases). Anything happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-26 19:58:25 -0500, Bruce Momjian wrote: On Tue, Jan 27, 2015 at 01:43:41AM +0100, Andres Freund wrote: master + 32align.patch: -c max_connections=400 tps = 183791.872359 (high run vs. run variability) -c max_connections=401 tps = 185494.98244 (high run vs. run variability) master + 64align.patch: -c max_connections=400 tps = 489257.195570 -c max_connections=401 tps = 490496.520632 Pretty much as expected, rigth? Yes, I am convinced. Let's work on a patch now. Since I can reproduce some minor (1-3%) performance *regressions* at low client counts when aligning every shmem allocation, I'm inclined to just add special case code to BufferShmemSize()/InitBufferPool() to align descriptors to PG_CACHE_LINE_SIZE. That's really unlikely to regress anythign as it basically can't be a bad idea to align buffer descriptors. Contrary opinions? Robert? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
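The special-case alignment being proposed comes down to the usual round-up-to-a-power-of-two pointer arithmetic, the same thing PostgreSQL's TYPEALIGN/BUFFERALIGN macros do. A minimal standalone sketch (names are mine, and PG_CACHE_LINE_SIZE is assumed to be 64 here):

```c
#include <stdint.h>
#include <stdlib.h>

#define PG_CACHE_LINE_SIZE 64   /* assumed; the real value is platform-tuned */

/* Round ptr up to the next multiple of PG_CACHE_LINE_SIZE (a power of two). */
static void *
cache_line_align(void *ptr)
{
    uintptr_t   p = (uintptr_t) ptr;

    return (void *) ((p + PG_CACHE_LINE_SIZE - 1) &
                     ~((uintptr_t) (PG_CACHE_LINE_SIZE - 1)));
}

/*
 * Allocate nbytes with enough slop that the returned pointer can be
 * cache-line aligned.  The caller frees *raw, not the returned pointer.
 */
static void *
alloc_aligned(size_t nbytes, void **raw)
{
    *raw = malloc(nbytes + PG_CACHE_LINE_SIZE);
    return *raw ? cache_line_align(*raw) : NULL;
}
```

In the server the slop would come from the shared-memory sizing in BufferShmemSize() rather than malloc(), but the rounding is the same.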
Re: [HACKERS] Parallel Seq Scan
Hi PG devs! Tom Lane t...@sss.pgh.pa.us writes: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. It would take about five minutes of coding to prove or disprove this: stick a PrefetchBuffer call into heapgetpage() to launch a request for the next page as soon as we've read the current one, and then see if that makes any obvious performance difference. I'm not convinced that it will, but if it did then we could think about how to make it work for real. Sorry for dropping in so late... I have done all this two years ago. For TPC-H Q8, Q9, Q17, Q20, and Q21 I see a speedup of ~100% when using IndexScan prefetching + Nested-Loops Look-Ahead (the outer loop!). (On SSD with 32 Pages Prefetch/Look-Ahead + Cold Page Cache / Small RAM) Regards, Daniel -- MSc. Daniel Bausch Research Assistant (Computer Science) Technische Universität Darmstadt http://www.dvs.tu-darmstadt.de/staff/dbausch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SSL renegotiation and other related woes
Hi, When working on getting rid of ImmediateInterruptOK I wanted to verify that ssl still works correctly. Turned out it didn't. But neither did it in master. Turns out there's two major things we do wrong: 1) We ignore the rule that once called and returning SSL_ERROR_WANT_(READ|WRITE), SSL_read()/SSL_write() have to be called again with the same parameters. Unfortunately that rule doesn't mean just that the same parameters have to be passed in, but also that we can't just constantly switch between _read()/_write(). Especially nonblocking backend code (i.e. walsender) and the whole frontend code violate this rule. 2) We start renegotiations in be_tls_write() while in nonblocking mode, but don't properly retry to handle socket readiness. There's a loop that retries handshakes twenty times (???), but what actually is needed is to call SSL_get_error() and ensure that there's actually data available. 2) is easy enough to fix, but 1) is pretty hard. Before anybody says that 1) isn't an important rule: It reliably causes connection aborts within a couple renegotiations. The higher the latency the higher the likelihood of aborts. Even locally it doesn't take very long to abort. Errors usually are something like SSL connection has been closed unexpectedly or SSL Error: sslv3 alert unexpected message and a host of other similar messages. There's a couple reports of those in the archives and I've seen many more in client logs. As far as I can see the only realistic way to fix 1) is to change both frontend and backend code to: a) Always check for socket read/writeability before calling SSL_read()/SSL_write() when in nonblocking mode. That's a bit annoying because it nearly doubles the number of syscalls we do for client communication, but I can't really see an alternative. That allows us to avoid waiting inside OpenSSL after a WANT_READ/WANT_WRITE, or having to set up a larger state machine that keeps track of what we tried last. 
b) When SSL_read/write nonetheless returns WANT_READ/WRITE, even though we tested for read/writeability, we're very likely doing renegotiation. In that case we'll just have to block. There's already code that busy loops (and thus waits) in the frontend (c.f. pgtls_read's WANT_WRITE case, triggered during reneg). We can't just return immediately to the upper layers as we'd otherwise likely violate the rule about calling ssl with the same parameters again. c) Add a somewhat hacky optimization whereby we allow breaking out of a WANT_READ condition on a nonblocking socket when ssl->state == SSL_ST_OK. That's the case where it's actually, at least by my reading of the unreadable ssl code, safe to not wait. That case is somewhat important because we otherwise can end up waiting on both sides due to b), even when nonblocking calls were actually made. That condition essentially means that we'll only block if renegotiation or partial reads are in progress. Afaics at least. d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't actually need it anymore. These errors are much less frequent when using a plain frontend (e.g. psql/pgbench) because they don't use COPY BOTH stuff - the way these clients use the FE/BE protocol there are essentially natural synchronization points where nothing but renegotiation happens. With walsender (or pipelined queries!) both sides can write at the same time. My testcase for this is just to set up a server with a low ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and receive data via pg_receivexlog -n. Usually it'll error out quickly. I've done a preliminary implementation of the above steps and it survives transferring 25GB of WAL via the replication protocol with ssl_renegotiation_limit=100kB - previously it failed much earlier. Does anybody have a neater way to tackle this? I'm not happy about this solution, but I really can't think of anything better (save ditching openssl maybe). 
I'm willing to clean up my hacked up fix for this, but not if we can't find agreement on the approach. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL renegotiation and other related woes
On 2015-01-26 11:14:05 +0100, Andres Freund wrote: My testcase for this is just to setup a server with a low ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and receive data via pg_receivexlog -n. Usually it'll error out quickly. ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services DROP TABLE IF EXISTS foo_:client_id; CREATE TABLE foo_:client_id AS SELECT * FROM generate_series(1, 100) a(a), generate_series(1, 1) b(b); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-22 22:58:17 +0100, Andres Freund wrote: Because the way it currently works is a major crock. It's more luck than anything that it actually somewhat works. We normally rely on WAL to bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE we've ignored that. Hah: There's even a comment about some of the existing dangers: * * In PITR replay, the first of these isn't an issue, and the second * is only a risk if the CREATE DATABASE and subsequent template * database change both occur while a base backup is being taken. * There doesn't seem to be much we can do about that except document * it as a limitation. * * Perhaps if we ever implement CREATE DATABASE in a less cheesy way, * we can avoid this. only that it has never been documented as a limitation to my knowledge... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump with both --serializable-deferrable and -j
Hackers, when pg_dump is run with both the --serializable-deferrable and -j options, it returns errors: pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [parallel archiver] query was: SET TRANSACTION SNAPSHOT '0001E300-1' pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE I've checked it on 9.4.0 and 9.3.5. So, these options are incompatible now. Could we start the snapshot-importing transactions at the repeatable read isolation level? AFAICS, they should read exactly the same data as the snapshot-exporting serializable transaction. If not, could pg_dump return some friendlier error before actually trying to dump? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Unsafe coding in ReorderBufferCommit()
Hi Tom, On 2015-01-23 16:47:30 -0500, Tom Lane wrote: There are at least two bugs in reorderbuffer.c's ReorderBufferCommit(): Thanks for fixing these! Unfortunately there's more - we'll currently do bad things if transaction commit fails. At the very least the (sub-)transaction begin commands need to be moved out of the exception block as they can fail... :(. E.g. because this is the 2^32-1 subxact or similar... I actually also want to strip the CATCH block of most of its contents - there's really no need anymore for most of what it does. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Hi, On 2015-01-25 14:02:47 -0500, Tom Lane wrote: I've been looking for other instances of the problem Mark Wilding pointed out, about missing volatile markers on variables that are modified in PG_TRY blocks and then used in the PG_CATCH stanzas. There definitely are some. Current gcc versions do not warn about that. If you turn on -Wclobbered, you don't always get warnings about the variables that are problematic, and you do get warnings about variables that are perfectly safe. (This is evidently why that option is not on by default: it's *useless*, or even counterproductive if it gives you false confidence that the compiler is protecting you.) I've observed that too. IIUC the warnings are generated by observing what register spilling does - which unfortunately seems to mean that the more complex a function gets, the less useful the clobber warnings get, because there's more spilling going on, generating pointless warnings. I think it's actually not a recent regression - in the past a lot of spurious instances of these warnings have been fixed by simply tacking volatile onto variables that didn't actually need it. I tested this on gcc 4.4.7 (current on RHEL6), and gcc 4.8.3 (current on Fedora 20); they behave the same. Even if this were fixed in bleeding-edge gcc, we'd definitely still need the volatile marker to get non-broken code compiled on most current platforms. It's not fixed (both in the sense of warning correctly and of no longer needing the volatile) at least for gcc (Debian 20150119-1) 5.0.0 20150119 (experimental) [trunk revision 219863]. This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation? Not really, except convincing gcc to fix the inaccurate detection. Given that there've been bugs open about this (IIRC one from you even) for years, I'm not holding my breath. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
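The hazard under discussion can be demonstrated without PostgreSQL's PG_TRY/PG_CATCH macros at all, since they are built on setjmp()/longjmp(): C99 leaves the post-longjmp value of a non-volatile automatic variable indeterminate if it was modified after setjmp() returned. A self-contained illustration:

```c
#include <setjmp.h>

static jmp_buf env;

static void
do_longjmp(void)
{
    longjmp(env, 1);
}

/*
 * Returns the value the "catch" path sees.  Without the volatile
 * qualifier the compiler may keep n in a register that longjmp()
 * effectively restores to its setjmp-time contents, so the catch path
 * could observe 0 again - the bug class Mark Wilding pointed out.
 */
static int
try_catch_demo(void)
{
    volatile int n = 0;     /* volatile => value after longjmp is defined */

    if (setjmp(env) == 0)
    {
        n = 42;             /* modified inside the "try" block */
        do_longjmp();       /* simulate elog(ERROR) */
    }
    return n;               /* "catch": guaranteed 42 only because of volatile */
}
```

With the volatile removed, the function is still likely to return 42 at -O0, which is exactly why these bugs survive testing and only bite under optimization.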
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On 2015-01-25 15:40:10 -0500, Tom Lane wrote: Greg Stark st...@mit.edu writes: Perhaps Clang has a more useful warning? Clang, at least the version on my Mac, doesn't warn either with the settings we normally use, and it doesn't have -Wclobber at all. I tried turning on -Weverything, and it still didn't complain. (It did generate incorrect code though, so it's no better than gcc in that respect.) Even a pretty recent (3.6-rc1) clang doesn't warn. It generates lots of useful warnings, but nothing about longjmp clobbers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/22/15 6:03 PM, Pavel Stehule wrote: 2015-01-22 12:37 GMT+01:00 Marko Tiikkaja ma...@joh.to: Or is that a stupid idea? I just think hacking libpq for something like this is a huge overkill. I don't think so only plpgsql solution is satisfactory idea. There are some mix plpgsql / plperl ... application - and it isn't possible to remove error context from only one language. Yeah, not in libpq it isn't. Thing is, PL/PgSQL already is the exception here, since it's the only language which does this error message suppression. So if people did think this suppression was a good idea, only the people using PL/PgSQL were vocal enough to get the behavior changed. I'm not looking to change that. I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The most simple option seems to be to just allow a GUC to change PL/PgSQL's behavior to match what all other PLs are doing. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 13:02 GMT+01:00 Marko Tiikkaja ma...@joh.to: On 1/22/15 6:03 PM, Pavel Stehule wrote: 2015-01-22 12:37 GMT+01:00 Marko Tiikkaja ma...@joh.to: Or is that a stupid idea? I just think hacking libpq for something like this is a huge overkill. I don't think so only plpgsql solution is satisfactory idea. There are some mix plpgsql / plperl ... application - and it isn't possible to remove error context from only one language. Yeah, not in libpq it isn't. Thing is, PL/PgSQL already is the exception here, since it's the only language which does this error message suppression. So if people did think this suppression was a good idea, only the people using PL/PgSQL were vocal enough to get the behavior changed. I'm not looking to change that. I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The most simple option seems to be to just allow a GUC to change PL/PgSQL's behavior to match what all other PLs are doing. libpq was changed more than once - there is still an open task about a protocol change. I'm afraid of some unexpected side effects of your proposal if somebody mixes languages - these side effects should not be critical - but on the other hand the current behavior is not critical either - we can wait. .marko
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-26 23:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/24/15 2:48 AM, Pavel Stehule wrote: with array_offsets - returns an array of offsets + <entry>returns a offset of first occurrence of some element in a array. It uses should be + <entry>returns the offset of the first occurrence of some element in an array. It uses + <entry>returns a array of offset of all occurrences some element in a array. It uses should be + <entry>returns an array of the offsets of all occurrences of some element in an array. It uses Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? I thought about it - but there are different checks, different result processing, and a different result type. I didn't find any readable, reduced form :( You should remove the array_length() from the last array_offsets test; I don't see that it buys anything. ok I think there should be tests for what happens when you feed these functions a multi-dimensional array. I can do it. The result should be as expected - it searches row by row due to the MD format Other than that, looks good. Thank you Pavel -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? If a pointer takes the pin count from 1 to 2, then that pointer should be invalid by the time the pin count goes from 2 to 1... I'm worried that a simple test when pin count is 0 could miss some cases of pointers just happening to be cleared by a second part of the code even though the pin count has already dropped. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Jim Nasby jim.na...@bluetreble.com writes: On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of separate patch. I don't think we even have to go that far. My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is the case because complex queries tend to interleave CPU intensive code in-between IO requests. So we see this pattern: Wait 5ms on IO Compute for a few ms Wait 5ms on IO Compute for a few ms ... We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work so great. It certainly falls apart on something like an index scan. If we could instead do this: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. It would take about five minutes of coding to prove or disprove this: stick a PrefetchBuffer call into heapgetpage() to launch a request for the next page as soon as we've read the current one, and then see if that makes any obvious performance difference. I'm not convinced that it will, but if it did then we could think about how to make it work for real. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
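For reference, in USE_PREFETCH builds PrefetchBuffer ultimately issues a posix_fadvise(POSIX_FADV_WILLNEED) hint on the underlying relation segment, so the five-minute experiment Tom describes can also be approximated outside the server. A hedged sketch (the function name is mine) of hinting the next block while the current one is being processed:

```c
#define _XOPEN_SOURCE 600       /* expose posix_fadvise on strict toolchains */
#include <fcntl.h>

#define BLCKSZ 8192

/*
 * Ask the kernel to start reading block blkno of fd in the background so
 * that a later read() finds the data already in the page cache.  Returns
 * 0 on success; a nonzero result means the hint was ignored, which is
 * harmless - prefetching is purely advisory.
 */
static int
prefetch_block(int fd, long blkno)
{
    return posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

A scan loop would call prefetch_block(fd, cur + 1) right after issuing the read for block cur, overlapping the I/O wait with the per-tuple CPU work - which is the effect being debated above.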
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Mon, Jan 26, 2015 at 5:21 PM, Stephen Frost sfr...@snowman.net wrote: I don't disagree with you about any of that. I don't think you disagree with my comment either. What I'm not entirely clear on is how consensus could be reached. Leaving it dormant for the better part of a year certainly doesn't appear to have helped that situation. We've discussed having it be part of the main server and having it be a contrib module and until about a week ago, I had understood that having it in contrib would be preferable. Based on the recent emails, it appears there's been a shift of preference to having it be in-core, but clearly there's no time left to do that in this release cycle. Well, I'm not sure that anyone else here agreed with me on that, and one person does not a consensus make no matter who it is. The basic problem here is that we don't seem to have even two people here who agree on how this ought to be done. The basic dynamic here seems to be you asking for changes and Abhijit making them but without any real confidence, and I don't feel good about that. I'm willing to defer to an emerging consensus here when there is one, but what Abhijit likes best is not a consensus, and neither is what you like, and neither is what I like. What we need is some people agreeing with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 5:36 PM, Josh Berkus j...@agliodbs.com wrote: In order to get a consensus on moving to a new app I had to explain what was wrong with the old app. Eventually I had to use strong language to do so, because nobody was paying attention otherwise. While Magnus's app isn't my original proposal, I'm 100% behind it because it gets us moving forward and eliminates a lot of obstacles both to submitters and CFMs. If Magnus hasn't already, in the future we'll be able to add more features to make things faster for major reviewers as well. Frankly, I don't believe that you had much to do with getting consensus on moving to a new app. I believe Magnus did that, mostly by writing it. And I'm not complaining about whatever strong language you may have used in the past; I'm complaining about you showing up on this thread now, after the switch has already been made, to lob insults. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. You didn't previously say isn't nicer. You said essentially unusable. There's a big difference between those two phrases. Your emails came off as the new app is a disaster, we should revert immediately, and I'm not the only one who read them that way. I stand by what I said. I find it very hard to use in the present state for the reasons that I set out. That was not intended as a request for a revert. I have subsequently written several emails clarifying my position on that point. If Magnus *wants* to revert it, that's fine. But I suspect he doesn't, and that's fine too. However, I'd very much like the features that are missing added back, and, yeah, I'm annoyed that they are missing. If you're going to throw around negative hyperbole, expect to get it thrown back at you. Oh, give me a break. 
What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. Aside from being a CFM for one CF each year, of course. OK, perhaps I'm overstating it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/26/15 5:11 PM, Jim Nasby wrote: The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) A sleep in pg_start_backup() won't work. The race condition is in rsync, if the file is modified in the same second after it is copied. Waiting until the beginning of the next second in pg_start_backup() would actually make a bigger window where the issue can occur. I solved this problem in PgBackRest (an alternative to barman, etc.) by waiting the remainder of the second after the manifest is built before copying. That way, if a file is modified in the second after the manifest is built, the later version will still be copied. Any mods after that will be copied in the next backup (as they should be). PgBackRest does not use rsync, tar, etc., so I was able to code around the issue. The interesting thing about this race condition is that it does not affect the backup where it occurs. It affects the next backup, when the modified file does not get copied because the timestamp is the same as in the previous backup. Of course, using checksums will solve the problem in rsync, but that's expensive. Thus my comment earlier that the hot rsync / cold rsync method is not absolutely safe. If you do checksums on the cold rsync then you might as well just use them the first time - you'll have the same downtime either way. I've written tests to show the rsync vulnerability and another to show that this can affect a running database. However, to reproduce it reliably you need to force a checkpoint or have checkpoints happening pretty close together. 
-- - David Steele da...@pgmasters.ne signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Re: Abbreviated keys for Numeric
On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I am not seriously suggesting pursuing abbreviation for float8 in the near term - numeric is clearly what we should concentrate on. It's interesting that abbreviation of float8 could potentially make sense, though. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Jan 23, 2015 at 1:16 PM, Stephen Frost sfr...@snowman.net wrote: Well, I'm still of the view that there's little to lose by having this remain out-of-core for a release or three. We don't really all agree on what we want, and non-core code can evolve a lot faster than core code, so what's the rush? Being out of core makes it unusable in production environments for a large number of organizations. :/ Tough. Once we accept something into core, we're stuck maintaining it forever. We shouldn't do that unless we're absolutely confident the design is solid. We are not close to having consensus on this. If somebody has a reason for accepting only core PostgreSQL and not add-ons, it's either a stupid reason, or it's because they believe that the core stuff will be better thought-out than the add-ons. If it's the latter, we shouldn't disappoint. I don't disagree with you about any of that. I don't think you disagree with my comment either. What I'm not entirely clear on is how consensus could be reached. Leaving it dormant for the better part of a year certainly doesn't appear to have helped that situation. We've discussed having it be part of the main server and having it be a contrib module and until about a week ago, I had understood that having it in contrib would be preferable. Based on the recent emails, it appears there's been a shift of preference to having it be in-core, but clearly there's no time left to do that in this release cycle. While I appreciate that it doesn't quite feel right to use the GRANT system in the way I'm suggesting, I'm of the opinion that it's mostly due to a lack of implementation with documentation and examples about how it would work. 
Using the GRANT system gives us nearly the best of both worlds- an SQL-based syntax, a very fine-grained configuration method, dependency handling, rename handling, and means you don't need to be a superuser or have to restart the server to change the config, nor is there any need to deal with CSV configuration via GUCs. For my part, I'd still prefer an in-core system with top-level syntax (eg: AUDIT), but that could clearly come later even if we included pgaudit today (as Tom and others pointed out to me early this past summer). Auditing should definitely be a top-level capability in PG and shouldn't be relegated to external modules or commercial solutions. I've come around to feel that perhaps the first step should be a contrib module rather than going in-core from the beginning as we'd get the feedback which could lead to a better in-core solution. I don't think we're going to get it perfectly right the first time, either way, and having it be completely outside the tree will mean we won't get any real feedback on it. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: When it comes to changing auditing settings, I think that needs to be very restrictive. Really, it should be more (or differently) restrictive than SU, so that you can effectively audit your superusers with minimal worries about superusers tampering with auditing. I continue to be of the opinion that you're not going to be able to effectively audit your superusers with any mechanism that resides inside of the process space which superusers control. If you want to audit superusers, you need something that operates outside of the postgres process space. I'm certainly interested in that, but it's an orthogonal discussion to anything we're talking about here. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] proposal: searching in array function - array_position
On 1/26/15 4:17 PM, Pavel Stehule wrote: Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? I though about it - but there is different checks, different result processing, different result type. I didn't find any readable and reduced form :( Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least remove a dozen lines... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: On 1/23/15 12:40 PM, Stephen Frost wrote: That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) I've thought about it a fair bit actually and I agree that there is some risk to using rsync for *incremental* base backups. That is, you have a setup where you loop with:

pg_start_backup
rsync -> dest
pg_stop_backup

without using -I, changing what 'dest' is, or making sure it's empty every time. The problem is the 1s-level granularity used on the timestamp. A possible set of operations, all within 1s, is:

file changed
rsync starts copying the file
file changed again (somewhere prior to where rsync is at)
rsync finishes the file copy

Now, this isn't actually a problem for the first time that file is backed up- the issue is if that file isn't changed again. rsync won't re-copy it, but that change that rsync missed won't be in the WAL history for the *second* backup that's done (only the first), leading to a case where that file would end up corrupted. This is a pretty darn narrow situation and one that I doubt many people will hit, but I do think it's possible. 
A way to address this would be to grab all timestamps for all files at the start of the backup and re-copy any files whose times are changed after that point (or which were being changed at the time the check was done, or perhaps simply any file which has a timestamp after the starting timestamp of the backup). How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom command file that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work... Yeah, I had suggested that to Bruce also, but it's not clear why that would be any different from an rsync --size-only in the end, presuming everything went according to plan. Yeah, if everything is shut down maybe we're OK. Regarding this, yes, I think it 'should' work, but it would definitely be good to test it quite a bit before relying on it.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] New CF app deployment
Robert, Didn't stop me. And actually, I didn't face a shitstorm of criticism. The way I remember it, I got a pretty much unqualified positive reaction at the time. Including from me, because it was a huge improvement on what we had before. As the new app is. Only later, when you had conveniently forgotten how bad things were before we had that tool, did you start routinely dumping on it. And, AFAICS, not because of anything it did wrong, only because of things that it didn't do that were features you wanted to have. In order to get a consensus on moving to a new app I had to explain what was wrong with the old app. Eventually I had to use strong language to do so, because nobody was paying attention otherwise. While Magnus's app isn't my original proposal, I'm 100% behind it because it gets us moving forward and eliminates a lot of obstacles both to submitters and CFMs. If Magnus hasn't already, in the future we'll be able to add more features to make things faster for major reviewers as well. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. You didn't previously say isn't nicer. You said essentially unusable. There's a big difference between those two phrases. Your emails came off as the new app is a disaster, we should revert immediately, and I'm not the only one who read them that way. If you're going to throw around negative hyperbole, expect to get it thrown back at you. What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. Aside from being a CFM for one CF each year, of course. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
Jim Nasby jim.na...@bluetreble.com writes: On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? No. ResourceOwner is just a mechanism to ensure that we remember to call UnpinBuffer, it has no impact on what the semantics of the pin count are. The *instant* the pin count goes to zero, another backend is entitled to recycle that buffer for some other purpose. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
>>>>> "Peter" == Peter Geoghegan p...@heroku.com writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

I get a much smaller difference there than you do. Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles, or 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/26/15 5:08 PM, David Steele wrote: I've written tests to show the rsync vulnerability and another to show that this can affect a running database. However, to reproduce it reliably you need to force a checkpoint or have them happening pretty close together. Related to this and Stephen's comment about testing... ISTM it would be very useful to have a published suite of tests for PITR backups, perhaps even utilizing special techniques in Postgres to expose potential failure conditions. Similarly, it'd also be nice to have a suite of tests you could run to validate a backup that you've restored. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On Tue, Jan 27, 2015 at 12:03 AM, Jim Nasby jim.na...@bluetreble.com wrote: But one backend can effectively pin a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A. The danger is that there's a codepath that refers to memory it doesn't have a pin on but that there is no actual test in our regression suite that doesn't actually have a second pin on the same buffer. If there is in fact no reachable code path at all without the second pin then there's no active bug, only a bad coding practice. But if there are code paths that we just aren't testing then that's a real bug. IIRC CLOBBER_FREED_MEMORY only affects palloc'd blocks. Do we not have a mode that automatically removes any buffer as soon as it's not pinned? That seems like it would be a valuable addition. Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I suppose the same might not be true for rarely called codepaths or in cases where the buffers are usually already pinned. -- greg
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 27/01/15 00:51, Andres Freund wrote: On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles, or 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined. IIRC DatumGetFloat8 was quite visible in perf profiles when I was writing the array version of width_bucket. It was one of the motivations for making a special float8 version, since not having to call it had a significant effect. Sadly, I don't remember anymore whether it was the function call itself or the conversion. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable? It might be useful to print the contexts/statements in certain cases where there is a complex application and we don't know which part of the application code causes the problem. I'm proposing to model it after errhidestatement(). I.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The use cases where I see this as being useful are high-volume debug logging, not normal messages... I think the use case is valid; it is really difficult to dig through such a log, especially when the statement size is big. Right, that was what triggered me to look into it. I'd had cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have client_min_context and log_min_context that operate akin to *_min_messages but control context output. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-05 21:43:04 -0500, Bruce Momjian wrote: On Fri, Jan 2, 2015 at 06:25:52PM +0100, Andres Freund wrote: I can't run tests right now... What exactly do you want to see with these tests? That's essentially what I've already benchmarked + some fprintfs? I want to test two patches that both allocate an extra 64 bytes but shift the alignment of just the buffer descriptor memory allocation. pgbench scale 300, with postgres configured:

-c shared_buffers=48GB -c log_line_prefix=[%m %p] -c log_min_messages=debug1 -p 5440 -c checkpoint_segments=600 -c fsync=off -c synchronous_commit=off -c maintenance_work_mem=4GB -c huge_pages=off -c log_autovacuum_min_duration='10s'

test: pgbench -n -M prepared -c 96 -j 96 -T 30 -S

master as of 4b2a254793be50e31d43d4bfd813da8d141494b8:
-c max_connections=400 tps = 193748.454044 (high run vs. run variability)
-c max_connections=401 tps = 484238.551349

master + 32align.patch:
-c max_connections=400 tps = 183791.872359 (high run vs. run variability)
-c max_connections=401 tps = 185494.98244 (high run vs. run variability)

master + 64align.patch:
-c max_connections=400 tps = 489257.195570
-c max_connections=401 tps = 490496.520632

Pretty much as expected, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: two new role attributes and/or capabilities?
On 12/23/14 12:52 PM, Stephen Frost wrote: * José Luis Tallón (jltal...@adv-solutions.net) wrote: On 12/23/2014 05:29 PM, Stephen Frost wrote: The capabilities would be: * MAINTENANCE --- Ability to run VACUUM [ANALYZE | FREEZE] (but not VACUUM FULL), ANALYZE (including SET LOCAL statistics_target TO 1), There's likely to be discussion about these from the perspective that you really shouldn't need to run them all that much. Why isn't autovacuum able to handle this? For some (arguably, ill-devised) use cases of INSERT - SELECT aggregate - DELETE (third party, closed-source app, massive insert rate) at the very least, autovacuum can't possibly cope with the change rate in some tables, given that there are quite many other interactive queries running. Manually performing VACUUM / VACUUM ANALYZE on the (few) affected tables every 12h or so fixes the performance problem for the particular queries without impacting the other users too much --- the tables and indexes in question have been moved to a separate tablespace/disk volume of their own. Autovacuum can certainly run vacuum/analyze on a few tables every 12 hours, so I'm not really following where you see autovacuum being unable to cope. I agree that there *are* such cases, but getting more information about those cases and exactly what solution *does* work would really help us improve autovacuum to address those use-cases. (going through some old email...) The two cases I've dealt with recently are:

- Tables with a fair update/delete rate that should always stay small

The problem with these tables is if anything happens to upset vacuuming you can end up with a significantly larger than expected table that's now essentially impossible to shrink. This could be caused by a single long-running transaction that happens to be in play when autovac kicks off, or for other reasons. Even once you manage to get all the tuples off the end of the heap it can still be extremely difficult to grab the lock you need to truncate it. 
Running a vacuum every minute from cron seems to help control this. Sadly, your indexes still get bloated, so occasionally you want to re-cluster too.

- Preemptively vacuuming during off-hours

Many sites have either nightly or weekend periods of reduced load. Such sites can gain a great benefit from scheduling preemptive vacuums to reduce the odds of disruptive vacuuming activity during heavy activity periods. This is especially true when it comes to a scan_all vacuum of a large table; having autovac do one of those at a peak period can really hose things. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
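The scheduling Jim describes might look like the following crontab sketch -- the database, table names, and timings are invented for illustration, and this assumes the frequent vacuum is cheap because the table stays small:

```
# Keep a high-churn table from growing: cheap when run every minute.
* * * * *  psql -d appdb -c 'VACUUM small_hot_table'
# Preemptive vacuum/analyze early Saturday, before peak-load periods.
0 3 * * 6  psql -d appdb -c 'VACUUM (ANALYZE) big_table'
```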