Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 1:14 PM, Pavel Stehule wrote:
> 2015-01-26 13:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>> I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The simplest option seems to be to just allow a GUC that changes PL/PgSQL's behavior to match what all the other PLs are doing.
> libpq has been changed more than once; there is still an open task about a protocol change. I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical

I have no idea what you're talking about. What kind of side effects?

> but on the other hand the current behavior is not critical either; we can wait.

I think the current behavior is almost unacceptable. It makes debugging in some cases really, really difficult.

.marko

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 1:44 PM, Pavel Stehule wrote:
> 2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>>> I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
>> I have no idea what you're talking about. What kind of side effects?
> What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?

I fail to see the point. How would that be different from what happens today? Remember, PL/PgSQL only suppresses the *topmost* stack frame, and only when using RAISE from within a PL/PgSQL function.

.m
Re: [HACKERS] fix typo in guc.c
On 01/26/2015 02:56 PM, Sawada Masahiko wrote:
> Hi,
> Attached patch fixes the typo in guc.c. It's a typo, I think.
>
> diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
> index f6df077..f4f1965 100644
> --- a/src/backend/utils/misc/guc.c
> +++ b/src/backend/utils/misc/guc.c
> @@ -3880,7 +3880,7 @@ build_guc_variables(void)
>  }
>
>  /*
> - * Create table with 20% slack
> + * Create table with 25% slack
>   */
>  size_vars = num_vars + num_vars / 4;

No, I think that's intentional. After the creation, indeed 20% of the table is empty. For example, if num_vars is 100, size_vars is 125, and 100/125 = 0.80. In add_guc_variable, where the table is enlarged, it says:

/*
 * Increase the vector by 25%
 */
int size_vars = size_guc_variables + size_guc_variables / 4;

That's correct too. The table is enlarged by 25%, so after the operation 20% of it is again free. Subtle ;-). (Although I don't think "increase" is the correct term here. It should be "enlarge", or "increase the *size* of the vector by 25%".)

- Heikki
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>> 2015-01-26 13:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>>> I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The simplest option seems to be to just allow a GUC that changes PL/PgSQL's behavior to match what all the other PLs are doing.
>> libpq has been changed more than once; there is still an open task about a protocol change. I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
> I have no idea what you're talking about. What kind of side effects?

What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?

>> but on the other hand the current behavior is not critical either; we can wait.
> I think the current behavior is almost unacceptable. It makes debugging in some cases really, really difficult.

If it is necessary, then we can modify libpq.

Regards

Pavel

> .marko
[HACKERS] fix typo in guc.c
Hi,

Attached patch fixes a typo in guc.c. It's a typo, I think.

Regards,

---
Sawada Masahiko

fix_typo_guc_c.patch
Description: Binary data
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On 01/23/2015 02:18 AM, Noah Misch wrote:
> On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote:
>> The following case has just been brought to my attention (look at the differing number of backslashes):
>>
>> andrew=# select jsonb '"\\u0000"';
>>   jsonb
>> ----------
>>  "\u0000"
>> (1 row)
>>
>> andrew=# select jsonb '"\u0000"';
>>   jsonb
>> ----------
>>  "\u0000"
>> (1 row)
> A mess indeed. The input is unambiguous, but the jsonb representation can't distinguish "\u0000" from "\\u0000". Some operations on the original json type have similar problems, since they use an in-memory binary representation with the same shortcoming:
>
> [local] test=# select json_array_element_text($$["\u0000"]$$, 0) =
> test-#        json_array_element_text($$["\\u0000"]$$, 0);
>  ?column?
> ----------
>  t
> (1 row)
>
>> Things get worse, though. On output, '\uabcd' for any four hex digits is recognized as a unicode escape, and thus the backslash is not escaped, so that we get:
>>
>> andrew=# select jsonb '"\\uabcd"';
>>   jsonb
>> ----------
>>  "\uabcd"
>> (1 row)
>>
>> We could probably fix this fairly easily for non-U+0000 cases by having jsonb_to_cstring use a different escape_json routine.
> Sounds reasonable. For 9.4.1, before more people upgrade?
>
>> But it's a mess, sadly, and I'm not sure what a good fix for the U+0000 case would look like.
> Agreed. When a string unescape algorithm removes some kinds of backslash escapes and not others, it's nigh inevitable that two semantically-distinct inputs can yield the same output. json_lex_string() fell into that trap by making an exception for \u0000. To fix this, the result needs to be fully unescaped (\u0000 converted to the NUL byte) or retain all backslash escapes. (Changing that either way is no fun now that an on-disk format is at stake.)
>
>> Maybe we should detect such input and emit a warning of ambiguity? It's likely to be rare enough, but clearly not as rare as we'd like, since this is a report from the field.
> Perhaps. Something like "WARNING: jsonb cannot represent \\u0000; reading as \u0000"? Alas, but I do prefer that to silent data corruption.

Maybe something like this patch. I have two concerns about it, though. The first is the possible performance impact of looking for the offending string in every jsonb input, and the second is that the test isn't quite right, since input of '\\\u0000' doesn't raise this issue - i.e. the problem arises when the \u0000 is preceded by an even number of backslashes.

For the moment, maybe I could commit the fix for the non-U+0000 case for escape_json, and we could think some more about detecting and warning about the problem strings.

cheers

andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 3c137ea..8d2b38f 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -94,6 +94,8 @@ static void datum_to_json(Datum val, bool is_null, StringInfo result,
 static void add_json(Datum val, bool is_null, StringInfo result,
 		 Oid val_type, bool key_scalar);
 static text *catenate_stringinfo_string(StringInfo buffer, const char *addon);
+static void escape_json_work(StringInfo buf, const char *str,
+		 bool jsonb_unicode);
 
 /* the null action object used for pure validation */
 static JsonSemAction nullSemAction =
@@ -2356,6 +2358,18 @@ json_object_two_arg(PG_FUNCTION_ARGS)
 void
 escape_json(StringInfo buf, const char *str)
 {
+	escape_json_work(buf, str, false);
+}
+
+void
+escape_jsonb(StringInfo buf, const char *str)
+{
+	escape_json_work(buf, str, true);
+}
+
+static void
+escape_json_work(StringInfo buf, const char *str, bool jsonb_unicode)
+{
 	const char *p;
 
 	appendStringInfoCharMacro(buf, '"');
@@ -2398,7 +2412,9 @@ escape_json(StringInfo buf, const char *str)
 				 * unicode escape that should be present is \u0000, all the
 				 * other unicode escapes will have been resolved.
 				 */
-				if (p[1] == 'u' &&
+				if (jsonb_unicode && strncmp(p + 1, "u0000", 5) == 0)
+					appendStringInfoCharMacro(buf, *p);
+				else if (!jsonb_unicode && p[1] == 'u' &&
 					isxdigit((unsigned char) p[2]) &&
 					isxdigit((unsigned char) p[3]) &&
 					isxdigit((unsigned char) p[4]) &&
diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 644ea6d..ba1e7f0 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -218,6 +218,14 @@ jsonb_from_cstring(char *json, int len)
 	JsonbInState state;
 	JsonSemAction sem;
 
+	if (strstr(json, "\\u0000") != NULL)
+		ereport(WARNING,
+				(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
+				 errmsg("jsonb cannot represent \\u0000; reading as \\\\u0000"),
+				 errdetail("Due to an implementation restriction, jsonb cannot represent a unicode null immediately preceded by a backslash.")));
+
 	memset(&state, 0, sizeof(state));
 	memset(&sem, 0, sizeof(sem));
 	lex = makeJsonLexContextCstringLen(json, len, true);
@@ -305,7 +313,7 @@ jsonb_put_escaped_value(StringInfo out, JsonbValue *scalarVal)
Re: [HACKERS] Unsafe coding in ReorderBufferCommit()
Andres Freund <and...@2ndquadrant.com> writes:
> On 2015-01-23 16:47:30 -0500, Tom Lane wrote:
>> There are at least two bugs in reorderbuffer.c's ReorderBufferCommit():
> Thanks for fixing these! Unfortunately there's more - we'll currently do bad things if transaction commit fails. At the very least the (sub-)transaction begin commands need to be moved out of the exception block, as they can fail... :(. E.g. because this is the 2^32-1 subxact or similar... I actually also want to strip the CATCH block of most of its contents - there's really no need anymore for most of what it does.

No objection here. I was just doing a mechanical transform of the function, not based on any deep understanding of what it does. The less you need to do in a CATCH block, the better.

			regards, tom lane
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Andres Freund <and...@2ndquadrant.com> writes:
> On 2015-01-25 14:02:47 -0500, Tom Lane wrote:
>> I've been looking for other instances of the problem Mark Wilding pointed out, about missing volatile markers on variables that are modified in PG_TRY blocks and then used in the PG_CATCH stanzas. There definitely are some. Current gcc versions do not warn about that.
> I think it's actually not a recent regression - in the past a lot of spurious instances of these warnings have been fixed by simply tacking on volatile on variables that didn't actually need it.

Yeah, it's not. For years and years I just automatically stuck a volatile on anything gcc 2.95.3 complained about, so that's why there are so many volatiles there now. But I've not done that lately, and comparing what 2.95.3 warns about now with what a modern version says with -Wclobbered, it's clear that it's pretty much the same broken (and perhaps slightly machine-dependent) algorithm :-(

>> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?
> Not really, except convincing gcc to fix the inaccurate detection. Given that there've been bugs open about this (IIRC one from you even) for years, I'm not holding my breath.

I've completed the audit, and there were a total of only five places that need fixes (including the two I already patched over the weekend). It's mostly pretty new code too, which probably explains why we don't already have field reports of problems. Interestingly, plpython seems heavily *over*-volatilized. Not sure whether to take some out there for consistency, or just leave it alone.

			regards, tom lane
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On Sun, Jan 25, 2015 at 2:02 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?

One idea I've been thinking about for a while is to create some new, safer notation. Suppose we did this:

PG_TRY_WITH_CLEANUP(cleanup_function, cleanup_argument);
{
    /* code requiring cleanup */
}
PG_END_TRY_WITH_CLEANUP();

Instead of doing anything with sigsetjmp(), this would just push a frame onto a cleanup stack. We would call all of those callbacks from innermost to outermost before doing siglongjmp(). With this design, we don't need any volatile-ization.

This doesn't work for PG_CATCH() blocks that do not PG_RE_THROW(), but there are not a ton of those. In a quick search, I found initTrie, do_autovacuum, xml_is_document, and a number of instances in various procedural languages. Most instances in the core code could be converted to the style above.

Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On 2015-01-26 10:52:07 -0500, Robert Haas wrote:
> On Sun, Jan 25, 2015 at 2:02 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation?
> One idea I've been thinking about for a while is to create some new, safer notation. Suppose we did this:
>
> PG_TRY_WITH_CLEANUP(cleanup_function, cleanup_argument);
> {
>     /* code requiring cleanup */
> }
> PG_END_TRY_WITH_CLEANUP();

That's pretty similar to PG_ENSURE_ERROR_CLEANUP, except that that is also called after FATAL errors. If we do this, we probably should try to come up with an easier-to-understand naming scheme. PG_TRY_WITH_CLEANUP vs. PG_ENSURE_ERROR_CLEANUP isn't very clear to a reader.

> Instead of doing anything with sigsetjmp(), this would just push a frame onto a cleanup stack. We would call all of those callbacks from innermost to outermost before doing siglongjmp(). With this design, we don't need any volatile-ization.

On the other hand, most of the call sites will need some extra state somewhere to keep track of what to undo. That's a bit of restructuring work too. And if the cleanup function needs to reference anything done in the TRY block, that state will need to be volatile too.

> Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable.

Hm. How would you implement PG_TRY_WITH_CLEANUP without a sigsetjmp()?

Greetings,

Andres Freund

--
Andres Freund                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] PL/pgSQL, RAISE and error context
Pavel Stehule <pavel.steh...@gmail.com> writes:
> 2015-01-26 14:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> I am thinking that the solution
>
>     /* if we are doing RAISE, don't report its location */
>     if (estate->err_text == raise_skip_msg)
>         return;
>
> is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.

Doing this in libpq is utterly insane. It has not got sufficient context to do anything intelligent. The fact that it's not intelligent is exposed by the regression test changes that the proposed patch causes, most of which do not look like improvements.

Another problem is that past requests to change this behavior have generally been to the effect that people wanted *more* context suppressed, not less, ie they didn't want any CONTEXT lines at all on certain messages. So the proposed patch seems to me to be going in exactly the wrong direction.

The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines, not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option: all, none, or one line, defaulting to the last for backwards compatibility.

			regards, tom lane
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 16:14 GMT+01:00 Tom Lane <t...@sss.pgh.pa.us>:
> Pavel Stehule <pavel.steh...@gmail.com> writes:
>> I am thinking that the solution
>>
>>     /* if we are doing RAISE, don't report its location */
>>     if (estate->err_text == raise_skip_msg)
>>         return;
>>
>> is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.
> Doing this in libpq is utterly insane. It has not got sufficient context to do anything intelligent. The fact that it's not intelligent is exposed by the regression test changes that the proposed patch causes, most of which do not look like improvements.

I don't understand. There can be some overhead due to useless transformation of data on the client side, but everything it needs - errcontext and errlevel - is available.

> Another problem is that past requests to change this behavior have generally been to the effect that people wanted *more* context suppressed, not less, ie they didn't want any CONTEXT lines at all on certain messages. So the proposed patch seems to me to be going in exactly the wrong direction. The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines, not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option: all, none, or one line, defaulting to the last for backwards compatibility.

I see a problem in what the default behavior should be. When I raise a NOTICE, I don't want to see CONTEXT lines; when I raise an EXCEPTION, I usually do want to see CONTEXT lines. Could this be a solution?

    estate->err_text = stmt->elog_level == ERROR ? estate->err_text : raise_skip_msg;

Regards

Pavel

> regards, tom lane
Re: [HACKERS] Additional role attributes superuser review
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2015-01-26 13:47:02 -0500, Stephen Frost wrote:
>> Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access.
> I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant.

I agree that it's a pretty long thread for what amounts to a few relatively straightforward role attributes (at least, in my view).

> I don't think the comparison with the SAN snapshot functionality is apt: The SAN solution itself will still run with full data access. Just pressing the button for the snapshot requires less. You're comparing that button to pg_start/stop_backup() - but that doesn't make sense, because it's only useful if somebody actually takes the backup during that time.

I'm not following your logic here. You're right that just pressing the button to take a snapshot can be granted to a lower-level user using the SAN solution. That snapshot's useless unless the user can first run pg_start_backup, though (and subsequently run pg_stop_backup afterwards). Clearly, XLOG archiving has to be set up already, but that would be set up when the system is initially brought online. This capability would be used in conjunction with the SAN snapshot capability; it's not intended as a comparison to what SANs offer.

Thanks!

Stephen
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Mon, Jan 26, 2015 at 2:10 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try.
> Right. We're only supporting the C locale on Windows. How could Windows possibly be broken if locale-related functions like strxfrm() and strcoll() are not even called?

That's what I hope to find out. :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] fix typo in guc.c
On Mon, Jan 26, 2015 at 10:06 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:
> On 01/26/2015 02:56 PM, Sawada Masahiko wrote:
>> Hi,
>> Attached patch fixes the typo in guc.c. It's a typo, I think.
> No, I think that's intentional. After the creation, indeed 20% of the table is empty. For example, if num_vars is 100, size_vars is 125, and 100/125 = 0.80. In add_guc_variable, where the table is enlarged, it says:
>
> /*
>  * Increase the vector by 25%
>  */
> int size_vars = size_guc_variables + size_guc_variables / 4;
>
> That's correct too. The table is enlarged by 25%, so after the operation 20% of it is again free. Subtle ;-). (Although I don't think "increase" is the correct term here. It should be "enlarge", or "increase the *size* of the vector by 25%".)

Oh, I see. Thank you for explaining! I think so too; 'enlarged' should be used here.

Regards,

---
Sawada Masahiko
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 14:02 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
> On 1/26/15 1:44 PM, Pavel Stehule wrote:
>> 2015-01-26 13:39 GMT+01:00 Marko Tiikkaja <ma...@joh.to>:
>>> On 1/26/15 1:14 PM, Pavel Stehule wrote:
>>>> I'm afraid of unexpected side effects from your proposal if somebody mixes languages; these side effects should not be critical
>>> I have no idea what you're talking about. What kind of side effects?
>> What will the error context be if plpgsql calls a plperl function that raises an exception? What will the error context be if plperl calls a plpgsql function that raises an exception?
> I fail to see the point. How would that be different from what happens today? Remember, PL/PgSQL only suppresses the *topmost* stack frame, and only when using RAISE from within a PL/PgSQL function.

I have thought about it a little more, and I think we should return to the start of this thread. Current state:

1. RAISE in plpgsql doesn't show a context - which is what we want for RAISE NOTICE and don't want for RAISE EXCEPTION. I am thinking that the solution

    /* if we are doing RAISE, don't report its location */
    if (estate->err_text == raise_skip_msg)
        return;

is too simple, and this part should be fixed. This change can be done on the plpgsql or the libpq side. This is a bug, and it should be fixed.

2. Personally, I prefer a slightly more conceptual solution that needs a libpq change, because I would like some mode between terse and verbose: I would not want to see context for NOTICEs, but I would want to see context for errors. This request is generic - independent of the PL used.

@2 is my feature request, and it is possibly independent of @1.

3. Your proposal, plpgsql.suppress_simple_error_context, fixes @2 partially - I just prefer a generic solution that will be available for all PLs. Exception processing is the same for all PLs, so filtering should be common too.

Regards

Pavel

> .m
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On Mon, Jan 26, 2015 at 1:34 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> I guess we'd need to tie it into PG_exception_stack levels, so it correctly handles nesting with sigsetjmp locations. In contrast to sigsetjmp() style handlers we can't rely on PG_CATCH cleaning up that state.

I was thinking that PG_TRY() would need to squirrel away the value of cleanup_stack and set it to null, and PG_CATCH would need to restore the squirreled-away value. That way we fire handlers in reverse order of registration regardless of which style of registration is used.

> I wonder how hard it is to make that reliable for errors thrown in a cleanup callback. Those certainly are possible in some of the PG_CATCHes we have right now.

I think what should happen is that pg_re_throw() should remove each frame from the stack and then call the associated callback. If another error occurs, we won't try that particular callback again, but we'll try the next one. In most cases this should be impossible anyway because the catch-block is just a variable assignment or something, but not in all cases, of course.

>> And before doing sigsetjmp to the active handler, we run all the functions on the stack. There shouldn't be any need for volatile; the compiler has to know that once we make it possible to get at a pointer to cb_arg via a global variable (cleanup_stack), any function we call in another translation unit could decide to call that function and it would need to see up-to-date values of everything cb_arg points to. So before calling such a function it had better store that data to memory, not just leave it lying around in a register somewhere.
> Given that we, at the moment at least, throw ERRORs from signal handlers, I don't think that necessarily holds true. I think we're not that far away from getting rid of all of those though.

Well, I think the theory behind that is we should only be throwing errors from signal handlers when ImmediateInterruptOK = true, which is only supposed to happen when the code thereby interrupted isn't doing anything interesting. So if you set up some state that can be touched by the error-handling path and then, in the same function, set ImmediateInterruptOK, yeah, that probably needs to be volatile. But if function A sets up the state and then calls function B in another translation unit, and B sets ImmediateInterruptOK, then A has no way of knowing that B wasn't going to just throw a garden-variety error, so it had better have left things in a tidy state.

We still need to scrutinize the actual functions that set ImmediateInterruptOK and, if they are static, their callers, but that isn't too many places to look at. It's certainly (IMHO) a lot better than trying to stick in volatile qualifiers every place we use a try-catch block, and if you succeed in getting rid of ImmediateInterruptOK, then it goes away altogether.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Additional role attributes superuser review
* Robert Haas (robertmh...@gmail.com) wrote:
> On Mon, Jan 26, 2015 at 1:59 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>> On 2015-01-26 13:47:02 -0500, Stephen Frost wrote:
>>> Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access.
>> I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant.
> That's not necessarily true. You could be able to run a command like "san_snapshot $PGDATA" without necessarily having the permissions to inspect the contents of the resulting snapshot. Of course somebody should be doing that, but in accord with the principle of least privilege, there's no reason that the account running the unattended backup needs to have those rights.

Right! You explained it more clearly than I did.

Thanks!

Stephen
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 3:13 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth <and...@tao11.riddles.org.uk> wrote:
>> There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary.
> I think that the inability to put each message in context, with metadata comments associated with individual messages, is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year.

I agree. Starting a new email thread for each patch version is, IMHO, a complete non-starter. It's 100% contrary to what has generally been advocated as best practice up until now, and it is basically saying we should alter our workflow because the tool can't cope with the one we've got. The whole point of having home-grown tools for this stuff is that they're supposed to work with the way we already like to do things instead of forcing us to work in new ways.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 9:20 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 26, 2015 at 3:13 PM, Peter Geoghegan p...@heroku.com wrote: On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary. I think that the inability to put each message in context, with metadata comments associated with individual messages is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year. Yes, and the agreement after that feedback was to try it out and then figure out what changes were needed, as about half the feedback said it was better without and half said it was better with. I agree. Starting a new email thread for each patch version is, IMHO, a complete non-starter. It's 100% contrary to what has generally been advocated as best-practice up until now, and it is basically saying we should alter our workflow because the tool can't cope with the one we've got. The whole point of having home-grown tools for this stuff is that they're supposed to work with the way we already like to do things instead of forcing us to work in new ways. Why would you create a new thread for a new *version* of a patch? The whole *point* of the system is that you shouldn't do that, yes, so I'm not sure where you got the idea that you should. I thought the issue currently under discussion was somebody posting a *different* patch on the same thread, or the first patch arriving on a thread that used to be about something that was not a patch. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Additional role attributes superuser review
On 2015-01-26 13:47:02 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread is seems like EXCLUSIVEBACKUP (proposed by Magnus) seemed to be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server, that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant. I don't think the comparison with the SAN snapshot functionality is apt: The SAN solution itself will still run with full data access. Just pressing the button for the snapshot requires less. You're comparing that button to pg_start/stop_backup() - but that doesn't make sense, because it's only useful if somebody actually takes the backup during that time. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
On 2015-01-26 14:05:03 -0500, Stephen Frost wrote: This capability would be used in conjunction with the SAN snapshot capability, it's not intended to be a comparison to what SANs offer. Oh, on a reread that's now clear. Many of those actually allow hooks to be run when taking a snapshot, that'd probably be a better approach. But I can now see the point. Thanks, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
On Mon, Jan 26, 2015 at 1:59 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 13:47:02 -0500, Stephen Frost wrote: * Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread is seems like EXCLUSIVEBACKUP (proposed by Magnus) seemed to be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server, that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. I'm sorry if this has already been discussed, but the thread is awfully long already. But what's actually the point of having a separate EXCLUSIVEBACKUP permission? Using it still requires full file system access to the data directory, so the additional permissions granted by replication aren't really relevant. That's not necessarily true. You could be able to run a command like san_snapshot $PGDATA without necessarily having the permissions to inspect the contents of the resulting snapshot. Of course somebody should be doing that, but in accord with the principle of least privilege, there's no reason that the account running the unattended backup needs to have those rights. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Fri, Jan 23, 2015 at 12:34 PM, Robert Haas robertmh...@gmail.com wrote: In other words, even on systems that don't HAVE_LOCALE_T, we still have to support the default collation and the C collation, and they have to behave differently. There's no way to make that work using only strxfrm(), because nothing gets passed to that function to tell it which of those two things it is supposed to do. Now even if the above were not an issue, for example because we dropped support for systems where !HAVE_LOCALE_T, I think it would be poor form to depend on strxfrm_l() to behave like memcpy() where we could just as easily call memcpy() and be *sure* that it was going to do what we wanted. Much of writing good code is figuring out what could go wrong and then figuring out how to prevent it, and relying on strxfrm_l() would be an unnecessary risk regardless. Given the existence of !HAVE_LOCALE_T systems, it's just plain broken. Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Mon, Jan 26, 2015 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote: Now that these issues are fixed and the buildfarm is green again, I'm going to try re-enabling this optimization on Windows. My working theory is that disabling that categorically was a mis-diagnosis of the real problem, and that now that the issues mentioned above are cleaned up, it'll just work. That might not be right, but I think it's worth a try. Right. We're only supporting the C locale on Windows. How could Windows possibly be broken if locale-related functions like strxfrm() and strcoll() are not even called? -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Sun, Jan 25, 2015 at 3:22 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: There's a fairly serious readability problem when someone has posted a patch as a subthread of some more general discussion. For example, look at the adaptive ndistinct estimator patch: it's not obvious which attachment is the actual patch, and whether the latest email has anything to do with the patch is entirely arbitrary. I think that the inability to put each message in context, with metadata comments associated with individual messages is a serious regression in functionality. I hope it is fixed soon. I raised this issue at the earliest opportunity, when Magnus privately sought feedback early last year. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Aside from any reduction in the need for volatile, this might actually perform slightly better, because sigsetjmp() is a system call on some platforms. There are probably few cases where that actually matters, but the one in pq_getmessage(), for example, might not be entirely discountable. Hm. How would you implement PG_TRY_WITH_CLEANUP without a sigsetjmp()? Posit:

typedef struct cleanup_entry
{
    void        (*callback) (void *);
    void       *callback_arg;
    struct cleanup_entry *next;
} cleanup_entry;

cleanup_entry *cleanup_stack = NULL;

So PG_TRY_WITH_CLEANUP(_cb, _cb_arg) does (approximately) this:

{
    cleanup_entry e;
    cleanup_entry *orige;

    e.callback = (_cb);
    e.callback_arg = (_cb_arg);
    e.next = cleanup_stack;
    orige = cleanup_stack;
    cleanup_stack = &e;

And when you PG_END_TRY_WITH_CLEANUP, we just do this:

    cleanup_stack = orige;
}

Hm. Not bad. [ponder] I guess we'd need to tie it into PG_exception_stack levels, so it correctly handles nesting with sigsetjmp locations. In contrast to sigsetjmp()-style handlers we can't rely on PG_CATCH cleaning up that state. I wonder how hard it is to make that reliable for errors thrown in a cleanup callback. Those certainly are possible in some of the PG_CATCHes we have right now. And before doing the sigsetjmp to the active handler, we run all the functions on the stack. There shouldn't be any need for volatile; the compiler has to know that once we make it possible to get at a pointer to cb_arg via a global variable (cleanup_stack), any function we call in another translation unit could decide to call that function and it would need to see up-to-date values of everything cb_arg points to. So before calling such a function it had better store that data to memory, not just leave it lying around in a register somewhere. Given that we, at the moment at least, throw ERRORs from signal handlers, I don't think that necessarily holds true. I think we're not that far away from getting rid of all of those though.
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Additional role attributes superuser review
* Robert Haas (robertmh...@gmail.com) wrote: On Wed, Jan 21, 2015 at 11:27 AM, Adam Brightwell adam.brightw...@crunchydatasolutions.com wrote: After re-reading through this thread it seems like EXCLUSIVEBACKUP (proposed by Magnus) would be a potentially acceptable alternative. So this would let you do pg_start_backup() and pg_stop_backup(), but it wouldn't let you run pg_basebackup against the server? Right. We already have a role attribute which allows pg_basebackup (replication). Also, with pg_basebackup / rolreplication, your role is able to read the entire data directory from the server; that's not the case with only rights to run pg_start/stop_backup. In conjunction with enterprise backup solutions and SANs, which offer similar controls where a generally unprivileged user can have a snapshot of the system taken through the SAN interface, you can give users the ability to run ad-hoc backups of the cluster without giving them superuser-level access or replication-level access. Even with simpler solutions, it means that the backup user doesn't have to be able to run some superuser-level script against the database to run the backup. As for pg_basebackup itself, I agree that it's not exactly intuitive that 'replication' is what grants you the right to run pg_basebackup. Perhaps we could rename it or make an alias for it, or something along those lines? I wasn't looking to do anything with it at this time, but it would probably be good to improve it somehow, if you (or anyone) have suggestions on how best to do so. Thanks! Stephen
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote: Well, if it's essentially unusable, then we've reached parity with the old app (yes, you deserved that). No, I didn't. What we had before I wrote that tool was a bunch of wiki pages you put together which were forever having problems with multiple people editing the page at the same time, and not always adhering to the formatting standards, and sometimes accidentally or purposefully deleting things other people had done. The tool was written to mimic that, but without the edit-war chaos. So, if it sucked, it sucked because it mimicked something designed by you. Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. The difference between the old and new apps is that it's actually *possible* to improve things on the new app, which was never going to happen on the old one. That is probably a fair critique. There's also a significant advantage in knowing that the entire email thread is available on a patch without someone having to remember to manually paste in each message ID, something which failed to happen at least 1/3 of the time for important messages. As far as I can see, the new app offers no real advantage in this regard. The thing that really made things better here is the work that was done to make threading in our archives work properly. In the old app, you could click on the last message for which someone put the message-ID and then go to the end of the thread. In the new app, you can... pretty much do the same thing. It's not like every message in the thread shows up in the app. The latest one with an attachment does, but that's not necessarily the latest patch version. (While I'm complaining, the links only go to the flat version of the thread, while I happen to prefer the version that shows one message at a time with a message-ID selector to switch messages.)
What would be cool is a way to flag individual messages in the email thread as being important. Will give it some thought and suggest something. You make that comment as if three people hadn't already +1'd that idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: row_to_array function
On 1/25/15 4:23 AM, Pavel Stehule wrote: I tested a concept iteration over array in format [key1, value1, key2, value2, .. ] - what is nice, it works for [[key1,value1],[key2, value2], ...] too It is only a few lines more to current code, and this change doesn't break a compatibility. Do you think, so this patch is acceptable? Ideas, comments? Aside from fixing the comments... I think this needs more tests on corner cases. For example, what happens when you do foreach a, b, c in array(array(1,2),array(3,4)) ? Or the opposite case of foreach a,b in array(array(1,2,3)) Also, what about: foreach a,b in '{{{1,2},{3,4}},{{5,6},{7,8}}}'::int[] ? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 9:46 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Jan 26, 2015 at 3:24 PM, Magnus Hagander mag...@hagander.net wrote: Yes, and the agreement after that feedback was to try it out and then figure out what changes were needed, as about half the feedback said it was better without and half said it was better with. Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. I don't know how much more feedback than that you need to be convinced, but I'm going to shut up now before I say something I will later regret. According to my mailbox, you didn't even respond on that thread. But it may well be that your email ended up on some other thread and therefore was not included when I went back and looked over all the responses I got on it. If that was the case, then I apologize for losing track of the feedback. The most recent activity on the summary page is on my short-term todo list to fix. The past couple of days have been a bit too busy to get that done though, mainly due to the upcoming FOSDEM and pgday events. But rest assured that part is definitely on the list, as it doesn't actually change any functionality, it's just a view. Same as that quick stats numbers thing on the frontpage of each cf. As for being able to flag more things on individual emails/patches, I am definitely not against that in principle, if that's what people prefer. But I don't think it's unreasonable to give it a few days and then collect feedback on that (and other things).
Which of course also includes rolling back the whole thing if people prefer the older one - that has been an option on the table from the time we decided to give this one a try in the first place. (Though in that case, we really need to find a maintainer for that code, as we don't seem to have one now. But I'm sure that can get sorted.) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On 1/23/15 2:15 PM, Stephen Frost wrote: I happen to like the idea specifically because it would allow regular roles to change the auditing settings (no need to be a superuser or to be able to modify postgresql.conf/postgresql.auto.conf) Is there really a use case for non-superusers to be able to change auditing config? That seems like a bad idea. What's a bad idea is having every auditor on the system running around as superuser.. When it comes to looking at auditing data, I agree. When it comes to changing auditing settings, I think that needs to be very restrictive. Really, it should be more (or differently) restrictive than SU, so that you can effectively audit your superusers with minimal worries about superusers tampering with auditing. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 2015-01-26 13:32:51 -0800, Josh Berkus wrote: On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? Oh, come on. Grow up. A missing feature that several people commented on *before* the tool was released, and that several people have commented upon since isn't a shitstorm of criticism. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Fri, Jan 23, 2015 at 1:16 PM, Stephen Frost sfr...@snowman.net wrote: Well, I'm still of the view that there's little to lose by having this remain out-of-core for a release or three. We don't really all agree on what we want, and non-core code can evolve a lot faster than core code, so what's the rush? Being out of core makes it unusable in production environments for a large number of organizations. :/ Tough. Once we accept something into core, we're stuck maintaining it forever. We shouldn't do that unless we're absolutely confident the design is solid. We are not close to having consensus on this. If somebody has a reason for accepting only core PostgreSQL and not add-ons, it's either a stupid reason, or it's because they believe that the core stuff will be better thought-out than the add-ons. If it's the latter, we shouldn't disappoint. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. I like some things about the new app. The general idea of having the mailing list traffic drive the commitfest app is a good one. However, it seems we've gone too far. I strongly feel that the two regressions called out here are big problems. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:07 PM, Magnus Hagander mag...@hagander.net wrote: I assume what was referred to was that the old cf app would show the last 3 (I think it was) comments/patches/whatnot on a patch on the summary page (and then clickthrough for more details). Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On 01/26/2015 12:46 PM, Robert Haas wrote: I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Well, if it's essentially unusable, then we've reached parity with the old app (yes, you deserved that). The difference between the old and new apps is that it's actually *possible* to improve things on the new app, which was never going to happen on the old one. There's also a significant advantage in knowing that the entire email thread is available on a patch without someone having to remember to manually paste in each message ID, something which failed to happen at least 1/3 of the time for important messages. What would be cool is a way to flag individual messages in the email thread as being important. Will give it some thought and suggest something. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:01 PM, Magnus Hagander mag...@hagander.net wrote: According to my mailbox, you didn't even respond on that thread. But it may well be that your email ended up on some other thread and therefor was not included when I went back and looked over all the responses I got on it. If that was the case, then I apologize for loosing track of the feedback. I remember bringing it up at PGCon, I think. I don't know whether I wrote it in an email. The most recent activity on the summary page is on my short-term todo list to fix. The past couple of days have been a bit too busy to get that done though, mainly due to the upcoming FOSDEM and pgday events. But rest assured that part is definitely on the list, as it doesn't actually change any functionality, it's just a view. Same as that quick stats numbers thing on the frontpage of each cf. OK. What I'd like to understand is why this new app had to be rolled out before these things were done. We've been using my app for 5 years and it, like, worked. So why the rush to roll it out with these known issues unfixed? I mean, it's not like you couldn't look at any time and see what the new app lacked that the old app had. The last time you presented this app for feedback, which I remember to be PGCon, it was so buggy that there was no point in trying to form any considered opinion of it. Now, it's rolled out, but with a bunch of stuff that people use and rely on missing. I grant that some of those things you may not have realized anyone cared about, but it feels to me like this got pushed into production without really getting it to feature parity. I could've spent more of my time complaining about that than I did, but why should I have to do that? As for being able to flag more things on individual emails/patches, I am definitely not against that in principle, if that's what people prefer. But I don't think it's unreasonable to give it a few days and then collect feedback on that (and other things). 
Suit yourself. Which of course also includes rolling back the whole thing if people prefer the older one - that has been an option on the table from the time we decided to give this one a try in the first place. (Though in that case, we really need to find a maintainer for that code, as it's we don't seem to have that now. But I'm sure that can get sorted) I understand that having someone who has the time to maintain the code is an important issue, and I admit I don't, and haven't for a while. There is a lot of sense in replacing the app with something that uses the same software framework as the rest of our infrastructure, which I understand that yours does. And it's not like what the new one does is horribly different from what the old one did. So I don't see that there's any reason we can't make the new one work. But I confess to having no patience with the idea that I have to build a consensus to get you to re-add features that you removed. You can take that position, and since you are the tool maintainer it's not exactly unfair, but since I was willing to put in the work to add those features in the first place, I probably think they were worth having. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
Hi,

dbase_redo does:

    if (InHotStandby)
    {
        /*
         * Lock database while we resolve conflicts to ensure that
         * InitPostgres() cannot fully re-execute concurrently. This
         * avoids backends re-connecting automatically to same database,
         * which can happen in some cases.
         */
        LockSharedObjectForSession(DatabaseRelationId, xlrec->db_id, 0, AccessExclusiveLock);
        ResolveRecoveryConflictWithDatabase(xlrec->db_id);
    }

Unfortunately that Assert()s when there's a lock conflict because e.g. another backend is currently connecting. That's because ProcSleep() does a enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and there's no deadlock timeout (or lock timeout) handler registered. I'm not sure if this is broken since 8bfd1a884 introducing those session locks, or if it's caused by the new timeout infrastructure (f34c68f09). I guess the easiest way to fix this would be to make this a loop like ResolveRecoveryConflictWithLock():

    LOCKTAG tag;

    SET_LOCKTAG_OBJECT(tag, InvalidOid, DatabaseRelationId, xlrec->db_id, objsubid);

    while (!lock_acquired)
    {
        while (CountDBBackends(dbid) > 0)
        {
            CancelDBBackends(dbid, PROCSIG_RECOVERY_CONFLICT_DATABASE, true);

            /*
             * Wait awhile for them to die so that we avoid flooding an
             * unresponsive backend when system is heavily loaded.
             */
            pg_usleep(10000);
        }

        if (LockAcquireExtended(&tag, AccessExclusiveLock, true, true, false) != LOCKACQUIRE_NOT_AVAIL)
            lock_acquired = true;
    }

afaics, that should work? Not pretty, but probably easier than starting to reason about the deadlock detector in the startup process. We probably should also add an Assert(!InRecovery || sessionLock) to LockAcquireExtended() - these kinds of problems are otherwise hard to find in a developer setting.

Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Re: Abbreviated keys for Numeric (was: Re: B-Tree support function number 3 (strxfrm() optimization))
On Mon, Jan 26, 2015 at 8:43 AM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Another spinoff from the abbreviation discussion. Peter Geoghegan suggested on IRC that numeric would benefit from abbreviation, and indeed it does (in some cases by a factor of about 6-7x or more, because numeric comparison is no speed demon). Cool. What I find particularly interesting about this patch is that it makes sorting numerics significantly faster than even sorting float8 values, at least some of the time, even though the latter has generic SortSupport (for fmgr elision). Example: postgres=# create table foo as select x::float8 x, x::numeric y from (select random() * 1000 x from generate_series(1,100) a) b; SELECT 100 This query takes about 525ms after repeated executions: select * from (select * from foo order by x offset 10) i; However, this query takes about 412ms: select * from (select * from foo order by y offset 10) i; There is probably a good case to be made for float8 abbreviation support; just as well that your datum abbreviation patch doesn't imply that pass-by-value types cannot be abbreviated across the board (it only implies that abbreviation of pass-by-value types is not supported in the datum sort case). :-) Anyway, the second query above (the one with the numeric ORDER BY column) is enormously faster than the same query executed against master's tip. That takes about 1720ms following repeated executions. So at least that case is over 4x faster, suggesting that abbreviation support for numeric is well worthwhile. So I'm signed up to review this one too. -- Peter Geoghegan
Re: [HACKERS] Parallel Seq Scan
On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of separate patch. I don't think we even have to go that far. My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is the case because complex queries tend to interleave CPU intensive code in-between IO requests. So we see this pattern: Wait 5ms on IO Compute for a few ms Wait 5ms on IO Compute for a few ms ... We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work so great. It certainly falls apart on something like an index scan. If we could instead do this: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). Similarly, have IndexScans have one worker reading the index and another worker taking index tuples and reading heap tuples... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] proposal: searching in array function - array_position
On 1/24/15 2:48 AM, Pavel Stehule wrote: with array_offsets - returns a array of offsets + <entry>returns a offset of first occurrence of some element in a array. It uses should be + <entry>returns the offset of the first occurrence of some element in an array. It uses + <entry>returns a array of offset of all occurrences some element in a array. It uses should be + <entry>returns an array of the offsets of all occurrences of some element in an array. It uses Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? You should remove the array_length() from the last array_offsets test; I don't see that it buys anything. I think there should be tests for what happens when you feed these functions a multi-dimensional array. Other than that, looks good. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] pg_upgrade and rsync
On 1/23/15 12:40 PM, Stephen Frost wrote: That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom command file that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work... Yeah, I had suggested that to Bruce also, but it's not clear why that would be any different from an rsync --size-only in the end, presuming everything went according to plan. Yeah, if everything is shut down maybe we're OK. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: Partitioning: issues/ideas (Was: Re: [HACKERS] On partitioning)
On 1/25/15 7:42 PM, Amit Langote wrote: On 21-01-2015 PM 07:26, Amit Langote wrote: Ok, I will limit myself to focusing on following things at the moment: * Provide syntax in CREATE TABLE to declare partition key While working on this, I stumbled upon the question of how we deal with any index definitions following from constraints defined in a CREATE statement. I think we do not want to have a physical index created for a table that is partitioned (in other words, has no heap of itself). As the current mechanisms dictate, constraints like PRIMARY KEY, UNIQUE, EXCLUSION CONSTRAINT are enforced as indexes. It seems there are really two decisions to make here: 1) how do we deal with any index definitions (either explicit or implicit following from constraints defined on it) - do we allow them by marking them specially, say, in pg_index, as being mere placeholders/templates or invent some other mechanism? 2) As a short-term solution, do we simply reject creating any indexes (/any constraints that require them) on a table whose definition also includes PARTITION ON clause? Instead define them on its partitions (or any relations in hierarchy that are not further partitioned). Or maybe I'm missing something... Wasn't the idea that the parent table in a partitioned table wouldn't actually have a heap of its own? If there's no heap there can't be an index. That said, I think this is premature optimization that could be done later. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/26/15 9:46 AM, Pavel Stehule wrote: The design I thought had been agreed on was to add some new option to plpgsql's RAISE command which would cause suppression of all CONTEXT lines not just the most closely nested one. You could argue about whether the behavior needs to be 100% backwards compatible or not --- if so, perhaps it could be a three-way option all, none, or one line, defaulting to the last for backwards compatibility. I see a problem what should be default behave. When I raise NOTICE, then I don't need (don't would) to see CONTEXT lines, When I raise EXCEPTION, then I usually would to see CONTEXT lines. FWIW, that's the case I almost always run into: I turn on some debugging which means I know where the RAISE is coming from, but now I'm flooded with CONTEXT lines. You could do that with an option to RAISE, but that seems like a lot of extra coding work for little gain. Perhaps it'd be worth creating client_min_context and log_min_context GUCs... Another option that I think would work well is that you only provide context for the first call within a block of code. For plpgsql that would be a function, but maybe it'd be better to just do this per-subtransaction. I do agree that this needs to work across the board, not just for plpgsql. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Hi, On 2015-01-22 19:56:07 +0100, Andres Freund wrote: Hi, On 2015-01-20 16:28:19 +0100, Andres Freund wrote: I'm analyzing a problem in which a customer had a pg_basebackup (from standby) created 9.2 cluster that failed with WAL contains references to invalid pages. The failed record was a xlog redo visible i.e. XLOG_HEAP2_VISIBLE. First I thought there might be another bug along the line of 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't seem to have any problems. Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when the basebackup was started and finished *before* pg_basebackup finished. movedb() basically works in these steps: 1) lock out users of the database 2) RequestCheckpoint(IMMEDIATE|WAIT) 3) DropDatabaseBuffers() 4) copydir() 5) XLogInsert(XLOG_DBASE_CREATE) 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE) 7) rmtree(src_dbpath) 8) XLogInsert(XLOG_DBASE_DROP) 9) unlock database If a basebackup starts while 4) is in progress and continues until 7) happens I think a pretty wide race opens: The basebackup can end up with a partial copy of the database in the old tablespace because the rmtree(old_path) concurrently was in progress. Normally such races are fixed during replay. But in this case, the replay of the XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new); fixing nothing. Besides making AD .. ST use sane WAL logging, which doesn't seem backpatchable, I don't see what could be done against this except somehow making basebackups fail if an AD .. ST is in progress. Which doesn't look entirely trivial either. I basically have two ideas to fix this. 1) Make do_pg_start_backup() acquire a SHARE lock on pg_database. That'll prevent it from starting while a movedb() is still in progress. 
Then additionally add pg_backup_in_progress() function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup || XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and movedb() to error out if a backup is in progress. Attached is a patch trying to do this. Doesn't look too bad and led me to discover missing recovery conflicts during an AD ST. But: It doesn't actually work on standbys, because lock.c prevents any stronger lock than RowExclusive from being acquired. And we need a lock that can conflict with WAL replay of DBASE_CREATE, to handle base backups that are executed on the primary. Those obviously can't detect whether any standby is currently doing a base backup... I currently don't have a good idea how to mangle lock.c to allow this. I've played with doing it like in the second patch, but that doesn't actually work because of some asserts around ProcSleep - leading to locks on database objects not working in the startup process (despite already being used). The easiest thing would be to just use a lwlock instead of a heavyweight lock - but those aren't cancelable... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New CF app deployment
On 2015-01-26 12:54:04 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. Agreed. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. That one is there now though, isn't it? For specific CF: https://commitfest.postgresql.org/3/activity/ All CFs: https://commitfest.postgresql.org/activity/ Or do you mean something else? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 10:05 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 12:54:04 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 12:46 PM, Robert Haas robertmh...@gmail.com wrote: Well, I can't speak to anyone else's opinion, but I'm quite sure I raised the issue that we need a way to call out which messages in the thread are important, and I think that's pretty much what Peter is saying, too. It is. Agreed. I find the new tool essentially unusable - having one link to the whole thread instead of individual links to just the important messages in that thread is a huge regression for me, as is the lack of the most recent activity on the summary page. Yes, the lack of the most recent activity is also a major omission. That one is there now though, isn't it? For specific CF: https://commitfest.postgresql.org/3/activity/ All CFs: https://commitfest.postgresql.org/activity/ Or do you mean something else? I assume what was referred to was that the old cf app would show the last 3 (I think it was) comments/patches/whatnot on a patch on the summary page (and then clickthrough for more details). -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] proposal: plpgsql - Assert statement
2015-01-26 22:34 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/22/15 2:01 PM, Pavel Stehule wrote: * I would to simplify a behave of evaluating of message expression - probably I disallow NULL there. Well, the only thing I could see you doing there is throwing a different error if the hint is null. I don't see that as an improvement. I'd just leave it as-is. I enabled a NULL - but enforced a WARNING before. I don't see the separate warning as being helpful. I'd just do something like +(err_hint != NULL) ? errhint("%s", err_hint) : errhint("Message attached to failed assertion is null") )); ok There should also be a test case for a NULL message. * GUC enable_asserts will be supported That would be good. Would that allow for enabling/disabling on a per-function basis too? sure - there is only question if we develop a #option enable|disable_asserts. I have no strong idea. The option would be nice, but I don't think it's strictly necessary. The big thing is being able to control this on a per-function basis (which I think you can do with ALTER FUNCTION SET?) you can do it without any change now -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 4:32 PM, Josh Berkus j...@agliodbs.com wrote: On 01/26/2015 01:29 PM, Robert Haas wrote: Furthermore, if it sucked so bad, why did it take anyone 5 years to get around to rewriting it? It took me less than a year to get around to replacing what you wrote. Because whoever replaced it knew they'd be facing a shitstorm of criticism? Didn't stop me. And actually, I didn't face a shitstorm of criticism. The way I remember it, I got a pretty much unqualified positive reaction at the time. Only later, when you had conveniently forgotten how bad things were before we had that tool, did you start routinely dumping on it. And, AFAICS, not because of anything it did wrong, only because of things that it didn't do that were features you wanted to have. Most of those features were things that I could not possibly have implemented anyway because they would have required deeper integration with the authentication system than was possible with the access my app had. Automatically linking new emails? Not possible: I had no archives access. Automatic emails to users? Not possible: no access to email addresses. Place to put a link to a git branch? Yeah, I could have done that, and just didn't, but the new app doesn't do it either, yet anyway. I don't know how Magnus's app is connecting in to the rest of the PG infrastructure, but it's obviously got more access than mine had. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. I am less active than I used to be, but I still do a lot of reviewing and committing of patches and generally put a pretty significant amount of time into it. Despite whatever shortcomings that app I wrote has, so do a lot of other people. 
The CommitFest process has got its issues, particularly a lack of reviewers, but it is still better than having no process, and yet your only contributions seem to be to denigrate the people who are putting time into it. You're using the poor quality of my app, and an almost total misinterpretation of what was said about attracting new reviewers at PGCon, as excuses for your non-participation, and that is certainly your prerogative. You have as much right not to participate as anyone. But refusing to participate except to throw bricks is not going to move this project forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Mon, Jan 26, 2015 at 8:04 PM, Andres Freund and...@2ndquadrant.com wrote: On 2015-01-26 19:58:25 -0500, Bruce Momjian wrote: On Tue, Jan 27, 2015 at 01:43:41AM +0100, Andres Freund wrote: master + 32align.patch: -c max_connections=400 tps = 183791.872359 (high run vs. run variability) -c max_connections=401 tps = 185494.98244 (high run vs. run variability) master + 64align.patch: -c max_connections=400 tps = 489257.195570 -c max_connections=401 tps = 490496.520632 Pretty much as expected, right? Yes, I am convinced. Let's work on a patch now. Since I can reproduce some minor (1-3%) performance *regressions* at low client counts when aligning every shmem allocation, I'm inclined to just add special case code to BufferShmemSize()/InitBufferPool() to align descriptors to PG_CACHE_LINE_SIZE. That's really unlikely to regress anything as it basically can't be a bad idea to align buffer descriptors. Contrary opinions? Robert? I'm totally OK with further aligning just that one allocation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 2015/01/27 9:15, Jim Nasby wrote: On 12/22/14 12:50 AM, Etsuro Fujita wrote: I think ExplainModifyTarget should show the parent of the inheritance tree in multi-target-table cases, as described there, but noticed that it doesn't always work like that. Here is an example. Anything ever happen with this? Nothing. I'll add this to the next CF. Best regards, Etsuro Fujita
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Mon, Jun 30, 2014 at 5:06 PM, Abhijit Menon-Sen a...@2ndquadrant.com wrote: I think having two columns would work. The columns could be called database and database_list and user and user_list respectively. The database column may contain one of all, sameuser, samegroup, replication, but if it's empty, database_list will contain an array of database names. Then (all, {}) and (, {all}) are easily separated. Likewise for user and user_list. Thanks for the review. I corrected all the review comments except the one to add two columns as (database, database_list and user, user_list). I feel this may cause some confusion to the users. Here I attached the latest version of the patch. I will add this patch to the next commitfest. Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V3.patch Description: Binary data
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On Mon, Jan 26, 2015 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote: Contrary opinions? Robert? I'm totally OK with further aligning just that one allocation. Of course, now that I think about it, aligning it probably works mostly because the size is almost exactly one cache line. If it were any bigger or smaller, aligning it more wouldn't help. So maybe we should also do something like what LWLocks do, and make a union between the actual structure and an appropriate array of padding bytes - say either 64 or 128 of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: dynahash replacement for buffer table
This developed a slight merge conflict. I've rebased the attached version, and I also took the step of getting rid of buf_table.c, as I think I proposed somewhere upthread. This avoids the overhead of constructing a BufferTag only to copy it into a BufferLookupEnt, plus some function calls and so forth. A quick-and-dirty test suggests this might not have cut down on the 1-client overhead much, but I think it's worth doing anyway: it's certainly saving a few cycles, and I don't think it's complicating anything measurably. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/contrib/Makefile b/contrib/Makefile index 195d447..141ef0a 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -19,6 +19,7 @@ SUBDIRS = \ earthdistance \ file_fdw \ fuzzystrmatch \ + hashtest \ hstore \ intagg \ intarray \ diff --git a/contrib/hashtest/Makefile b/contrib/hashtest/Makefile new file mode 100644 index 000..3ee42f8 --- /dev/null +++ b/contrib/hashtest/Makefile @@ -0,0 +1,18 @@ +# contrib/hashtest/Makefile + +MODULE_big = hashtest +OBJS = hashtest.o + +EXTENSION = hashtest +DATA = hashtest--1.0.sql + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/hashtest +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/hashtest/hashtest--1.0.sql b/contrib/hashtest/hashtest--1.0.sql new file mode 100644 index 000..e271baf --- /dev/null +++ b/contrib/hashtest/hashtest--1.0.sql @@ -0,0 +1,52 @@ +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use CREATE EXTENSION hashtest to load this file. 
\quit + +CREATE FUNCTION chash_insert_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_insert_test' +LANGUAGE C; + +CREATE FUNCTION chash_search_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_search_test' +LANGUAGE C; + +CREATE FUNCTION chash_delete_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_delete_test' +LANGUAGE C; + +CREATE FUNCTION chash_concurrent_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_concurrent_test' +LANGUAGE C; + +CREATE FUNCTION chash_collision_test() +RETURNS void +AS 'MODULE_PATHNAME', 'chash_collision_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_insert_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_insert_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_search_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_search_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_delete_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_delete_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_concurrent_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_concurrent_test' +LANGUAGE C; + +CREATE FUNCTION dynahash_collision_test() +RETURNS void +AS 'MODULE_PATHNAME', 'dynahash_collision_test' +LANGUAGE C; diff --git a/contrib/hashtest/hashtest.c b/contrib/hashtest/hashtest.c new file mode 100644 index 000..172a5bb --- /dev/null +++ b/contrib/hashtest/hashtest.c @@ -0,0 +1,527 @@ +/*- + * hashtest.c + *- + */ + +#include postgres.h + +#include funcapi.h +#include libpq/auth.h +#include lib/stringinfo.h +#include miscadmin.h +#include portability/instr_time.h +#include storage/ipc.h +#include utils/chash.h + +PG_MODULE_MAGIC; + +void _PG_init(void); +Datum chash_insert_test(PG_FUNCTION_ARGS); +Datum chash_search_test(PG_FUNCTION_ARGS); +Datum chash_delete_test(PG_FUNCTION_ARGS); +Datum chash_concurrent_test(PG_FUNCTION_ARGS); +Datum chash_collision_test(PG_FUNCTION_ARGS); +Datum dynahash_insert_test(PG_FUNCTION_ARGS); +Datum dynahash_search_test(PG_FUNCTION_ARGS); +Datum dynahash_delete_test(PG_FUNCTION_ARGS); +Datum 
dynahash_concurrent_test(PG_FUNCTION_ARGS); +Datum dynahash_collision_test(PG_FUNCTION_ARGS); +static void hashtest_shmem_startup(void); + +PG_FUNCTION_INFO_V1(chash_insert_test); +PG_FUNCTION_INFO_V1(chash_search_test); +PG_FUNCTION_INFO_V1(chash_delete_test); +PG_FUNCTION_INFO_V1(chash_concurrent_test); +PG_FUNCTION_INFO_V1(chash_collision_test); +PG_FUNCTION_INFO_V1(dynahash_insert_test); +PG_FUNCTION_INFO_V1(dynahash_search_test); +PG_FUNCTION_INFO_V1(dynahash_delete_test); +PG_FUNCTION_INFO_V1(dynahash_concurrent_test); +PG_FUNCTION_INFO_V1(dynahash_collision_test); + +typedef struct +{ + uint32 key; + uint32 val; +} hentry; + +static CHashDescriptor cdesc = { + hashtest-chash, /* name */ + 1048576, /* capacity */ + sizeof(hentry), /* element size */ + sizeof(uint32) /* key size */ +}; + +#define DYNAHASH_PARTITIONS 16 + +static shmem_startup_hook_type prev_shmem_startup_hook = NULL; +static CHashTable chash; +static HTAB *dynahash; +static LWLockId dynahash_lock[DYNAHASH_PARTITIONS]; +static ClientAuthentication_hook_type original_client_auth_hook = NULL; + +static
Re: [HACKERS] Hot Standby WAL reply uses heavyweight session locks, but doesn't have enough infrastructure set up
On Tue, Jan 27, 2015 at 6:24 AM, Andres Freund and...@2ndquadrant.com wrote: Unfortunately that Assert()s when there's a lock conflict because e.g. another backend is currently connecting. That's because ProcSleep() does an enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout) - and there's no deadlock timeout (or lock timeout) handler registered. Yes, that could logically happen if there is a lock conflicting as RowExclusiveLock or lower lock can be taken in recovery. [...] afaics, that should work? Not pretty, but probably easier than starting to reason about the deadlock detector in the startup process. Wouldn't it be cleaner to simply register a dedicated handler in StartupXlog instead, obviously something else than DEADLOCK_TIMEOUT as it is reserved for backend operations? For back-branches, we may even consider using DEADLOCK_TIMEOUT.. We probably should also add an Assert(!InRecovery || sessionLock) to LockAcquireExtended() - these kinds of problems are otherwise hard to find in a developer setting. So this means that locks other than session ones cannot be taken while a node is in recovery, but RowExclusiveLock can be taken while in recovery. Don't we have a problem with this assertion then? -- Michael
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-26 22:03:03 +0100, Andres Freund wrote: Attached is a patch trying to this. Doesn't look too bad and lead me to discover missing recovery conflicts during a AD ST. But: It doesn't actually work on standbys, because lock.c prevents any stronger lock than RowExclusive from being acquired. And we need need a lock that can conflict with WAL replay of DBASE_CREATE, to handle base backups that are executed on the primary. Those obviously can't detect whether any standby is currently doing a base backup... I currently don't have a good idea how to mangle lock.c to allow this. I've played with doing it like in the second patch, but that doesn't actually work because of some asserts around ProcSleep - leading to locks on database objects not working in the startup process (despite already being used). The easiest thing would be to just use a lwlock instead of a heavyweight lock - but those aren't canceleable... As Stephen noticed on irc I forgot to attach those. Caused by the generic HS issue mentioned in 20150126212458.ga29...@awork2.anarazel.de. Attached now. As mentioned above, this isn't really isn't ready (e.g. it'll Assert() on a standby due to the HS issue). Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services From e5d46c73955e7647912c2625e31027a6fef7c880 Mon Sep 17 00:00:00 2001 From: Andres Freund and...@anarazel.de Date: Mon, 26 Jan 2015 21:03:35 +0100 Subject: [PATCH] Prevent ALTER DATABASE SET TABLESPACE from running concurrently with a base backup. The combination can cause a corrupt base backup. 
--- src/backend/access/transam/xlog.c| 33 + src/backend/commands/dbcommands.c| 41 src/backend/replication/basebackup.c | 11 ++ src/include/access/xlog.h| 1 + 4 files changed, 86 insertions(+) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 629a457..1daa10d 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -53,6 +53,7 @@ #include storage/ipc.h #include storage/large_object.h #include storage/latch.h +#include storage/lmgr.h #include storage/pmsignal.h #include storage/predicate.h #include storage/proc.h @@ -9329,6 +9330,14 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, else XLogCtl-Insert.nonExclusiveBackups++; XLogCtl-Insert.forcePageWrites = true; + + /* + * Acquire lock on pg datababase just before releasing the WAL insert lock + * and entering the ENSURE_ERROR_CLEANUP block. That way it's sufficient + * to release it in the error cleanup callback. + */ + LockRelationOid(DatabaseRelationId, ShareLock); + WALInsertLockRelease(); /* Ensure we release forcePageWrites if fail below */ @@ -9523,6 +9532,8 @@ do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, } PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive)); + UnlockRelationOid(DatabaseRelationId, ShareLock); + /* * We're done. As a convenience, return the starting WAL location. */ @@ -9537,6 +9548,8 @@ pg_start_backup_callback(int code, Datum arg) { bool exclusive = DatumGetBool(arg); + UnlockRelationOid(DatabaseRelationId, ShareLock); + /* Update backup counters and forcePageWrites on failure */ WALInsertLockAcquireExclusive(); if (exclusive) @@ -9937,6 +9950,26 @@ do_pg_abort_backup(void) } /* + * Is a (exclusive or nonexclusive) base backup running? 
+ * + * Note that this does not check whether any standby of this node has a + * basebackup running, or whether any upstream master (if this is a standby) + * has one in progress + */ +bool +LocalBaseBackupInProgress(void) +{ + bool ret; + + WALInsertLockAcquire(); + ret = XLogCtl->Insert.exclusiveBackup || + XLogCtl->Insert.nonExclusiveBackups > 0; + WALInsertLockRelease(); + + return ret; +} + +/* * Get latest redo apply position. * * Exported to allow WALReceiver to read the pointer directly. diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 5e66961..6184c83 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1087,6 +1087,30 @@ movedb(const char *dbname, const char *tblspcname) errmsg("cannot change the tablespace of the currently open database"))); /* + * Prevent SET TABLESPACE from running concurrently with a base + * backup. Without that check a base backup would potentially copy a + * partially removed source database; which WAL replay then would copy + * over the new database... + * + * Starting a base backup takes a SHARE lock on pg_database. In addition a + * streaming basebackup takes the same lock for the entirety of the copy + * of the data directory. That, combined with this check, prevents base + * backups from being taken at the same
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 21:10, Alvaro Herrera wrote: In case you're up for doing some more work later on, there are two ideas here: move the backend's TranslateSocketError to src/common, and try to merge pg_dump's select_loop function with the one in this new code. But that's for another patch anyway (and this new function needs a little battle-testing, of course.) Thanks for your effort, I will take it up for the next commitfest. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Windows buildfarm animals are still not happy with abbreviated keys patch
On Tue, Jan 27, 2015 at 4:21 AM, Robert Haas robertmh...@gmail.com wrote: That's what I hope to find out. :-) Buildfarm seems happy now. I just gave that a try on one of my small Windows VMs and compared the performance with 9.4 for this simple test case when building with MSVC 2010: create table aa as select random()::text as a, 'filler filler filler' as b from generate_series(1,100); create index aai on aa(a); On 9.4, the index creation took 26.5s, while on master it took 18s. That's nice, particularly for things like a restore from a dump. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
On Tue, Jan 27, 2015 at 3:18 AM, Jim Nasby jim.na...@bluetreble.com wrote: On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of a separate patch. I don't think we even have to go that far. We'd be a lot less sensitive to IO latency. I wonder what kind of gains we would see if every SeqScan in a query spawned a worker just to read tuples and shove them in a queue (or shove a pointer to a buffer in the queue). Here, IIUC, you want to just get the read done by one parallel worker and then do all expression calculation (evaluation of qualification and target list) in the main backend; it seems to me that by doing it that way, the benefit of parallelisation will be lost due to tuple communication overhead (maybe the overhead is less if we just pass a pointer to a buffer, but that will have another kind of problem, like holding buffer pins for a longer period of time). I could see the advantage of testing along the lines suggested by Tom Lane, but that seems to be not directly related to what we want to achieve by this patch (parallel seq scan); if you think otherwise then let me know. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 23 January 2015 23:55, Alvaro Herrera wrote: -j1 is now the same as not specifying anything, and vacuum_one_database uses more common code in the parallel and not-parallel cases: the not-parallel case is just the parallel case with a single connection, so the setup and shutdown is mostly the same in both cases. I pushed the result. Please test, particularly on Windows. If you can use configure --enable-tap-tests and run them (make check in the src/bin/scripts subdir) that would be good too .. not sure whether that's expected to work on Windows. I have tested on Windows and it's working fine. Not sure how to enable the TAP tests on Windows; I will check and run them if possible. Thanks, Dilip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using 128-bit integers for sum, avg and statistics aggregates
On 01/23/2015 02:58 AM, Petr Jelinek wrote: On 23/01/15 00:40, Andreas Karlsson wrote: - Renamed some things from int12 to int128, there are still some places with int16 which I am not sure what to do with. I'd vote for renaming them to int128 too, there are enough C functions that use int16 for a 16-bit integer that this is going to be confusing otherwise. Do you also think the SQL functions should be named numeric_int128_sum, numeric_int128_avg, etc? -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb, unicode escapes and escaped backslashes
On Mon, Jan 26, 2015 at 09:20:54AM -0500, Andrew Dunstan wrote: On 01/23/2015 02:18 AM, Noah Misch wrote: On Wed, Jan 21, 2015 at 06:51:34PM -0500, Andrew Dunstan wrote: We could probably fix this fairly easily for non- U+ cases by having jsonb_to_cstring use a different escape_json routine. Maybe we should detect such input and emit a warning of ambiguity? It's likely to be rare enough, but clearly not as rare as we'd like, since this is a report from the field. Perhaps. Something like WARNING: jsonb cannot represent \\u; reading as \u? Alas, but I do prefer that to silent data corruption. Maybe something like this patch. I have two concerns about it, though. The first is the possible performance impact of looking for the offending string in every jsonb input, and the second is that the test isn't quite right, since input of \\\u doesn't raise this issue - i.e. the problem arises when u is preceded by an even number of backslashes. I share your second concern. It is important that this warning not cry wolf; it should never fire for an input that is actually unaffected. For the moment, maybe I could commit the fix for the non U+ case for escape_json, and we could think some more about detecting and warning about the problem strings. +1 for splitting development that way. Fixing the use of escape_json() is objective, but the choices around the warning are more subtle. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
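As an aside, the backslash-parity test Andrew describes is easy to state precisely: a \u escape is real only when the backslash that introduces it is preceded by an even number of backslashes. A minimal standalone sketch of that check (the function name and signature are mine, not taken from the patch):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if the backslash at json[pos] starts a real JSON escape,
 * i.e. it is preceded by an even number of backslashes (zero included).
 * In "\\u0000" the backslash before 'u' is itself escaped, so the 'u' is
 * a literal character and must not trigger the warning.
 */
static bool
is_real_unicode_escape(const char *json, size_t pos)
{
    size_t      nbackslash = 0;

    if (json[pos] != '\\')
        return false;

    /* count backslashes immediately preceding json[pos] */
    while (pos > nbackslash && json[pos - nbackslash - 1] == '\\')
        nbackslash++;

    return (nbackslash % 2) == 0;
}
```

The patch itself would of course track the parity incrementally while scanning left to right, rather than counting backwards at each candidate position.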
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/26/15 4:51 PM, Tom Lane wrote: Jim Nasby jim.na...@bluetreble.com writes: On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? No. ResourceOwner is just a mechanism to ensure that we remember to call UnpinBuffer, it has no impact on what the semantics of the pin count are. The *instant* the pin count goes to zero, another backend is entitled to recycle that buffer for some other purpose. But one backend can effectively pin a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A. But as you say, this is all academic until the pin count hits 0, so it's probably not worth worrying about. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
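For anyone wanting to experiment with Tom's suggestion, Valgrind's client-request macros from valgrind/memcheck.h are the relevant hook. A rough sketch of the idea follows; the type and function names are mine, not PostgreSQL's, and the no-op stubs let the sketch build and run without Valgrind installed:

```c
#include <stdint.h>

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
/* no-op stubs so the sketch also compiles without Valgrind headers */
#define VALGRIND_MAKE_MEM_NOACCESS(addr, len)  ((void) 0)
#define VALGRIND_MAKE_MEM_DEFINED(addr, len)   ((void) 0)
#endif

#define BLCKSZ 8192

typedef struct SketchBuffer
{
    int         pin_count;      /* this backend's local pin count */
    char        page[BLCKSZ];   /* buffer contents */
} SketchBuffer;

/* Pin: content may legitimately be dereferenced while we hold the pin. */
static void
sketch_pin(SketchBuffer *buf)
{
    if (buf->pin_count++ == 0)
        VALGRIND_MAKE_MEM_DEFINED(buf->page, BLCKSZ);
}

/*
 * Unpin: the instant our local pin count hits zero another backend may
 * recycle the buffer, so tell Valgrind that any later access through a
 * stale pointer is an error.
 */
static void
sketch_unpin(SketchBuffer *buf)
{
    if (--buf->pin_count == 0)
        VALGRIND_MAKE_MEM_NOACCESS(buf->page, BLCKSZ);
}
```

As Tom notes, this only catches the first-order problem (access after the local pin count reaches zero), not the pin-leak pairing Jim worries about.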
Re: [HACKERS] ExplainModifyTarget doesn't work as expected
On 12/22/14 12:50 AM, Etsuro Fujita wrote: I think ExplainModifyTarget should show the parent of the inheritance tree in multi-target-table cases, as described there, but noticed that it doesn't always work like that. Here is an example. Anything ever happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 2015-01-26 18:30:13 -0600, Jim Nasby wrote: On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable, because it might be useful to print the contexts/statements in certain cases where there is a complex application and we don't know which part of the application code causes the problem. I'm proposing to model it after errhidestatement(), i.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The use case where I see this as being useful is high-volume debug logging, not normal messages... I think the use case is valid; it is really difficult to dig through such a log, especially when the statement size is big. Right, that was what triggered me to look into it. I'd seen cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have client_min_context and log_min_context that operate akin to *_min_messages but control context output. I've pushed it already IIRC. I don't think we can just apply it to that many cases without regard for possible users - I think the RAISE DEBUG case is better addressed by a plpgsql option to *optionally* suppress context, than do it unconditionally. That's discussed somewhere nearby. -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a fast bloat measurement tool (was Re: Measuring relation free space)
On 12/26/14 1:38 AM, Abhijit Menon-Sen wrote: At 2014-09-25 15:40:11 +0530,a...@2ndquadrant.com wrote: All right, then I'll post a version that addresses Amit's other points, adds a new file/function to pgstattuple, acquires content locks, and uses HeapTupleSatisfiesVacuum, hint-bit setting and all. Sorry, I forgot to post this patch. It does what I listed above, and seems to work fine (it's still quite a lot faster than pgstattuple in many cases). Anything happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-26 19:58:25 -0500, Bruce Momjian wrote: On Tue, Jan 27, 2015 at 01:43:41AM +0100, Andres Freund wrote: master + 32align.patch: -c max_connections=400 tps = 183791.872359 (high run vs. run variability) -c max_connections=401 tps = 185494.98244 (high run vs. run variability) master + 64align.patch: -c max_connections=400 tps = 489257.195570 -c max_connections=401 tps = 490496.520632 Pretty much as expected, rigth? Yes, I am convinced. Let's work on a patch now. Since I can reproduce some minor (1-3%) performance *regressions* at low client counts when aligning every shmem allocation, I'm inclined to just add special case code to BufferShmemSize()/InitBufferPool() to align descriptors to PG_CACHE_LINE_SIZE. That's really unlikely to regress anythign as it basically can't be a bad idea to align buffer descriptors. Contrary opinions? Robert? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
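The special-case alignment being proposed comes down to the usual round-up-to-a-power-of-two pointer arithmetic, the same thing PostgreSQL's TYPEALIGN/BUFFERALIGN macros do. A minimal standalone sketch (names are mine, and PG_CACHE_LINE_SIZE is assumed to be 64 here):

```c
#include <stdint.h>
#include <stdlib.h>

#define PG_CACHE_LINE_SIZE 64   /* assumed; the real value is platform-tuned */

/* Round ptr up to the next multiple of PG_CACHE_LINE_SIZE (a power of two). */
static void *
cache_line_align(void *ptr)
{
    uintptr_t   p = (uintptr_t) ptr;

    return (void *) ((p + PG_CACHE_LINE_SIZE - 1) &
                     ~((uintptr_t) (PG_CACHE_LINE_SIZE - 1)));
}

/*
 * Allocate nbytes with enough slop that the returned pointer can be
 * cache-line aligned.  The caller frees *raw, not the returned pointer.
 */
static void *
alloc_aligned(size_t nbytes, void **raw)
{
    *raw = malloc(nbytes + PG_CACHE_LINE_SIZE);
    return *raw ? cache_line_align(*raw) : NULL;
}
```

In the server the slop would come from the shared-memory sizing in BufferShmemSize() rather than malloc(), but the rounding is the same.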
Re: [HACKERS] Parallel Seq Scan
Hi PG devs! Tom Lane t...@sss.pgh.pa.us writes: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. It would take about five minutes of coding to prove or disprove this: stick a PrefetchBuffer call into heapgetpage() to launch a request for the next page as soon as we've read the current one, and then see if that makes any obvious performance difference. I'm not convinced that it will, but if it did then we could think about how to make it work for real. Sorry for dropping in so late... I have done all this two years ago. For TPC-H Q8, Q9, Q17, Q20, and Q21 I see a speedup of ~100% when using IndexScan prefetching + Nested-Loops Look-Ahead (the outer loop!). (On SSD with 32 Pages Prefetch/Look-Ahead + Cold Page Cache / Small RAM) Regards, Daniel -- MSc. Daniel Bausch Research Assistant (Computer Science) Technische Universität Darmstadt http://www.dvs.tu-darmstadt.de/staff/dbausch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] SSL renegotiation and other related woes
Hi, When working on getting rid of ImmediateInterruptOK I wanted to verify that ssl still works correctly. Turned out it didn't. But neither did it in master. Turns out there's two major things we do wrong: 1) We ignore the rule that once called and returning SSL_ERROR_WANT_(READ|WRITE), SSL_read()/SSL_write() have to be called again with the same parameters. Unfortunately that rule doesn't mean just that the same parameters have to be passed in, but also that we can't just constantly switch between _read()/_write(). Especially nonblocking backend code (i.e. walsender) and the whole frontend code violate this rule. 2) We start renegotiations in be_tls_write() while in nonblocking mode, but don't properly retry to handle socket readiness. There's a loop that retries handshakes twenty times (???), but what actually is needed is to call SSL_get_error() and ensure that there's actually data available. 2) is easy enough to fix, but 1) is pretty hard. Before anybody says that 1) isn't an important rule: It reliably causes connection aborts within a couple renegotiations. The higher the latency the higher the likelihood of aborts. Even locally it doesn't take very long to abort. Errors usually are something like SSL connection has been closed unexpectedly or SSL Error: sslv3 alert unexpected message and a host of other similar messages. There's a couple reports of those in the archives and I've seen many more in client logs. As far as I can see the only realistic way to fix 1) is to change both frontend and backend code to: a) Always check for socket read/writeability before calling SSL_read()/SSL_write() when in nonblocking mode. That's a bit annoying because it nearly doubles the number of syscalls we do for client communication, but I can't really see an alternative. That allows us to avoid waiting inside OpenSSL after a WANT_READ/WANT_WRITE, or having to set up a larger state machine that keeps track of what we tried last. 
b) When SSL_read/write nonetheless returns WANT_READ/WRITE, even though we tested for read/writeability, we're very likely doing renegotiation. In that case we'll just have to block. There's already code that busy loops (and thus waits) in the frontend (c.f. pgtls_read's WANT_WRITE case, triggered during reneg). We can't just return immediately to the upper layers as we'd otherwise likely violate the rule about calling ssl with the same parameters again. c) Add a somewhat hacky optimization whereby we allow breaking out of a WANT_READ condition on a nonblocking socket when ssl->state == SSL_ST_OK. That's the case where it's actually, at least by my reading of the unreadable ssl code, safe to not wait. That case is somewhat important because we otherwise can end up waiting on both sides due to b), even when nonblocking calls were actually made. That condition essentially means that we'll only block if renegotiation or partial reads are in progress. Afaics at least. d) Remove the SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER hack - we don't actually need it anymore. These errors are much less frequent when using a plain frontend (e.g. psql/pgbench) because they don't use COPY BOTH stuff - the way these clients use the FE/BE protocol there are essentially natural synchronization points where nothing but renegotiation happens. With walsender (or pipelined queries!) both sides can write at the same time. My testcase for this is just to set up a server with a low ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and receive data via pg_receivexlog -n. Usually it'll error out quickly. I've done a preliminary implementation of the above steps and it survives transferring 25GB of WAL via the replication protocol with ssl_renegotiation_limit=100kB - previously it failed much earlier. Does anybody have a neater way to tackle this? I'm not happy about this solution, but I really can't think of anything better (save ditching openssl maybe). 
I'm willing to clean up my hacked up fix for this, but not if we can't find agreement on the approach. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SSL renegotiation and other related woes
On 2015-01-26 11:14:05 +0100, Andres Freund wrote: My testcase for this is just to setup a server with a low ssl_renegotiation_limit, generate lots of WAL (wal.sql attached) and receive data via pg_receivexlog -n. Usually it'll error out quickly. ... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services DROP TABLE IF EXISTS foo_:client_id; CREATE TABLE foo_:client_id AS SELECT * FROM generate_series(1, 100) a(a), generate_series(1, 1) b(b); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
On 2015-01-22 22:58:17 +0100, Andres Freund wrote: Because the way it currently works is a major crock. It's more luck than anything that it actually somewhat works. We normally rely on WAL to bring us into a consistent state. But around CREATE/MOVE/DROP DATABASE we've ignored that. Hah: There's even a comment about some of the existing dangers: * * In PITR replay, the first of these isn't an issue, and the second * is only a risk if the CREATE DATABASE and subsequent template * database change both occur while a base backup is being taken. * There doesn't seem to be much we can do about that except document * it as a limitation. * * Perhaps if we ever implement CREATE DATABASE in a less cheesy way, * we can avoid this. only that it has never been documented as a limitation to my knowledge... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_dump with both --serializable-deferrable and -j
Hackers, when pg_dump is run with both the --serializable-deferrable and -j options, it returns errors: pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE pg_dump: [parallel archiver] query was: SET TRANSACTION SNAPSHOT '0001E300-1' pg_dump: [archiver (db)] query failed: ERROR: a snapshot-importing transaction must not be READ ONLY DEFERRABLE I've checked it on 9.4.0 and 9.3.5. So, these options are incompatible now. Could we start the snapshot-importing transactions at the repeatable read isolation level? AFAICS, they should read exactly the same data as the snapshot-exporting serializable transaction. If not, could pg_dump return some friendlier error before actually trying to dump? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Unsafe coding in ReorderBufferCommit()
Hi Tom, On 2015-01-23 16:47:30 -0500, Tom Lane wrote: There are at least two bugs in reorderbuffer.c's ReorderBufferCommit(): Thanks for fixing these! Unfortunately there's more - we'll currently do bad things if transaction commit fails. At the very least the (sub-)transaction begin commands need to be moved out of the exception block as they can fail... :(. E.g. because this is the 2^32-1 subxact or similar... I actually also want to strip the CATCH block of most of its contents - there's really no need anymore for most of what it does. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
Hi, On 2015-01-25 14:02:47 -0500, Tom Lane wrote: I've been looking for other instances of the problem Mark Wilding pointed out, about missing volatile markers on variables that are modified in PG_TRY blocks and then used in the PG_CATCH stanzas. There definitely are some. Current gcc versions do not warn about that. If you turn on -Wclobbered, you don't always get warnings about the variables that are problematic, and you do get warnings about variables that are perfectly safe. (This is evidently why that option is not on by default: it's *useless*, or even counterproductive if it gives you false confidence that the compiler is protecting you.) I've observed that too. IIUC the warnings are generated by observing what register spilling does - which unfortunately seems to mean that the more complex a function gets, the less useful the clobber warnings get, because there's more spilling going on, generating pointless warnings. I think it's actually not a recent regression - in the past a lot of spurious instances of these warnings have been fixed by simply tacking volatile onto variables that didn't actually need it. I tested this on gcc 4.4.7 (current on RHEL6), and gcc 4.8.3 (current on Fedora 20); they behave the same. Even if this were fixed in bleeding-edge gcc, we'd definitely still need the volatile marker to get non-broken code compiled on most current platforms. It's not fixed (both in the sense of warning correctly and of no longer needing the volatile) at least for gcc (Debian 20150119-1) 5.0.0 20150119 (experimental) [trunk revision 219863]. This is scary as hell. I intend to go around and manually audit every single PG_TRY in the current source code, but that is obviously not a long-term solution. Anybody have an idea about how we might get trustworthy mechanical detection of this type of situation? Not really, except convincing gcc to fix the inaccurate detection. Given that there've been bugs open about this (IIRC one from you even) for years, I'm not holding my breath. 
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
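The hazard under discussion can be demonstrated without PostgreSQL's PG_TRY/PG_CATCH macros at all, since they are built on setjmp()/longjmp(): C99 leaves the post-longjmp value of a non-volatile automatic variable indeterminate if it was modified after setjmp() returned. A self-contained illustration:

```c
#include <setjmp.h>

static jmp_buf env;

static void
do_longjmp(void)
{
    longjmp(env, 1);
}

/*
 * Returns the value the "catch" path sees.  Without the volatile
 * qualifier the compiler may keep n in a register that longjmp()
 * effectively restores to its setjmp-time contents, so the catch path
 * could observe 0 again - the bug class Mark Wilding pointed out.
 */
static int
try_catch_demo(void)
{
    volatile int n = 0;     /* volatile => value after longjmp is defined */

    if (setjmp(env) == 0)
    {
        n = 42;             /* modified inside the "try" block */
        do_longjmp();       /* simulate elog(ERROR) */
    }
    return n;               /* "catch": guaranteed 42 only because of volatile */
}
```

With the volatile removed, the function is still likely to return 42 at -O0, which is exactly why these bugs survive testing and only bite under optimization.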
Re: [HACKERS] longjmp clobber warnings are utterly broken in modern gcc
On 2015-01-25 15:40:10 -0500, Tom Lane wrote: Greg Stark st...@mit.edu writes: Perhaps Clang has a more useful warning? Clang, at least the version on my Mac, doesn't warn either with the settings we normally use, and it doesn't have -Wclobber at all. I tried turning on -Weverything, and it still didn't complain. (It did generate incorrect code though, so it's no better than gcc in that respect.) Even a pretty recent (3.6-rc1) clang doesn't warn. It generates lots of useful warnings, but nothing about longjmp clobbers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
On 1/22/15 6:03 PM, Pavel Stehule wrote: 2015-01-22 12:37 GMT+01:00 Marko Tiikkaja ma...@joh.to: Or is that a stupid idea? I just think hacking libpq for something like this is a huge overkill. I don't think so only plpgsql solution is satisfactory idea. There are some mix plpgsql / plperl ... application - and it isn't possible to remove error context from only one language. Yeah, not in libpq it isn't. Thing is, PL/PgSQL already is the exception here, since it's the only language which does this error message suppression. So if people did think this suppression was a good idea, only the people using PL/PgSQL were vocal enough to get the behavior changed. I'm not looking to change that. I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The most simple option seems to be to just allow a GUC to change PL/PgSQL's behavior to match what all other PLs are doing. .marko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PL/pgSQL, RAISE and error context
2015-01-26 13:02 GMT+01:00 Marko Tiikkaja ma...@joh.to: On 1/22/15 6:03 PM, Pavel Stehule wrote: 2015-01-22 12:37 GMT+01:00 Marko Tiikkaja ma...@joh.to: Or is that a stupid idea? I just think hacking libpq for something like this is a huge overkill. I don't think so only plpgsql solution is satisfactory idea. There are some mix plpgsql / plperl ... application - and it isn't possible to remove error context from only one language. Yeah, not in libpq it isn't. Thing is, PL/PgSQL already is the exception here, since it's the only language which does this error message suppression. So if people did think this suppression was a good idea, only the people using PL/PgSQL were vocal enough to get the behavior changed. I'm not looking to change that. I can see where it's a lot nicer not to have the context visible for people who only care about the contents of the message, but the way it's done in PL/PgSQL right now is just not good enough. On the other hand, the backwards compatibility breakage of doing this in libpq is quite extensive. The most simple option seems to be to just allow a GUC to change PL/PgSQL's behavior to match what all other PLs are doing. libpq was changed more than once - there is still an open task about a protocol change. I'm afraid of some unexpected side effects of your proposal if somebody mixes languages - these side effects should not be critical - but on the other hand the current behavior is not critical either - we can wait. .marko
Re: [HACKERS] proposal: searching in array function - array_position
2015-01-26 23:01 GMT+01:00 Jim Nasby jim.na...@bluetreble.com: On 1/24/15 2:48 AM, Pavel Stehule wrote: with array_offsets - returns an array of offsets + <entry>returns a offset of first occurrence of some element in a array. It uses should be + <entry>returns the offset of the first occurrence of some element in an array. It uses + <entry>returns a array of offset of all occurrences some element in a array. It uses should be + <entry>returns an array of the offsets of all occurrences of some element in an array. It uses Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? I thought about it - but there are different checks, different result processing, and a different result type. I didn't find any readable, reduced form :( You should remove the array_length() from the last array_offsets test; I don't see that it buys anything. ok I think there should be tests for what happens when you feed these functions a multi-dimensional array. I can do it. The result should be as expected - it searches row by row due to the MD format Other than that, looks good. Thank you Pavel -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? If a pointer takes the pin count from 1 to 2, then that pointer should be invalid by the time the pin count goes from 2 to 1... I'm worried that a simple test when pin count is 0 could miss some cases of pointers just happening to be cleared by a second part of the code even though the pin count has already dropped. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Seq Scan
Jim Nasby jim.na...@bluetreble.com writes: On 1/23/15 10:16 PM, Amit Kapila wrote: Further, if we want to just get the benefit of parallel I/O, then I think we can get that by parallelising partition scan where different table partitions reside on different disk partitions, however that is a matter of separate patch. I don't think we even have to go that far. My experience with Postgres is that it is *very* sensitive to IO latency (not bandwidth). I believe this is the case because complex queries tend to interleave CPU intensive code in-between IO requests. So we see this pattern: Wait 5ms on IO Compute for a few ms Wait 5ms on IO Compute for a few ms ... We blindly assume that the kernel will magically do read-ahead for us, but I've never seen that work so great. It certainly falls apart on something like an index scan. If we could instead do this: Wait for first IO, issue second IO request Compute Already have second IO request, issue third ... We'd be a lot less sensitive to IO latency. It would take about five minutes of coding to prove or disprove this: stick a PrefetchBuffer call into heapgetpage() to launch a request for the next page as soon as we've read the current one, and then see if that makes any obvious performance difference. I'm not convinced that it will, but if it did then we could think about how to make it work for real. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
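For reference, in USE_PREFETCH builds PrefetchBuffer ultimately issues a posix_fadvise(POSIX_FADV_WILLNEED) hint on the underlying relation segment, so the five-minute experiment Tom describes can also be approximated outside the server. A hedged sketch (the function name is mine) of hinting the next block while the current one is being processed:

```c
#define _XOPEN_SOURCE 600       /* expose posix_fadvise on strict toolchains */
#include <fcntl.h>

#define BLCKSZ 8192

/*
 * Ask the kernel to start reading block blkno of fd in the background so
 * that a later read() finds the data already in the page cache.  Returns
 * 0 on success; a nonzero result means the hint was ignored, which is
 * harmless - prefetching is purely advisory.
 */
static int
prefetch_block(int fd, long blkno)
{
    return posix_fadvise(fd, (off_t) blkno * BLCKSZ, BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

A scan loop would call prefetch_block(fd, cur + 1) right after issuing the read for block cur, overlapping the I/O wait with the per-tuple CPU work - which is the effect being debated above.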
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
On Mon, Jan 26, 2015 at 5:21 PM, Stephen Frost sfr...@snowman.net wrote: I don't disagree with you about any of that. I don't think you disagree with my comment either. What I'm not entirely clear on is how consensus could be reached. Leaving it dormant for the better part of a year certainly doesn't appear to have helped that situation. We've discussed having it be part of the main server and having it be a contrib module and until about a week ago, I had understood that having it in contrib would be preferable. Based on the recent emails, it appears there's been a shift of preference to having it be in-core, but clearly there's no time left to do that in this release cycle. Well, I'm not sure that anyone else here agreed with me on that, and one person does not a consensus make no matter who it is. The basic problem here is that we don't seem to have even two people here who agree on how this ought to be done. The basic dynamic here seems to be you asking for changes and Abhijit making them but without any real confidence, and I don't feel good about that. I'm willing to defer to an emerging consensus here when there is one, but what Abhijit likes best is not a consensus, and neither is what you like, and neither is what I like. What we need is some people agreeing with each other. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] New CF app deployment
On Mon, Jan 26, 2015 at 5:36 PM, Josh Berkus j...@agliodbs.com wrote: In order to get a consensus on moving to a new app I had to explain what was wrong with the old app. Eventually I had to use strong language to do so, because nobody was paying attention otherwise. While Magnus's app isn't my original proposal, I'm 100% behind it because it gets us moving forward and eliminates a lot of obstacles both to submitters and CFMs. If Magnus hasn't already, in the future we'll be able to add more features to make things faster for major reviewers as well. Frankly, I don't believe that you had much to do with getting consensus on moving to a new app. I believe Magnus did that, mostly by writing it. And I'm not complaining about whatever strong language you may have used in the past; I'm complaining about you showing up on this thread now, after the switch has already been made, to lob insults. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. You didn't previously say isn't nicer. You said essentially unusable. There's a big difference between those two phrases. Your emails came off as the new app is a disaster, we should revert immediately, and I'm not the only one who read them that way. I stand by what I said. I find it very hard to use in the present state for the reasons that I set out. That was not intended as a request for a revert. I have subsequently written several emails clarifying my position on that point. If Magnus *wants* to revert it, that's fine. But I suspect he doesn't, and that's fine too. However, I'd very much like the features that are missing added back, and, yeah, I'm annoyed that they are missing. If you're going to throw around negative hyperbole, expect to get it thrown back at you. Oh, give me a break. 
What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. Aside from being a CFM for one CF each year, of course. OK, perhaps I'm overstating it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/26/15 5:11 PM, Jim Nasby wrote: The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) A sleep in pg_start_backup() won't work. The race condition is in rsync, if the file is modified in the same second after it is copied. Waiting until the beginning of the next second in pg_start_backup() would actually make a bigger window where the issue can occur. I solved this problem in PgBackRest (an alternative to barman, etc.) by waiting the remainder of the second after the manifest is built before copying. That way, if a file is modified in the second after the manifest is built, the later version will still be copied. Any mods after that will be copied in the next backup (as they should be). PgBackRest does not use rsync, tar, etc., so I was able to code around the issue. The interesting thing about this race condition is that it does not affect the backup where it occurs. It affects the next backup, when the modified file does not get copied because the timestamp is the same as in the previous backup. Of course, using checksums will solve the problem in rsync, but that's expensive. Thus my comment earlier that the hot rsync / cold rsync method is not absolutely safe. If you do checksums on the cold rsync then you might as well just use them the first time - you'll have the same downtime either way. I've written tests to show the rsync vulnerability and another to show that this can affect a running database. However, to reproduce it reliably you need to force a checkpoint or have checkpoints happening pretty close together. 
-- - David Steele da...@pgmasters.ne signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Re: Abbreviated keys for Numeric
On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I am not seriously suggesting pursuing abbreviation for float8 in the near term - numeric is clearly what we should concentrate on. It's interesting that abbreviation of float8 could potentially make sense, though. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Fri, Jan 23, 2015 at 1:16 PM, Stephen Frost sfr...@snowman.net wrote: Well, I'm still of the view that there's little to lose by having this remain out-of-core for a release or three. We don't really all agree on what we want, and non-core code can evolve a lot faster than core code, so what's the rush? Being out of core makes it unusable in production environments for a large number of organizations. :/ Tough. Once we accept something into core, we're stuck maintaining it forever. We shouldn't do that unless we're absolutely confident the design is solid. We are not close to having consensus on this. If somebody has a reason for accepting only core PostgreSQL and not add-ons, it's either a stupid reason, or it's because they believe that the core stuff will be better thought-out than the add-ons. If it's the latter, we shouldn't disappoint. I don't disagree with you about any of that. I don't think you disagree with my comment either. What I'm not entirely clear on is how consensus could be reached. Leaving it dormant for the better part of a year certainly doesn't appear to have helped that situation. We've discussed having it be part of the main server and having it be a contrib module and until about a week ago, I had understood that having it in contrib would be preferable. Based on the recent emails, it appears there's been a shift of preference to having it be in-core, but clearly there's no time left to do that in this release cycle. While I appreciate that it doesn't quite feel right to use the GRANT system in the way I'm suggesting, I'm of the opinion that it's mostly due to a lack of implementation with documentation and examples about how it would work. 
Using the GRANT system gives us nearly the best of both worlds- an SQL-based syntax, a very fine-grained configuration method, dependency handling, rename handling, and means you don't need to be a superuser or have to restart the server to change the config, nor is there any need to deal with CSV configuration via GUCs. For my part, I'd still prefer an in-core system with top-level syntax (eg: AUDIT), but that could clearly come later even if we included pgaudit today (as Tom and others pointed out to me early this past summer). Auditing should definitely be a top-level capability in PG and shouldn't be relegated to external modules or commercial solutions. I've come around to feel that perhaps the first step should be a contrib module rather than going in-core from the beginning as we'd get the feedback which could lead to a better in-core solution. I don't think we're going to get it perfectly right the first time, either way, and having it be completely outside the tree will mean we won't get any real feedback on it. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] pgaudit - an auditing extension for PostgreSQL
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: When it comes to changing auditing settings, I think that needs to be very restrictive. Really, it should be more (or differently) restrictive than SU, so that you can effectively audit your superusers with minimal worries about superusers tampering with auditing. I continue to be of the opinion that you're not going to be able to effectively audit your superusers with any mechanism that resides inside of the process space which superusers control. If you want to audit superusers, you need something that operates outside of the postgres process space. I'm certainly interested in that, but it's an orthogonal discussion to anything we're talking about here. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] proposal: searching in array function - array_position
On 1/26/15 4:17 PM, Pavel Stehule wrote: Any way to reduce the code duplication between the array and non-array versions? Maybe factor out the operator caching code? I though about it - but there is different checks, different result processing, different result type. I didn't find any readable and reduced form :( Yeah, that's why I was thinking specifically of the operator caching code... isn't that identical? That would at least remove a dozen lines... -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
Jim, * Jim Nasby (jim.na...@bluetreble.com) wrote: On 1/23/15 12:40 PM, Stephen Frost wrote: That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. The race condition is a problem for pg_start/stop_backup and friends. In this instance, everything will be shut down when the rsync is running, so there isn't a timestamp race condition to worry about. Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against that? Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be different? (Admittedly I haven't fully wrapped my head around this yet.) I've thought about it a fair bit actually and I agree that there is some risk to using rsync for *incremental* base backups. That is, you have a setup where you loop with:

pg_start_backup
rsync -> dest
pg_stop_backup

without using -I, changing what 'dest' is, or making sure it's empty every time. The problem is the 1s-level granularity used on the timestamp. A possible set of operations, all within 1s, is:

file changed
rsync starts copying the file
file changed again (somewhere prior to where rsync is at)
rsync finishes the file copy

Now, this isn't actually a problem for the first time that file is backed up- the issue is if that file isn't changed again. rsync won't re-copy it, but that change that rsync missed won't be in the WAL history for the *second* backup that's done (only the first), leading to a case where that file would end up corrupted. This is a pretty darn narrow situation and one that I doubt many people will hit, but I do think it's possible. 
A way to address this would be to grab all timestamps for all files at the start of the backup and re-copy any files whose times are changed after that point (or which were being changed at the time the check was done, or perhaps simply any file which has a timestamp after the starting timestamp of the backup). How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell script instead of directly modifying things itself? Or perhaps some custom command file that could then be replayed by pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to work... Yeah, I had suggested that to Bruce also, but it's not clear why that would be any different from an rsync --size-only in the end, presuming everything went according to plan. Yeah, if everything is shut down maybe we're OK. Regarding this, yes, I think it 'should' work, but it would definitely be good to test it quite a bit before relying on it.. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] New CF app deployment
Robert, Didn't stop me. And actually, I didn't face a shitstorm of criticism. The way I remember it, I got a pretty much unqualified positive reaction at the time. Including from me, because it was a huge improvement on what we had before. As the new app is. Only later, when you had conveniently forgotten how bad things were before we had that tool, did you start routinely dumping on it. And, AFAICS, not because of anything it did wrong, only because of things that it didn't do that were features you wanted to have. In order to get a consensus on moving to a new app I had to explain what was wrong with the old app. Eventually I had to use strong language to do so, because nobody was paying attention otherwise. While Magnus's app isn't my original proposal, I'm 100% behind it because it gets us moving forward and eliminates a lot of obstacles both to submitters and CFMs. If Magnus hasn't already, in the future we'll be able to add more features to make things faster for major reviewers as well. And I think that's great, because hopefully it will eventually make this much nicer than what I had. It isn't nicer yet, though, and you showing up and tossing out insults based on revisionist history won't fix that. You didn't previously say isn't nicer. You said essentially unusable. There's a big difference between those two phrases. Your emails came off as the new app is a disaster, we should revert immediately, and I'm not the only one who read them that way. If you're going to throw around negative hyperbole, expect to get it thrown back at you. What particularly peeves me about your remarks is that you've basically made no contribution to the CommitFest process in years. Aside from being a CFM for one CF each year, of course. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
Jim Nasby jim.na...@bluetreble.com writes: On 1/24/15 3:31 PM, Tom Lane wrote: Another idea is to teach Valgrind that whenever a backend reduces its pin count on a shared buffer to zero, that buffer should become undefined memory. paranoia Shouldn't this technically tie in with ResourceOwners? No. ResourceOwner is just a mechanism to ensure that we remember to call UnpinBuffer, it has no impact on what the semantics of the pin count are. The *instant* the pin count goes to zero, another backend is entitled to recycle that buffer for some other purpose. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
>>>>> "Peter" == Peter Geoghegan p...@heroku.com writes:

Peter> What I find particularly interesting about this patch is that it
Peter> makes sorting numerics significantly faster than even sorting
Peter> float8 values,

I get a much smaller difference there than you do. Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles, or 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and rsync
On 1/26/15 5:08 PM, David Steele wrote: I've written tests to show the rsync vulnerability and another to show that this can affect a running database. However, to reproduce it reliably you need to force a checkpoint or have them happening pretty close together. Related to this and Stephen's comment about testing... ISTM it would be very useful to have a published suite of tests for PITR backups, perhaps even utilizing special techniques in Postgres to expose potential failure conditions. Similarly, it'd also be nice to have a suite of tests you could run to validate a backup that you've restored. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Shortcoming in CLOBBER_FREED_MEMORY coverage: disk buffer pointers
On Tue, Jan 27, 2015 at 12:03 AM, Jim Nasby jim.na...@bluetreble.com wrote: But one backend can effectively pin a buffer more than once, no? If so, then ISTM there's some risk that code path A pins and forgets to unpin, but path B accidentally unpins for A. The danger is that there's a codepath that refers to memory it doesn't have a pin on but that there is no actual test in our regression suite that doesn't actually have a second pin on the same buffer. If there is in fact no reachable code path at all without the second pin then there's no active bug, only a bad coding practice. But if there are code paths that we just aren't testing then that's a real bug. IIRC CLOBBER_FREED_MEMORY only affects palloc'd blocks. Do we not have a mode that automatically removes any buffer as soon as it's not pinned? That seems like it would be a valuable addition. Fwiw I think our experience is that bugs where buffers are unpinned get exposed pretty quickly in production. I suppose the same might not be true for rarely called codepaths or in cases where the buffers are usually already pinned. -- greg
Re: [HACKERS] Re: Abbreviated keys for Numeric
On 27/01/15 00:51, Andres Freund wrote: On 2015-01-26 15:35:44 -0800, Peter Geoghegan wrote: On Mon, Jan 26, 2015 at 3:12 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: Obvious overheads in float8 comparison include having to check for NaN, and the fact that DatumGetFloat8 on 64bit doesn't get inlined and forces a store/load to memory rather than just using a register. Looking at those might be more beneficial than messing with abbreviations. Aren't there issues with the alignment of double precision floating point numbers on x86, too? Maybe my information there is at least partially obsolete. But it seems we'd have to control for this to be sure. I think getting rid of the function call for DatumGetFloat8() would be quite the win. On x86-64 the conversion then should amount to mov %rd?,-0x8(%rsp);movsd -0x8(%rsp),%xmm0 - that's pretty cheap. Both instructions have a cycle count of 1 + L1 access latency (4) + 2 because they use the same execution port. So it's about 12 fully pipelineable cycles, or 2 if the pipeline can be kept busy otherwise. I doubt that'd be noticeable if the conversion were inlined. IIRC DatumGetFloat8 was quite visible in perf profiles when I was writing the array version of width_bucket. It was one of the motivations for making a special float8 version, since not having to call it had a significant effect. Sadly, I don't remember anymore whether it was the function call itself or the conversion. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Suppressing elog.c context messages (was Re: [HACKERS] Wait free LW_SHARED acquisition)
On 12/23/14 11:41 AM, Andres Freund wrote: I think it'd generally be useful to have something like errhidecontext() akin to errhidestatement() to avoid things like the above. Under this proposal, do you want to suppress the context/statement unconditionally or via some hook/variable? It might be useful to print the contexts/statements in certain cases where there is a complex application and we don't know which part of the application code causes the problem. I'm proposing to model it after errhidestatement(). I.e. add errhidecontext(). I've attached what I was tinkering with when I wrote this message. The use cases where I see this as being useful are high-volume debug logging, not normal messages... I think the use case is valid; it is really difficult to dig through such a log, especially when the statement size is big. Right, that was what triggered me to look into it. I'd had cases where the same context was printed thousands of times. In case anyone picks this up... this problem exists in other places too, such as RAISE DEBUG in plpgsql. I think it'd be useful to have client_min_context and log_min_context that operate akin to *_min_messages but control context output. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD
On 2015-01-05 21:43:04 -0500, Bruce Momjian wrote: On Fri, Jan 2, 2015 at 06:25:52PM +0100, Andres Freund wrote: I can't run tests right now... What exactly do you want to see with these tests? That's essentially what I've already benchmarked + some fprintfs? I want to test two patches that both allocate an extra 64 bytes but shift the alignment of just the buffer descriptor memory allocation. pgbench scale 300, with postgres configured:

-c shared_buffers=48GB -c log_line_prefix=[%m %p] -c log_min_messages=debug1 -p 5440 -c checkpoint_segments=600 -c fsync=off -c synchronous_commit=off -c maintenance_work_mem=4GB -c huge_pages=off -c log_autovacuum_min_duration='10s'

test: pgbench -n -M prepared -c 96 -j 96 -T 30 -S

master as of 4b2a254793be50e31d43d4bfd813da8d141494b8:
-c max_connections=400 tps = 193748.454044 (high run vs. run variability)
-c max_connections=401 tps = 484238.551349

master + 32align.patch:
-c max_connections=400 tps = 183791.872359 (high run vs. run variability)
-c max_connections=401 tps = 185494.98244 (high run vs. run variability)

master + 64align.patch:
-c max_connections=400 tps = 489257.195570
-c max_connections=401 tps = 490496.520632

Pretty much as expected, right? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal: two new role attributes and/or capabilities?
On 12/23/14 12:52 PM, Stephen Frost wrote: * José Luis Tallón (jltal...@adv-solutions.net) wrote: On 12/23/2014 05:29 PM, Stephen Frost wrote: The capabilities would be: * MAINTENANCE --- Ability to run VACUUM [ANALYZE | FREEZE] (but not VACUUM FULL), ANALYZE (including SET LOCAL statistics_target TO 1), There's likely to be discussion about these from the perspective that you really shouldn't need to run them all that much. Why isn't autovacuum able to handle this? For some (arguably, ill-devised) use cases of INSERT - SELECT aggregate - DELETE (third party, closed-source app, massive insert rate) at the very least, autovacuum can't possibly cope with the change rate in some tables, given that there are quite many other interactive queries running. Manually performing VACUUM / VACUUM ANALYZE on the (few) affected tables every 12h or so fixes the performance problem for the particular queries without impacting the other users too much --- the tables and indexes in question have been moved to a separate tablespace/disk volume of their own. Autovacuum can certainly run vacuum/analyze on a few tables every 12 hours, so I'm not really following where you see autovacuum being unable to cope. I agree that there *are* such cases, but getting more information about those cases and exactly what solution *does* work would really help us improve autovacuum to address those use-cases. (going through some old email...) The two cases I've dealt with recently are:

- Tables with a fair update/delete rate that should always stay small

The problem with these tables is if anything happens to upset vacuuming you can end up with a significantly larger than expected table that's now essentially impossible to shrink. This could be caused by a single long-running transaction that happens to be in play when autovac kicks off, or for other reasons. Even once you manage to get all the tuples off the end of the heap it can still be extremely difficult to grab the lock you need to truncate it. 
Running a vacuum every minute from cron seems to help control this. Sadly, your indexes still get bloated, so occasionally you want to re-cluster too.

- Preemptively vacuuming during off-hours

Many sites have either nightly or weekend periods of reduced load. Such sites can gain a great benefit from scheduling preemptive vacuums to reduce the odds of disruptive vacuuming activity during heavy activity periods. This is especially true when it comes to a scan_all vacuum of a large table; having autovac do one of those at a peak period can really hose things. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
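The scheduling Jim describes might look like the following crontab sketch -- the database, table names, and timings are invented for illustration, and this assumes the frequent vacuum is cheap because the table stays small:

```
# Keep a high-churn table from growing: cheap when run every minute.
* * * * *  psql -d appdb -c 'VACUUM small_hot_table'
# Preemptive vacuum/analyze early Saturday, before peak-load periods.
0 3 * * 6  psql -d appdb -c 'VACUUM (ANALYZE) big_table'
```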