date:20171211

Re: Rethinking MemoryContext creation

2017-12-11 Thread Simon Riggs

On 11 December 2017 at 17:38, Tom Lane  wrote:
> Simon Riggs  writes:
>> On 11 December 2017 at 16:27, Tom Lane  wrote:
>>> For a *very* large majority of the callers of AllocSetContextCreate,
>>> the context name is a simple C string constant, so we could just store
>>> the pointer to it and save the space and cycles required to copy it.
>
>> Why have the string at all in that case?
>
> Try reading a MemoryContextStats dump without it ...

I understood. I thought you were suggesting removing it in favour of a pointer.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Boolean partitions syntax

2017-12-11 Thread Amit Langote

On 2017/12/12 15:35, Amit Langote wrote:
> Works for me, updated patch attached.

Oops, attached the old one with the last email.

Updated one really attached this time.

Thanks,
Amit
From 318c815264b27fcbda5b83e542c6a2970f714399 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 12 Dec 2017 10:33:11 +0900
Subject: [PATCH] Allow Boolean values in partition FOR VALUES clause

---
 doc/src/sgml/ref/create_table.sgml |  6 +++---
 src/backend/parser/gram.y  | 16 +++-
 src/test/regress/expected/create_table.out | 14 ++
 src/test/regress/sql/create_table.sql  |  7 +++
 4 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..eaa79ae333 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -86,9 +86,9 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] 
TABLE [ IF NOT EXI
 
 and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] ) |
+IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
+FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ebfc94f896..7dfa2759a4 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -153,6 +153,7 @@ static Node *makeBitStringConst(char *str, int location);
 static Node *makeNullAConst(int location);
 static Node *makeAConst(Value *v, int location);
 static Node *makeBoolAConst(bool state, int location);
+static Node *makeBoolAConstNoCast(bool state, int location);
 static RoleSpec *makeRoleSpec(RoleSpecType type, int location);
 static void check_qualified_name(List *names, core_yyscan_t yyscanner);
 static List *check_func_name(List *names, core_yyscan_t yyscanner);
@@ -2768,6 +2769,8 @@ partbound_datum:
Sconst  { $$ = makeStringConst($1, @1); 
}
| NumericOnly   { $$ = makeAConst($1, @1); }
| NULL_P{ $$ = makeNullAConst(@1); }
+   | TRUE_P{ $$ = 
makeBoolAConstNoCast(true, @1); }
+   | FALSE_P   { $$ = 
makeBoolAConstNoCast(false, @1); }
;
 
 partbound_datum_list:
@@ -15581,13 +15584,24 @@ makeAConst(Value *v, int location)
 static Node *
 makeBoolAConst(bool state, int location)
 {
+   A_Const *n = (A_Const *) makeBoolAConstNoCast(state, location);
+
+   return makeTypeCast((Node *)n, SystemTypeName("bool"), -1);
+}
+
+/* makeBoolAConstNoCast()
+ * Create an A_Const string node containing valid bool type values.
+ */
+static Node *
+makeBoolAConstNoCast(bool state, int location)
+{
A_Const *n = makeNode(A_Const);
 
n->val.type = T_String;
n->val.val.str = (state ? "t" : "f");
n->location = location;
 
-   return makeTypeCast((Node *)n, SystemTypeName("bool"), -1);
+   return (Node *) n;
 }
 
 /* makeRoleSpec
diff --git a/src/test/regress/expected/create_table.out 
b/src/test/regress/expected/create_table.out
index 8e745402ae..c541a652c4 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -863,3 +863,17 @@ Partition key: LIST (a)
 Number of partitions: 0
 
 DROP TABLE parted_col_comment;
+-- boolean partitions
+create table boolspart (a bool) partition by list (a);
+create table boolspart_t partition of boolspart for values in (true);
+create table boolspart_f partition of boolspart for values in (false);
+\d+ boolspart
+ Table "public.boolspart"
+ Column |  Type   | Collation | Nullable | Default | Storage | Stats target | 
Description 
++-+---+--+-+-+--+-
+ a  | boolean |   |  | | plain   |  | 
+Partition key: LIST (a)
+Partitions: boolspart_f FOR VALUES IN (false),
+boolspart_t FOR VALUES IN (true)
+
+drop table boolspart;
diff --git a/src/test/regress/sql/create_table.sql 
b/src/test/regress/sql/create_table.sql
index 8f9991ef18..c71e9f938e 100644
--- a/src/test/regress/sql/create_table.sql
+++ b/src/test/regress/sql/create_table.sql
@@ -707,3 +707,10 @@ COMMENT ON COLUMN parted_col_comment.a IS 'Partition key';
 SELECT obj_description('parted_col_comment'::regclass);
 \d+ parted_col_comment
 DROP TABLE parted_col_comment;
+
+-- boolean partitions
+create table bo

Re: Boolean partitions syntax

2017-12-11 Thread Amit Langote

Hi Dilip.

Thanks for the review.

On 2017/12/12 15:03, Dilip Kumar wrote:
> On Tue, Dec 12, 2017 at 7:19 AM, Amit Langote wrote:
>> Horiguchi-san pointed out [1] on a nearby thread that the partitioning
>> syntax (the FOR VALUES clause) doesn't accept true and false as valid
>> partition bound datums, which seems to me like an oversight.  Attached a
>> patch to fix that.
>>
>> create table bools (a bool) partition by list (a);
>>
>> Before patch:
>>
>> create table bools_t partition of bools for values in (true);
>> ERROR:  syntax error at or near "true"
>> LINE 1: ...reate table bools_t partition of bools for values in (true);
>>
>> After:
>>
>> create table bools_t partition of bools for values in (true);
>> CREATE TABLE
>> \d bools_t
>>   Table "public.bools_t"
>>  Column |  Type   | Collation | Nullable | Default
>> +-+---+--+-
>>  a  | boolean |   |  |
>> Partition of: bools FOR VALUES IN (true)
>
> +makeBoolAConstNoCast(bool state, int location)
> +{
> + A_Const *n = makeNode(A_Const);
> +
> + n->val.type = T_String;
> + n->val.val.str = (state ? "t" : "f");
> + n->location = location;
> +
> + return (Node *) n;
> +}
> +
> 
> I think we can change makeBoolAConst as below so that we can avoid
> duplicate code.
> 
> static Node *
> makeBoolAConst(bool state, int location)
> {
> Node *n = makeBoolAConstNoCast(state, location);
> 
> return makeTypeCast(n, SystemTypeName("bool"), -1);
> }

Works for me, updated patch attached.

Thanks,
Amit
From 85d51048ee147c819a9ed0648557c7f05f3802c5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 12 Dec 2017 10:33:11 +0900
Subject: [PATCH] Allow Boolean values in partition FOR VALUES clause

---
 doc/src/sgml/ref/create_table.sgml |  6 +++---
 src/backend/parser/gram.y  | 18 ++
 src/test/regress/expected/create_table.out | 14 ++
 src/test/regress/sql/create_table.sql  |  7 +++
 4 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..eaa79ae333 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -86,9 +86,9 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] 
TABLE [ IF NOT EXI
 
 and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] ) |
+IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
+FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ebfc94f896..c6cbf9b844 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -153,6 +153,7 @@ static Node *makeBitStringConst(char *str, int location);
 static Node *makeNullAConst(int location);
 static Node *makeAConst(Value *v, int location);
 static Node *makeBoolAConst(bool state, int location);
+static Node *makeBoolAConstNoCast(bool state, int location);
 static RoleSpec *makeRoleSpec(RoleSpecType type, int location);
 static void check_qualified_name(List *names, core_yyscan_t yyscanner);
 static List *check_func_name(List *names, core_yyscan_t yyscanner);
@@ -2768,6 +2769,8 @@ partbound_datum:
Sconst  { $$ = makeStringConst($1, @1); 
}
| NumericOnly   { $$ = makeAConst($1, @1); }
| NULL_P{ $$ = makeNullAConst(@1); }
+   | TRUE_P{ $$ = 
makeBoolAConstNoCast(true, @1); }
+   | FALSE_P   { $$ = 
makeBoolAConstNoCast(false, @1); }
;
 
 partbound_datum_list:
@@ -15590,6 +15593,21 @@ makeBoolAConst(bool state, int location)
return makeTypeCast((Node *)n, SystemTypeName("bool"), -1);
 }
 
+/* makeBoolAConstNoCast()
+ * Create an A_Const string node containing valid bool type values.
+ */
+static Node *
+makeBoolAConstNoCast(bool state, int location)
+{
+   A_Const *n = makeNode(A_Const);
+
+   n->val.type = T_String;
+   n->val.val.str = (state ? "t" : "f");
+   n->location = location;
+
+   return (Node *) n;
+}
+
 /* makeRoleSpec
  * Create a RoleSpec with the given type
  */
diff --git a/src/test/regress/expected/create_table.out 
b/src/test/regress/expected/create_table.out
index 8e745402ae..c541a652c4 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -863,3 +863,17 @@ Partition key: LIST (a)
 Number

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 13:58, Tom Lane  wrote:

> Craig Ringer  writes:
> > I doubt I've ever written just "exit" or "quit" without indentation. I
> > think if it requires them to be a bareword with no indentation, strictly
> > ^(exit|quit)\n when isatty, then that's probably a safe and helpful
> choice.
>
> FWIW, I think that the special behavior (whatever it winds up being
> exactly) ought to trigger on an input line matching
>
> [whitespace]*help[whitespace]*(;[whitespace]*)?
>
> and similarly for exit/quit.  I think that novices might have internalized
> enough SQL syntax to think that they need to terminate the command with a
> semicolon --- in fact, we regularly see examples in which seasoned users
> think they need to terminate backslash commands with a semicolon, so
> that's hardly far-fetched.  And we might as well allow as much whitespace
> as we can, because nobody but Guido Rossum thinks that having whitespace
> be semantically significant is a good idea.
>
>
Yeah, I think that's fine - and sensible - if we do what Robert proposed
and pass it through to psql's buffer as well as printing some helptext.

It's probably not even that bad if we didn't pass it through, really, as
those lines have little good reason to exist on their own, but I'd rather
not push our luck.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 14:06, Tom Lane  wrote:

> > Another question is whether printing to stderr, bypassing proper
> > logging!, is good enough for something like this.
>
> Yeah, this is an issue.  MemoryContextStats is designed to print
> to stderr in the (possibly vain) hope that it will work even when
> we are up against an OOM failure.  That's not a property I much
> want to give up, but you're right that it's not very desirable
> if a user is trying to capture state during normal running.
>

If we were willing to wrap variadic calls in a callback and rely on
vfprintf, we could:

typedef void (*printwrapper)(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);

...

static void
printwrapper_stderr(void *extra, const char *fmt, ...)
{
vfprintf(stderr, fmt, va_list);
}

void
MemoryContextStats(MemoryContext context)
{
MemoryContextStatsDetail(context, 100, printwrapper_stderr, NULL);
}

void
MemoryContextStatsDetail(MemoryContext context, int max_children,
printwrapper outfunc, void *printwrapper_extra)
{
 ...
 printwrapper(extra, "Grand total: ...", ...);
}

and let the ProcSignal based caller pass an elog wrapper instead, or form a
StringInfo with appendStringInfoVA then elog it in one go after the
MemoryContextStatsDetail call returns.

I was worried about relying on vfprintf, but we already use it widely in
the codebase so it shouldn't be an issue with elderly buildfarm critters.

Letting it go to stderr isn't too bad, but it'd certainly make it that bit
nicer if it didn't so I'm not opposed to adding the needed indirection.
I'll give it a try in a bit.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: pgbench's expression parsing & negative numbers

2017-12-11 Thread Fabien COELHO



Hello Andres,


working on overflow correctness in pg I noticed that pgbench isn't quite
there.  I assume it won't matter terribly often, but the way it parses
integers makes it incorrect for, at least, the negativemost number.

It directly parses the integer input using:
{digit}+{
yylval->ival = strtoint64(yytext);
return INTEGER_CONST;
}

but that unfortunately means that the sign is no included in the call to
strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,


Indeed. For a benchmarking script which generates a pseudo random load I 
do not see as a big issue, but maybe I'm missing some use case.


because that's not representable as a positive number on a two's 
complement machine (i.e. everywhere).  Preliminary testing shows that 
that can relatively easily fixed by just prefixing [-+]? to the relevant 
exprscan.l rules.


Beware that it must handle the unary/binary plus/minus case consistently:

  "-123" vs "a -123" vs "a + -123"

But it might also not be a bad idea to simply defer parsing of the 
number until exprparse.y has its hand on the number?


It is usually simpler to let the lexer handle constants.

There's plenty other things wrong with overflow handling too, especially 
evalFunc() basically just doesn't care.


Sure.

There are some overflow checking with div and double to int cast, which 
were added because of previous complaints, but which are not very useful 
to me.


ISTM that it is a voluntary feature, which handles int64 operations 
with a modulo overflow like C. Generating exceptions on such overflows 
does not look very attractive to me.



 I'll come up with a patch for that sometime soon.


I wish you could provide some motivation: why does it matter much to a 
benchmarking script to handle overflows with more case?



A third complaint I have is that the tap tests are pretty hard to parse
for humans.


Probably.

Perl is pretty hard to humans to start with:-)

There is a lot of code factorization to cram many tests together, so that 
one test is more or less one line and there are hundreds of them. Not sure 
it would help if it was expanded.


There are a lot of regular expressions to check outputs, which does not 
help.


I wanted to have the pgbench scripts outside but this has been rejected, 
so the tested scripts themselves are in the middle of the perl code, which 
I think has degraded the readability significantly.


--
Fabien.

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 12:43, Andres Freund  wrote:

> Hi,
>
> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> > TL;DR: Lets add a ProcSignalReason that makes a backend
> > call MemoryContextStats when it sees it and a C func that users can use
> to
> > set it on a proc. Sane?
>
> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to
> do.
>
> It'd be easy to set a flag in the handler and then have
> CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call.

Yes, definitely. That was my intention. Trying to write to stderr, mess
with memory contexts, etc from a signal handler context seems awfully hairy
and definitely not something I'd want to risk on a live system.

> But that'd have
> the disadvanatage that it possibly would take a while till the
> MemoryContextStats() is executed. Not sure if that's still good enough
> for you?
>

Definitely. Sure, it won't be perfect, but it'd be a big improvement on
what we have.

> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.
>

I think the reason it prints to stderr now is that it's intended to run in
OOM situations.

Arguably that's not such a concern when being triggered by a procsignal. So
elog(...) in that context could make sense. I'd probably add a
print-wrapper callback arg to MemoryContextStatsDetail that you can use to
write to a stringinfo / elog / fprintf(stderr), as desired.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Tom Lane

Andres Freund  writes:
> On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
>> TL;DR: Lets add a ProcSignalReason that makes a backend
>> call MemoryContextStats when it sees it and a C func that users can use to
>> set it on a proc. Sane?

> It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
> handler, so you can't really rely on a lot of stuff being legal to do.

Indeed, calling MemoryContextStats in a signal handler would be a damfool
idea, but Craig didn't propose that AFAICS.

> Another question is whether printing to stderr, bypassing proper
> logging!, is good enough for something like this.

Yeah, this is an issue.  MemoryContextStats is designed to print
to stderr in the (possibly vain) hope that it will work even when
we are up against an OOM failure.  That's not a property I much
want to give up, but you're right that it's not very desirable
if a user is trying to capture state during normal running.

regards, tom lane

Re: Boolean partitions syntax

2017-12-11 Thread Dilip Kumar

On Tue, Dec 12, 2017 at 7:19 AM, Amit Langote  wrote:

> Hi.
>
> Horiguchi-san pointed out [1] on a nearby thread that the partitioning
> syntax (the FOR VALUES clause) doesn't accept true and false as valid
> partition bound datums, which seems to me like an oversight.  Attached a
> patch to fix that.
>
> create table bools (a bool) partition by list (a);
>
> Before patch:
>
> create table bools_t partition of bools for values in (true);
> ERROR:  syntax error at or near "true"
> LINE 1: ...reate table bools_t partition of bools for values in (true);
>
> After:
>
> create table bools_t partition of bools for values in (true);
> CREATE TABLE
> \d bools_t
>   Table "public.bools_t"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  a  | boolean |   |  |
> Partition of: bools FOR VALUES IN (true)
>
> Thanks,
> Amit
>
> [1]
> https://www.postgresql.org/message-id/20171128.203915.
> 26713586.horiguchi.kyotaro%40lab.ntt.co.jp
>

+makeBoolAConstNoCast(bool state, int location)
+{
+ A_Const *n = makeNode(A_Const);
+
+ n->val.type = T_String;
+ n->val.val.str = (state ? "t" : "f");
+ n->location = location;
+
+ return (Node *) n;
+}
+

I think we can change makeBoolAConst as below so that we can avoid
duplicate code.

static Node *
makeBoolAConst(bool state, int location)
{
Node *n = makeBoolAConstNoCast(state, location);

return makeTypeCast(n, SystemTypeName("bool"), -1);
}

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 12:25, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> > TL;DR: Lets add a ProcSignalReason that makes a backend call
> > MemoryContextStats when it sees it and a C func that users can use to set
> > it on a proc. Sane?
>
> > So how about borrowing a ProcSignalReason entry for "dump a memory
> context
> > summary at your earliest convenience" ? We could name it a more generic
> > "dump debug data" in case we want to add things later.
> >
> > Then a new pg_log_debug_backend(int) function or something like that
> could
> > signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> > MemoryContextStats next time it's called.
>
> +1
> That's one of things I wanted to do.  It will be more useful on Windows.
> Would it work for autovac processes and background workers, etc. that
> connect to shared memory?
>

Anything that uses CHECK_FOR_INTERRUPTS() and is attached to PGXACT. So
yeah, pretty much anything attached to shmem.


> I have also wanted to dump stack traces.  Linux (glibc) has
> backtrace_symbols(), and Windows has StackWalk()/StackWalk64().  Is it sane
> to make the function a hook?
>

In-proc stack traces are immensely useful, and IMO relatively safe in a
proc that's already in a reasonable state. If your stack is mangled, making
it worse with an in-proc stack trace is rarely your biggest concern. I'd
LOVE to be able to do this.

However, I'd want to address anything like that quite separately to the
change I proposed to expose an existing facility.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Tom Lane

Craig Ringer  writes:
> I doubt I've ever written just "exit" or "quit" without indentation. I
> think if it requires them to be a bareword with no indentation, strictly
> ^(exit|quit)\n when isatty, then that's probably a safe and helpful choice.

FWIW, I think that the special behavior (whatever it winds up being
exactly) ought to trigger on an input line matching

[whitespace]*help[whitespace]*(;[whitespace]*)?

and similarly for exit/quit.  I think that novices might have internalized
enough SQL syntax to think that they need to terminate the command with a
semicolon --- in fact, we regularly see examples in which seasoned users
think they need to terminate backslash commands with a semicolon, so
that's hardly far-fetched.  And we might as well allow as much whitespace
as we can, because nobody but Guido Rossum thinks that having whitespace
be semantically significant is a good idea.

regards, tom lane

Re: [HACKERS] A design for amcheck heapam verification

2017-12-11 Thread Michael Paquier

On Fri, Dec 8, 2017 at 4:37 AM, Peter Geoghegan  wrote:
> Here, we repeat the same test 3 times, varying only the seed value
> used for each run.

Thanks for the updated version! Looking at 0001, I have run a coverage
test and can see all code paths covered, which is nice. It is also
easier to check the correctness of the library. There are many ways to
shape up such tests, you chose one we could live with.

> The last message is a WARNING because we exceed the 1% threshold
> (hard-coded into test_bloomfilter.c), though only by a tiny margin,
> due only to random variations in seed value. We round up to 10 bits
> per element for the regression tests. That's where the *actual*
> "nelements" argument comes from within the tests, so pg_regress tests
> should never see the WARNING (if they do, that counts as a failure).
>
> I've experimentally observed that we get the 1% false positive rate
> with any possible bitset when "nelements" works out at 9.6 bitset bits
> per element. Inter-run variation is tiny. With 50 tests, I didn't
> observe these same Bloom filter parameters produce a false positive
> rate that came near to 1.1%. 1.01% or 1.02% was about as bad as it
> got.

Nice. That's close to what I would expect with the sizing this is doing.

> There is a fairly extensive README, which I hope will clear up the
> theory behind the bloomfilter.c strategy on bitset size and false
> positives. Also, there was a regression that I had to fix in
> bloomfilter.c, in seeding. It didn't reliably cause variation in the
> false positives. And, there was bitrot with the documentation that I
> fixed up.

+   /*
+* Generate random seed, or use caller's.  Seed should always be a positive
+* value less than or equal to PG_INT32_MAX, to ensure that any random seed
+* can be recreated through callerseed if the need arises.  (Don't assume
+* that RAND_MAX cannot exceed PG_INT32_MAX.)
+*/
+   seed = callerseed < 0 ? random() % PG_INT32_MAX : callerseed;
This could use pg_backend_random() instead.

+--
+-- These tests don't produce any interesting output, unless they fail.  For an
+-- explanation of the arguments, and the values used here, see README.
+--
+SELECT test_bloomfilter(power => 23,
+nelements => 838861,
+seed => -1,
+tests => 1);
This could also test the reproducibility of the tests with a fixed
seed number and at least two rounds, a low number of elements could be
more appropriate to limit the run time.

+   /*
+* Aim for two bytes per element; this is sufficient to get a false
+* positive rate below 1%, independent of the size of the bitset or total
+* number of elements.  Also, if rounding down the size of the bitset to
+* the next lowest power of two turns out to be a significant drop, the
+* false positive rate still won't exceed 2% in almost all cases.
+*/
+   bitset_bytes = Min(bloom_work_mem * 1024L, total_elems * 2);
+   /* Minimum allowable size is 1MB */
+   bitset_bytes = Max(1024L * 1024L, bitset_bytes);
I would vote for simplifying the initialization routine and just
directly let the caller decide it. Are there implementation
complications if this is not a power of two? I cannot guess one by
looking at the code. I think that the key point is just to document
that a false positive rate of 1% can be reached with just having
9.6bits per elements, and that this factor can be reduced by 10 if
adding 4.8 bits per elements. So I would expect people using this API
are smart enough to do proper sizing. Note that some callers may
accept even a larger false positive rate. In short, this basically
brings us back to Thomas' point upthread with a size estimation
routine, and an extra routine doing the initialization:
https://www.postgresql.org/message-id/CAEepm=0dy53x1hg5dmyzmpv_kn99crxzqbto8gmiosxnyrx...@mail.gmail.com
Did you consider it? Splitting the size estimation and the actual
initialization has a lot of benefits in my opinion.

+/*
+ * What proportion of bits are currently set?
+ *
+ * Returns proportion, expressed as a multiplier of filter size.
+ *
[...]
+ */
+double
+bloom_prop_bits_set(bloom_filter *filter)
Instead of that, having a function that prints direct information
about the filter's state would be enough with the real number of
elements and the number of bits set in the filter. I am not sure that
using rates is a good idea, sometimes rounding can cause confusion.
-- 
Michael

pgbench's expression parsing & negative numbers

2017-12-11 Thread Andres Freund

Hi,

working on overflow correctness in pg I noticed that pgbench isn't quite
there.  I assume it won't matter terribly often, but the way it parses
integers makes it incorrect for, at least, the negativemost number.

It directly parses the integer input using:
{digit}+{
yylval->ival = strtoint64(yytext);
return INTEGER_CONST;
}

but that unfortunately means that the sign is no included in the call to
strtoint64. Which in turn means you can't correctly parse PG_INT64_MIN,
because that's not representable as a positive number on a two's
complement machine (i.e. everywhere).  Preliminary testing shows that
that can relatively easily fixed by just prefixing [-+]? to the relevant
exprscan.l rules.  But it might also not be a bad idea to simply defer
parsing of the number until exprparse.y has its hand on the number?

There's plenty other things wrong with overflow handling too, especially
evalFunc() basically just doesn't care.  I'll come up with a patch for
that sometime soon.

A third complaint I have is that the tap tests are pretty hard to parse
for humans.

Greetings,

Andres Freund

Re: pg_dumpall -r -c try to drop user postgres

2017-12-11 Thread Jeff Janes

On Tue, Dec 5, 2017 at 9:03 AM, Pavel Stehule 
wrote:

> On Sun, Dec 3, 2017 at 3:21 PM, Pavel Stehule 
>> wrote:
>> > I am not sure if user postgres should be removed, so it is probably bug
>> >
>> > pg_dumpall -r -c | grep postgres
>> >
>> > DROP ROLE postgres;
>> > CREATE ROLE postgres;
>>
>>

> I am not sure, if issue is in this code. But I am sure, so DROP ROLE
> postgres is just nonsense
>
> This command should to fail every time, and then should not be generated.
>

I don't see why it should fail every time.

Not all databases clusters were created with their initdb superuser being
'postgres'.

Cheers,

Jeff

Re: [HACKERS] postgres_fdw super user checks

2017-12-11 Thread Jeff Janes

On Thu, Oct 5, 2017 at 10:16 AM, Nico Williams 
wrote:

> On Thu, Sep 14, 2017 at 04:08:08PM -0400, Robert Haas wrote:
> > On Thu, Sep 14, 2017 at 2:33 PM, Jeff Janes 
> wrote:
> > > I think that foreign tables ought to behave as views do, where they
> run as
> > > the owner rather than the invoker.  No one has talked me out of it,
> but no
> > > one has supported me on it either.  But I think it is too late to
> change
> > > that now.
> >
> > That's an interesting point.  I think that you can imagine use cases
> > for either method.  Obviously, if what you want to do is drill a hole
> > through the Internet to another server and then expose it to some of
> > your fellow users, having the FDW run with the owner's permissions
> > (and credentials) is exactly right.  But there's another use case too,
> > which is where you have something that looks like a multi-user
> > sharding cluster.  You want each person's own credentials to carry
> > over to everything they do remotely.
>
> Hmmm, I don't think that's really right.
>
> What I'd like instead is for the FDW client to tell the FDW server the
> session_user/current_user on behalf of which it's acting, and let the
> FDW server decide how to proceed.  This could be done by doing a SET
> SESSION fdw.client.session_user... and so on.
>

So then the FDW client would still have to authenticate itself, presumably
as a superuser, to the FDW server, and just tell the server "trust me on
who I'm representing here"?


>
> We use Kerberos principal names as PG user/role names, _with_ @REALM
> included, so a user foo@BAR is likely to make sense to the FDW server.
>
> Of course, if you're not using Kerberos then the local and remote user
> namespaces might be completely distinct, but by letting the FDW server
> know a) the FDW client's username (via authentication) and b) the true
> username on the client side (via SET/set_config()), the server might
> have enough information to decide whether it trusts (a) to impersonate
> (b) and how to map (b) to a local user.
>

With Kerberos, shouldn't the ultimate client be able (in theory) to
authenticate to the end server through the intermediate server, without
giving the intermediate server general impersonation privileges?  If that
can be done in theory, any idea on how to implement it?

Cheers,

Jeff

Re: [HACKERS] postgres_fdw super user checks

2017-12-11 Thread Jeff Janes

On Tue, Dec 5, 2017 at 8:35 AM, Robert Haas  wrote:

> On Mon, Dec 4, 2017 at 5:57 PM, Ashutosh Bapat
>  wrote:
> > I think the real behaviour can be described as something like this:
> >
> > "Only superusers may connect to foreign servers without password
> > authentication, so always specify the password
> > option for user mappings that may be used by non-superusers." But
> > which user mappings may be used by non-superusers can not be defined
> > without explaining views owned by superusers. I don't think we should
> > be talking about views in that part of documentation.
>
> Well, if we don't, then I'm not sure we can really make this clear.
>
> Anyhow, I've committed the patch to master for now; we can keep
> arguing about what, if anything, to do for back-branch documentation.
>

Thank you for the changes and commit.  Alas, I don't know how to document
this clearly, either.

Cheers,

Jeff

Re: [HACKERS] postgres_fdw super user checks

2017-12-11 Thread Jeff Janes

On Thu, Oct 5, 2017 at 10:49 AM, Simon Riggs  wrote:

> On 4 October 2017 at 18:13, Jeff Janes  wrote:

>
>
> OK.  And if you want the first one, you can wrap it in a view currently,
> but
> > if it were changed I don't know what you would do if you want the 2nd one
> > (other than having every user create their own set of foreign tables).
> So I
> > guess the current situation is more flexible.
>
> Sounds like it would be a useful option on a Foreign Server to allow
> it to run queries as either the invoker or the owner. We have that
> choice for functions, so we already have the concept and syntax
> available. We could have another default at FDW level that specifies
> what the default is for that type of FDW, and if that is not
> specified, we keep it like it currently is.
>

To go further off topic, I'd like to have the invoker vs definer security
options available even for plain old views as well.  Sometimes I want
create a view so that I can let people see, in a controlled manner, things
they couldn't otherwise see.  But more often I just want to provide a
convenience wrapper around ugly SQL without accidentally granting people
additional privileges.

Cheers,

Jeff

Re: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Andres Freund

Hi,

On 2017-12-12 11:57:41 +0800, Craig Ringer wrote:
> TL;DR: Lets add a ProcSignalReason that makes a backend
> call MemoryContextStats when it sees it and a C func that users can use to
> set it on a proc. Sane?

It's not unproblematic. procsignal_sigusr1_handler() runs in a signal
handler, so you can't really rely on a lot of stuff being legal to
do.

It'd be easy to set a flag in the handler and then have
CHECK_FOR_INTERRUPTS() do the MemoryContextStats() call. But that'd have
the disadvanatage that it possibly would take a while till the
MemoryContextStats() is executed. Not sure if that's still good enough
for you?

Another question is whether printing to stderr, bypassing proper
logging!, is good enough for something like this.

Greetings,

Andres Freund

RE: Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Tsunakawa, Takayuki

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> TL;DR: Lets add a ProcSignalReason that makes a backend call
> MemoryContextStats when it sees it and a C func that users can use to set
> it on a proc. Sane?

> So how about borrowing a ProcSignalReason entry for "dump a memory context
> summary at your earliest convenience" ? We could name it a more generic
> "dump debug data" in case we want to add things later.
> 
> Then a new pg_log_debug_backend(int) function or something like that could
> signal the proc and let CHECK_FOR_INTERRUPTS handle calling
> MemoryContextStats next time it's called.

+1
That's one of things I wanted to do.  It will be more useful on Windows.  Would 
it work for autovac processes and background workers, etc. that connect to 
shared memory?

I have also wanted to dump stack traces.  Linux (glibc) has 
backtrace_symbols(), and Windows has StackWalk()/StackWalk64().  Is it sane to 
make the function a hook?

Regards
Takayuki Tsunakawa

Using ProcSignal to get memory context stats from a running backend

2017-12-11 Thread Craig Ringer

Hi all

TL;DR: Lets add a ProcSignalReason that makes a backend
call MemoryContextStats when it sees it and a C func that users can use to
set it on a proc. Sane?


I fairly frequently want to get a memory context dump from a running
backend. It's trivial to do on a debug-enabled build (or one with debug
symbols packages installed) when on a system with gdb and when I'm the one
driving.

Frequently one or more of these things are not true. Users rarely install
debug symbols packages, and never build with --enable-debug if building
from source. They rarely have gdb to hand, and rarely know how to use it if
they do.

So getting memory context dumps from backends showing signs of unreasonable
memory bloat is harder than would be ideal for such incredibly handy debug
info. Especially since most users run with default overcommit on Linux, so
Linux likes to nuke the process rather than let us print context dumps when
we detect OOM.

So how about borrowing a ProcSignalReason entry for "dump a memory context
summary at your earliest convenience" ? We could name it a more generic
"dump debug data" in case we want to add things later.

Then a new pg_log_debug_backend(int) function or something like that could
signal the proc and let CHECK_FOR_INTERRUPTS handle calling
MemoryContextStats next time it's called.

We could clobber a prior ProcSignalReason set on the proc, but that's why
we retry.

Way, way easier than getting gdb and debuginfo in place, attaching, etc.
You still have to go fishing for stderr to find the output, but that's
usually the same place as the rest of the logs.

Barring objections I'll send a patch in a while.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BUG #14941: Vacuum crashes

2017-12-11 Thread Masahiko Sawada

On Mon, Dec 11, 2017 at 2:13 PM, Michael Paquier
 wrote:
> On Wed, Dec 6, 2017 at 1:52 AM, Bossart, Nathan  wrote:
>> 0001_fix_unparenthesized_vacuum_grammar_v1.patch
>>
>> One VacuumStmt grammar rule allows users to specify the VACOPT_ANALYZE
>> option by including an AnalyzeStmt at the end of the command.  This
>> appears to have been added as part of the introduction of the ANALYZE
>> command (f905d65e).  However, this means that users can call VACUUM
>> commands like
>>
>> VACUUM VERBOSE ANALYZE VERBOSE pg_class;
>>
>> Besides being disallowed according to the documentation, I think this
>> may give users the idea that there is a difference between the VERBOSE
>> options for VACUUM versus ANALYZE.  In fact, they are the same option,
>> and specifying it twice is redundant.
>
> Hmm. Yeah. If adding a parenthesis grammar to ANALYZE this is a good
> move for long purposes. If we don't do that now, or at least for the
> purpose of this patch set, then a AnalyzeStmt node embedded directly
> in the grammar of VACUUM is likely to bite in the future.
>
>> This change fixes the unparenthesized VACUUM syntax to disallow the out-
>> of-order VACUUM options as described above.  This is a prerequisite
>> change for opening up VACOPT_NOWAIT to users, as adding it to the
>> grammar as-is would cause statements like
>>
>> VACUUM VERBOSE NOWAIT ANALYZE VERBOSE NOWAIT pg_class;
>>
>> to be allowed.  Since I am also looking to introduce a parenthesized
>> syntax for ANALYZE, this patch would prevent statements like
>
> If you add only a parenthesized grammar of ANALYZE, it seems to me
> that this would not be an allowed query anyway.
>
>> VACUUM VERBOSE ANALYZE (VERBOSE, NOWAIT) pg_class;
>>
>> from being accepted as well.
>
> This could also be equivalent to ANALYZE(VERBOSE, NOWAIT). I can see
> pros and cons by keeping the complete extension of AnalyzeStmt in the
> VACUUM grammar, but as the documentation does not mention VACUUM
> VERBOSE ANALYZE VERBOSE as a legit query I guess that ripping it out
> is not going to annoy many users. Still let's see opinions from other
> folks on -hackers as another approach to the feature you want to add
> here would be to just implement the grammar for VACUUM but *not*
> ANALYZE, but I'd like to think that we still want ANALYZE to be
> supported, and also we want it to be extensible as a different thing
> than what VACUUM is.
>
>> 0002_add_parenthesized_analyze_syntax_v1.patch
>>
>> As a prerequisite to giving users access to the VACOPT_NOWAIT option, I
>> am introducing a parenthesized syntax for ANALYZE that is similar to
>> VACUUM's.  Following VACUUM's example, any new options will be added to
>> this syntax, and the old style will become deprecated.
>>
>> Adding a parenthesized syntax for ANALYZE is not strictly necessary for
>> providing the NOWAIT option, as NOWAIT is already a keyword in gram.y.
>> However, I thought it would be good to match VACUUM (since the command
>> structure is so similar), and this work would be required at some point
>> anyway if ANALYZE ever accepts options that we do not want to make
>> reserved keywords.
>
> Yep, agreed with the contents of this patch.
>
>> 0003_add_nowait_vacuum_option_v1.patch
>>
>> This change provides the existing VACOPT_NOWAIT option to users.  This
>> option was previously only used by autovacuum in special cases, but it
>> seems like a potentially valuable option in other cases as well.  For
>> example, perhaps a user wants to run a nightly VACUUM job that skips
>> tables that cannot be locked due to autovacuum (or something else)
>> already working on it.
>>
>> I chose to use NOWAIT as the option name because it is already a keyword
>> for the LOCK command.
>
> Makes sense to me.
>
>> This patch follows the existing logging precedent added by 11d8d72c and
>> ab6eaee8: if a relation is explicitly specified in the command, a log
>> statement will be emitted if it is skipped.  If the relation is not
>> specified (e.g. "VACUUM FULL;"), no such log statements will be emitted.
>
> +WARNING:  skipping analyze of "test1" --- lock not available
> +step analyze_specified: ANALYZE (NOWAIT) test1, test2;
> +step commit:
> +   COMMIT;
> OK for a WARNING for me in this case. We already do that for the other 
> entries.

I also reviewed the patches.

For 0003 patch, if we do VACUUM (NOWAIT) to whole database we don't
get the above WARNING messages, but I think we should get them. The
reported issue also did VACUUM FULL and REINDEX to whole database.
Especially when VACUUM to whole database the information of skipped
relations would be useful for users.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-11 Thread Amit Kapila

On Tue, Dec 12, 2017 at 9:00 AM, Amit Kapila  wrote:
> On Mon, Dec 11, 2017 at 11:27 PM, Robert Haas  wrote:
>> On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila  
>> wrote:
>>> Okay, see the attached and let me know if that suffices the need?
>>
>> + * Check for unexpected worker death.  This will ensure that if
>> + * the postmaster failed to start the worker, then we don't wait
>> + * for it indefinitely.  For workers that are known to be
>> + * launched, we can rely on their error queue being freed once
>> + * they exit.
>>
>> Hmm.  Is this really true?  What if the worker starts up but then
>> crashes before attaching to the error queue?
>>
>
> If the worker errored out before attaching to the error queue, then we
> can't rely on error queue being freed.  However, in that case, the
> worker status will be BGWH_STOPPED.  I have adjusted the comment
> accordingly.
>

In particular, if the worker crashed (say somebody forcefully killed
the worker), then I think it will lead to a restart of the server.  So
not sure, adding anything about that in the comment will make much
sense.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-11 Thread Amit Kapila

On Mon, Dec 11, 2017 at 11:27 PM, Robert Haas  wrote:
> On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila  wrote:
>> Okay, see the attached and let me know if that suffices the need?
>
> + * Check for unexpected worker death.  This will ensure that if
> + * the postmaster failed to start the worker, then we don't wait
> + * for it indefinitely.  For workers that are known to be
> + * launched, we can rely on their error queue being freed once
> + * they exit.
>
> Hmm.  Is this really true?  What if the worker starts up but then
> crashes before attaching to the error queue?
>

If the worker errored out before attaching to the error queue, then we
can't rely on error queue being freed.  However, in that case, the
worker status will be BGWH_STOPPED.  I have adjusted the comment
accordingly.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_parallel_worker_startup_failures_v2.patch
Description: Binary data

Re: That mode-700 check on DATADIR again

2017-12-11 Thread Stephen Frost

Greetings Chapman,

* Chapman Flack (c...@anastigmatix.net) wrote:
> I have, more or less, this classic question:
> 
> https://www.postgresql.org/message-id/4667C403.1070807%40t3go.de

[...]

> So, it seems there's at least one use case where some kind of
> no_really_the_datadir_permissions_are_fine option would be welcome
> to get around a well-intended but sometimes broken check.

There's multiple use-cases for this, and some efforts are being made to
specifically address these cases.

> So it's always a good idea to provide an escape hatch for that kind of
> check.
> 
> Isn't it?

Patches are in the works (the ground-work having been committed earlier
this cycle...) to be more flexible in this area.  The unfortunate part
is that this is all PG11 work at this point, but, with a bit of luck and
some hard work, we'll have this improved for that release.

This effort may not address all use-cases, of course, but the plan is to
at least address standard unix group privileges, to allow a non-root /
non-PG-superuser, to be able to run a file-level backup of PG.  If there
are other reasonable use-cases which still need to be addressed beyond
that, then hopefully we can work out a sensible way to build on what's
been done for those as well.

If you have specific questions or comments on this, I'd suggest chatting
with David Steele, who is working on this, and whom I've CC'd here.

Thanks!

Stephen

signature.asc
Description: Digital signature

That mode-700 check on DATADIR again

2017-12-11 Thread Chapman Flack

I have, more or less, this classic question:

https://www.postgresql.org/message-id/4667C403.1070807%40t3go.de

But I have it for a newer reason, where again it seems as if a better
answer than "don't do that" might be worth having.

1. Suppose you are running PG in a VM (named pgvm just for exposition).

2. Suppose DATADIR in this VM is a direct virtio mount of an LVM
   logical volume on the host machine (vg0/pgdata, say).

3. Suppose you've thought of a quick way to snag a copy of this
   volume to build a replica machine:

   pg_start_backup(...)
   virsh domfsfreeze pgvm /var/lib/pgsql/x.x/data
   lvm lvcreate --snapshot vg/pgdata --name snap_data
   virsh domfsthaw pgvm
   pg_stop_backup()

   and then copy from snap_data at leisure.

4. Suppose the guest OS running in the VM has an SELinux policy
   in force.

Those virsh domfsfreeze / domfsthaw operations work by libvirt
RPC-ing into a qemu-ga (qemu guest agent) process running
in the guest; the agent then opens the given mountpoint path
/var/lib/pgsql/x.x/data O_RDONLY, and does an ioctl(FIFREEZE) on it.

When you see the "failed to open: ... Permission denied" from qemu-ga
(which runs as root!), you pretty much know SELinux is involved.

Turns out it's involved at two levels.

First is the familiar type-enforcement job SELinux does. The
DATADIR's context has type postgresql_db_t, and the policy doesn't
say a process with type virt_qemu_ga_t gets to do open, read, search,
or ioctl on it, root or no root. That's simple, add a policy module with
 allow virt_qemu_ga_t postgresql_db_t:dir { ioctl open read search };
problem solved.

The second level's more of a head-scratcher, because it involves *two*
interacting linuxisms, SELinux and capabilities. It turns out qemu-ga
is run in a mode where capabilities govern, and its root-ness means
nothing special. It can't touch DATADIR (or its parent or grandparent,
for that matter), for the simple reason that they are all
rwx-- postgres postgres, so a poor nonspecial user named root
has no access at all.

That would not be a problem if qemu-ga could exercise the
CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE capabilities, but the
SELinux policy again steps in to deny that.

One fix would be to make another tweak to the SELinux policy,
allowing qemu-ga to use CAP_DAC_READ_SEARCH. But that's overly
broad, and would let qemu-ga read/search everything everywhere.

Really, all that's needed is to add ACLs on DATADIR and its parent
and grandparent, granting rx (for DATADIR) or x (its ancestors) to
the single specific user named root.

And that works, and qemu-ga can freeze and thaw the filesystem,
and all is well.

Until you happen to stop postgres, and then find out it refuses to
restart.

Because the way POSIX ACLs work, after adding a "root may r-x" rule
to DATADIR, it looks like this:

[data]# getfacl .
# file: .
# owner: postgres
# group: postgres
user::rwx
user:root:r-x
group::---
mask::r-x
other::---

Seeing the group::--- and other::--- ought to make postgres happy,
right? There is no 'group' access, and no 'other' access, nothing
but 'rwx' for postgres itself, and 'r-x' for root. This ought to
please even the strictest security zealot.

However, when you stat a file with a POSIX ACL, you get shown the
ACL's 'mask' entry (essentially the ceiling of all the 'extra' ACL
entries) in place of the old-fashioned group bits. So in a
non-ACL-aware ls or stat, the above looks like:

[data]# ls -ld
drwxr-x---+ 22 postgres postgres 4096 Dec 11 18:14 .

... and postgres refuses to start because it mistakes the r-x for
'group' permissions. ACLs have added new semantics to POSIX
permissions, and postgres doesn't understand them when it makes this
hey-don't-shoot-your-foot check.

So, it seems there's at least one use case where some kind of
no_really_the_datadir_permissions_are_fine option would be welcome
to get around a well-intended but sometimes broken check.

And really, this is just a specific example of the general principle
that security checks made in an LBYL rather than EAFP manner are
always at risk of drawing wrong conclusions because they can't be
omniscient about all the factors that apply in every environment.
So it's always a good idea to provide an escape hatch for that kind of
check.

Isn't it?

-Chap

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 05:16, Tom Lane  wrote:

> Peter Eisentraut  writes:
> > On 12/9/17 19:28, Tom Lane wrote:
> >> No, my problem with it is that there are lots of contexts where typing
> >> "help" won't produce anything.  Robert's proposal downthread would
> >> go a long way towards fixing that.
>
> > No opinion on that, but if the problem is that people don't know how to
> > quit psql, then we should just put that information back into the
> > welcome message and be done with it.
>
> I don't think people read the welcome message, or at least they
> immediately forget it.

What welcome message?

$ psqlpsql (9.6.4, server 9.5.8)
Type "help" for help.

craig=>

Oh, huh. There's something written below the version.

OK, you might argue that if I wasn't a constant psql user I wouldn't have
learned to be blind to that yet. But I think many users are probably almost
as blind to welcome messages are they are to banner ads. And lets face it,
the users we're trying to reach aren't necessarily the best readers/problem
solvers already.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 05:38, Chapman Flack  wrote:

> On 12/11/2017 04:16 PM, Tom Lane wrote:
> > Peter Eisentraut  writes:
> >> No opinion on that, but if the problem is that people don't know how to
> >> quit psql, then we should just put that information back into the
> >> welcome message and be done with it.
> >
> > I don't think people read the welcome message, or at least they
> > immediately forget it.
>
> I'm still a wholehearted supporter of Robert's idea in
>
> https://www.postgresql.org/message-id/CA%2BTgmoZswp00PtcgPfQ9zbbh7HUTgs
> LLJ9Z1x9E2s8Y7ep048g%40mail.gmail.com
>
> to simply produce helpful messages to stderr when isatty(stdin) and
> 'exit' or 'quit' arrives on a line by itself.

+1 from me. In part because scripts you write on pg11 won't break on older
versions with weird errors due to lack of 'exit' / 'quit' .

As others noted, 'help' is a precedent here.

I don't even think it's necessarily worthwhile to treat them any
> differently when *not* on a continuation line. Why not just always
> have a bare 'exit' or 'quit', when isatty(stdin), give you a little
> reminder about \?, \q, ^C/\r and leave the next step up to you?
>

This makes me a little nervous... but the counter-argument is that I see a
lot of stack overflow questions with variants on "nothing is working in
psql". Because they didn't use a semicolon. So they're stuck in a
continuation, mashing help and quit and exit and wondering wtf is going on.
The same happens with people who get stuck in quote continuations or
open-parens continuations, where semicolon doesn't help them either.

Experienced users on unixes will try control-D, but unless you've used unix
a while that's not going to occur to you.

I doubt I've ever written just "exit" or "quit" without indentation. I
think if it requires them to be a bareword with no indentation, strictly
^(exit|quit)\n when isatty, then that's probably a safe and helpful choice.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Incorrect debug info printed in generate_partition_wise_join_paths

2017-12-11 Thread Etsuro Fujita


(2017/12/11 17:24), Ashutosh Bapat wrote:

Yes, that's the correct fix. We should be printing debug information
about the child and not the parent.


Thanks for reviewing!

Best regards,
Etsuro Fujita

Re: Testing Extension Upgrade Paths

2017-12-11 Thread Craig Ringer

On 12 December 2017 at 07:25, Michael Paquier 
wrote:

> On Tue, Dec 12, 2017 at 2:55 AM, Jeremy Finzel  wrote:
> > I understand how to setup a regression suite for a postgres extension,
> but
> > what I'm not clear on from the docs is if there is a pattern that exists
> for
> > testing not only the latest version of an extension, but also an upgraded
> > previous version.
> >
> > Is there any straightforward way to do this that doesn't involve manually
> > copying tests?
>
> It is perfectly possible to do tests on a specific version and test
> upgrade paths using CREATE EXTENSION and ALTER EXTENSION which support
> versioning in a SQL regression script, say:
> CREATE EXTENSION foo VERSION "0.1";
> -- Do stuff
> ALTER EXTENSION foo UPDATE TO "0.2";
> -- Do other stuff
>

This works well when you want to test 1.1 binaries with a 1.0 SQL extension
definition.

It's not effective if you need to test 1.0 binaries+extension, then an
upgrade to 1.1 binaries and extension. You have no way to install and run
the 1.0 binaries without going outside pg_regress / TAP and using an
external test harness/framework/script. I've been bitten by this multiple
times before. For example, if 1.1 always assigns a value to some column
that was nullable in 1.0, you're never going to exercise 1.1's handling of
nulls in that column.

It also doesn't lend its self to complex multi-path upgrades, where for
example you could upgrade 1.0.0 -> 1.0.1 -> 1.1.0 or  upgrade directly from
1.0.0 -> 1.1.0. This rapidly becomes an issue when you release 1.0.0,
release 1.1.0, then release a maintenance 1.0.1  version. Now you have
toadd the 1.0.1 -> 1.1.0 upgrade path, but you still have the 1.0.0 ->
1.1.0 path.

You can handle that with TAP if you're willing to write something to do the
various combinations of steps and exercise them. It's not practical with
pg_regress.

More thorough testing benefits from an external harness.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Boolean partitions syntax

2017-12-11 Thread Amit Langote

Hi.

Horiguchi-san pointed out [1] on a nearby thread that the partitioning
syntax (the FOR VALUES clause) doesn't accept true and false as valid
partition bound datums, which seems to me like an oversight.  Attached a
patch to fix that.

create table bools (a bool) partition by list (a);

Before patch:

create table bools_t partition of bools for values in (true);
ERROR:  syntax error at or near "true"
LINE 1: ...reate table bools_t partition of bools for values in (true);

After:

create table bools_t partition of bools for values in (true);
CREATE TABLE
\d bools_t
  Table "public.bools_t"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | boolean |   |  |
Partition of: bools FOR VALUES IN (true)

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/20171128.203915.26713586.horiguchi.kyotaro%40lab.ntt.co.jp
From 85d51048ee147c819a9ed0648557c7f05f3802c5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 12 Dec 2017 10:33:11 +0900
Subject: [PATCH] Allow Boolean values in partition FOR VALUES clause

---
 doc/src/sgml/ref/create_table.sgml |  6 +++---
 src/backend/parser/gram.y  | 18 ++
 src/test/regress/expected/create_table.out | 14 ++
 src/test/regress/sql/create_table.sql  |  7 +++
 4 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index a0c9a6d257..eaa79ae333 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -86,9 +86,9 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] 
TABLE [ IF NOT EXI
 
 and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | MINVALUE | 
MAXVALUE } [, ...] ) |
+IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
+FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ebfc94f896..c6cbf9b844 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -153,6 +153,7 @@ static Node *makeBitStringConst(char *str, int location);
 static Node *makeNullAConst(int location);
 static Node *makeAConst(Value *v, int location);
 static Node *makeBoolAConst(bool state, int location);
+static Node *makeBoolAConstNoCast(bool state, int location);
 static RoleSpec *makeRoleSpec(RoleSpecType type, int location);
 static void check_qualified_name(List *names, core_yyscan_t yyscanner);
 static List *check_func_name(List *names, core_yyscan_t yyscanner);
@@ -2768,6 +2769,8 @@ partbound_datum:
Sconst  { $$ = makeStringConst($1, @1); 
}
| NumericOnly   { $$ = makeAConst($1, @1); }
| NULL_P{ $$ = makeNullAConst(@1); }
+   | TRUE_P{ $$ = 
makeBoolAConstNoCast(true, @1); }
+   | FALSE_P   { $$ = 
makeBoolAConstNoCast(false, @1); }
;
 
 partbound_datum_list:
@@ -15590,6 +15593,21 @@ makeBoolAConst(bool state, int location)
return makeTypeCast((Node *)n, SystemTypeName("bool"), -1);
 }
 
+/* makeBoolAConstNoCast()
+ * Create an A_Const string node containing valid bool type values.
+ */
+static Node *
+makeBoolAConstNoCast(bool state, int location)
+{
+   A_Const *n = makeNode(A_Const);
+
+   n->val.type = T_String;
+   n->val.val.str = (state ? "t" : "f");
+   n->location = location;
+
+   return (Node *) n;
+}
+
 /* makeRoleSpec
  * Create a RoleSpec with the given type
  */
diff --git a/src/test/regress/expected/create_table.out 
b/src/test/regress/expected/create_table.out
index 8e745402ae..c541a652c4 100644
--- a/src/test/regress/expected/create_table.out
+++ b/src/test/regress/expected/create_table.out
@@ -863,3 +863,17 @@ Partition key: LIST (a)
 Number of partitions: 0
 
 DROP TABLE parted_col_comment;
+-- boolean partitions
+create table boolspart (a bool) partition by list (a);
+create table boolspart_t partition of boolspart for values in (true);
+create table boolspart_f partition of boolspart for values in (false);
+\d+ boolspart
+ Table "public.boolspart"
+ Column |  Type   | Collation | Nullable | Default | Storage | Stats target | 
Description 
++-+---+--+-+-+--+-
+ a  | boolean |   |  | | plain   |  | 
+Partition k

Re: User defined data types in Logical Replication

2017-12-11 Thread Masahiko Sawada

On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong  wrote:
> On 2017/12/08 13:18, Huong Dangminh wrote:
>
>> Hi Sawada-san,
>>
>>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada 
>>> wrote:

 On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong 
>>>
>>> wrote:
>
> Hi Sawada-san,
>
> Sorry for my late response.
>
> On 2017/12/05 0:11, Masahiko Sawada wrote:
>
> There is one more case that user-defined data type is not supported
> in Logical Replication.
> That is when remote data type's name does not exist in SUBSCRIBE.
>
> In relation.c:logicalrep_typmap_gettypname
> We search OID in syscache by remote's data type name and mapping it,
> if it does not exist in syscache We will be faced with the bellow
> error.
>
>  if (!OidIsValid(entry->typoid))
>  ereport(ERROR,
>
> (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>   errmsg("data type \"%s.%s\" required
> for logical replication does not exist",
>  entry->nspname,
> entry->typname)));
>
> I think, it is not necessary to check typoid here in order to avoid
> above case, is that right?
>
> I think it's not right. We should end up with an error in the case
> where the same type name doesn't exist on subscriber. With your
> proposed patch,
> logicalrep_typmap_gettypname() can return an invalid string
> (entry->typname) in that case, which can be a cause of SEGV.
>
> Thanks, I think you are right.
> # I thought that entry->typname was received from walsender, and it
> is already be qualified in logicalrep_write_typ.
> # But we also need check it in subscriber, because it could be lost
> info in transmission.

 Oops, the last sentence of my previous mail was wrong.
 logicalrep_typmap_gettypname() doesn't return an invalid string since
 entry->typname is initialized with a type name got from wal sender.
>
> Yeah, so we do not need to check the existing of publisher's type name in
> subscriber.

 After more thought, we might not need to raise an error even if there
 is not the same data type on both publisher and subscriber. Because
 data is sent after converted to the text representation and is
 converted to a data type according to the local table definition
 subscribers don't always need to have the same data type. If it's
 right, slot_store_error_callback() doesn't need to find a
 corresponding local data type OID but just finds the corresponding
 type name by remote data type OID. What do you think?
>
> I totally agree. It will make logical replication more flexible with data
> type.

 --- a/src/backend/replication/logical/worker.c
 +++ b/src/backend/replication/logical/worker.c
 @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
 DLIST_STATIC_INIT(lsn_mapping);

   typedef struct SlotErrCallbackArg
   {
 -LogicalRepRelation *rel;
 -intattnum;
 +LogicalRepRelMapEntry *rel;
 +intremote_attnum;
 +intlocal_attnum;
   } SlotErrCallbackArg;

 Since LogicalRepRelMapEntry has a map of local attributes to remote
 ones we don't need to have two attribute numbers.

>> Sorry for the late reply.
>>
>>> Attached the patch incorporated what I have on mind. Please review it.
>>
>> Thanks for the patch, I will do it at this weekend.
>
> Your patch is fine for me.
> But logicalrep_typmap_getid will be unused.

Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
to replicate data to an another data type of column, what is the data
type mapping on subscriber for? It seems enough to have remote data
type oid, namespace and name.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: plpgsql test layout

2017-12-11 Thread Michael Paquier

On Tue, Dec 12, 2017 at 5:11 AM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> On 12/7/17 15:21, Pavel Stehule wrote:
>>> 2017-12-07 20:08 GMT+01:00 Peter Eisentraut
>>> >> >:
 Here is a first attempt.
>
>>> looks ok
>
>> Any other thoughts on this?  If not, I'd like to commit it, give the
>> buildfarm a run at it (looking at the client code, it should be fine),
>> and then rebase some ongoing work on top of it.
>
> No particular objection.  Does the MSVC infrastructure need to be
> taught about it?

If I read vcregress.pl correctly, it seems to me that you need to do
more with MSVC (see plcheck). The tests would kick if sql/ and
expected/ are found, and the test list is fetched by looking at
REGRESSION in the test files. However plpgsql code has an additional
src/ folder which would cause the tests to not execute. If plpgsql
code was moved on folder down then the tests would execute properly.
-- 
Michael

RE: Added PostgreSQL internals learning materials in Developer FAQ

2017-12-11 Thread Tsunakawa, Takayuki

From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> While we're talking wiki changes, some folks may be interested in the recent
> stuff I added to the profiling with perf page
> https://wiki.postgresql.org/wiki/Profiling_with_perf  on using
> postgres's predefined tracepoints, injecting userspace tracepoints,
> capturing variable values at tracepoints, etc.

Wow, wonderful article.  I didn't know the trace point feature of perf.  This 
article should be bookmarked for many developers, especially support staff.  
I'll introduct this page to my colleagues.

Regards
Takayuki Tsunakawa

Re: Testing Extension Upgrade Paths

2017-12-11 Thread Michael Paquier

On Tue, Dec 12, 2017 at 2:55 AM, Jeremy Finzel  wrote:
> I understand how to setup a regression suite for a postgres extension, but
> what I'm not clear on from the docs is if there is a pattern that exists for
> testing not only the latest version of an extension, but also an upgraded
> previous version.
>
> Is there any straightforward way to do this that doesn't involve manually
> copying tests?

It is perfectly possible to do tests on a specific version and test
upgrade paths using CREATE EXTENSION and ALTER EXTENSION which support
versioning in a SQL regression script, say:
CREATE EXTENSION foo VERSION "0.1";
-- Do stuff
ALTER EXTENSION foo UPDATE TO "0.2";
-- Do other stuff

> P.S. is this the right list for extension dev questions ???

You are sure to get answers here, most folks on this list have likely
the experience of working on extensions.
-- 
Michael

money type's overflow handling is woefully incomplete

2017-12-11 Thread Andres Freund

Hi,

The money type has overflow handling in its input function:
Datum
cash_in(PG_FUNCTION_ARGS)
...
Cashnewvalue = (value * 10) - (*s - '0');

if (newvalue / 10 != value)
ereport(ERROR,

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
 errmsg("value \"%s\" is out of 
range for type %s",
str, "money")));

but not in a lot of other relevant places:

Datum
cash_pl(PG_FUNCTION_ARGS)
{
Cashc1 = PG_GETARG_CASH(0);
Cashc2 = PG_GETARG_CASH(1);
Cashresult;

result = c1 + c2;

PG_RETURN_CASH(result);
}

Same with cash_mi, int8_mul_cash, cash_div_int8, etc.

cash_out doesn't have a plain overflow, but:
if (value < 0)
{
/* make the amount positive for digit-reconstruction loop */
value = -value;

doesn't reliably work if value is PG_INT64_MIN.


This leads to fun like:

postgres[2002][1]=# SELECT '92233720368547758.07'::money+'0.1';
┌─┐
│  ?column?   │
├─┤
│ -$92,233,720,368,547,757.99 │
└─┘



Greetings,

Andres Freund

Re: [HACKERS] static assertions in C++

2017-12-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 12/11/17 16:45, Tom Lane wrote:
>> (BTW, why is it that we can't fall back on the negative-width-bitfield
>> trick for old g++?)

> The complaint is
> error: types may not be defined in 'sizeof' expressions

Hmm, well, surely there's more than one way to do that; the sizeof
is just a convenient way to wrap it in C.  Wouldn't a typedef serve
just as well?

(Googling the topic shows that this wheel has been invented
before, BTW.)

regards, tom lane

Re: [HACKERS] static assertions in C++

2017-12-11 Thread Peter Eisentraut

On 12/11/17 16:45, Tom Lane wrote:
>> I guess the question is whether we would rather be able to have users
>> continue to use older C++ compilers, or be super picky about static
>> assertions.
> 
> Uh, what is this "continue to use" bit?  We've never supported
> building the backend with C++ compilers.

The problem is static assertions in inline functions in header files.
We do support using the header files in C++ extensions.  But now there
is a problem that if a C++ extension happens to pull in a header file
that has a static assertion, it doesn't compile anymore, but it used to
before the static assertion was introduced.

(Currently, we have very few static assertions in header files, so the
problem is not relevant in practice yet, but the use of both static
assertions and inline functions is clearly expanding.)

> (BTW, why is it that we can't fall back on the negative-width-bitfield
> trick for old g++?)

The complaint is

error: types may not be defined in 'sizeof' expressions

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] static assertions in C++

2017-12-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 11/29/17 10:03, Tom Lane wrote:
>> Right now, we have the property that every build enforces static
>> assertions, albeit with variable quality of the error messages.
>> I strongly disagree that it's okay to throw that property away.
>> I do think that we could put an #error here instead, and wait to see
>> if anyone complains before expending effort on a workaround.

> I guess the question is whether we would rather be able to have users
> continue to use older C++ compilers, or be super picky about static
> assertions.

Uh, what is this "continue to use" bit?  We've never supported
building the backend with C++ compilers.  Nor, since the whole
point here is that StaticAssert doesn't work with C++, is there
any existing tradition of building extensions that use StaticAssert
with C++.  So I think it's highly likely that if we put in
an #error there, there will be exactly zero complaints.

I'd much rather continue to be sure that StaticAssert does what it
says on the tin than retroactively improve the compatibility situation
for older compilers.

(BTW, why is it that we can't fall back on the negative-width-bitfield
trick for old g++?)

regards, tom lane

Re: REPLICA IDENTITY FULL

2017-12-11 Thread Peter Eisentraut

On 12/4/17 22:15, Noah Misch wrote:
> While reviewing this patch, I noticed a couple of nearby defects:
> 
> - RelationFindReplTupleSeq() says "Note that this stops on the first matching
>   tuple.", but that's not the case.  It visits every row in the table, and it
>   uses the last match.  The claimed behavior sounds more attractive.

Yeah, that looks like a mistake.

> - RelationFindReplTupleSeq() has comment "/* Start an index scan. */", an
>   inapplicable copy-paste from RelationFindReplTupleByIndex().

I have fixed that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Jsonb transform for pl/python

2017-12-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 12/11/17 03:22, Anthony Bykov wrote:
>>> Why not make this part of the plpythonu extension?  It doesn't have to
>>> be a separate contrib module.

>> I thought about that, but the problem is that there will be no
>> possibilities to create custom transform if we create this extension by
>> default.

> OK, could it be a separate extension, but part of the same code directory?

I think our makefile infrastructure only allows for one shlib to be
built per directory.

Admittedly, you could have two extensions sharing the same shlib
(and the same regression test suite), but on the whole it's not clear
to me why we should do that.

regards, tom lane

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Chapman Flack

On 12/11/2017 04:16 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> No opinion on that, but if the problem is that people don't know how to
>> quit psql, then we should just put that information back into the
>> welcome message and be done with it.
> 
> I don't think people read the welcome message, or at least they
> immediately forget it.

I'm still a wholehearted supporter of Robert's idea in

https://www.postgresql.org/message-id/CA%2BTgmoZswp00PtcgPfQ9zbbh7HUTgsLLJ9Z1x9E2s8Y7ep048g%40mail.gmail.com

to simply produce helpful messages to stderr when isatty(stdin) and
'exit' or 'quit' arrives on a line by itself. Simple, clear, you end
up learning how psql works, and getting help doing it, at the moment
you want it. No change to semantics. Brilliant.

I'm not just saying that because its exactly the suggestion I was
preparing to compose when I scrolled to Robert's message and saw
I'd been scooped. (Or maybe I am.)

I don't even think it's necessarily worthwhile to treat them any
differently when *not* on a continuation line. Why not just always
have a bare 'exit' or 'quit', when isatty(stdin), give you a little
reminder about \?, \q, ^C/\r and leave the next step up to you?

Maybe the part of the reminder about ^C / \r to clear the input
buffer could be reserved for when the buffer isn't empty. On second
thought, no, it will be appropriate every time, because now the buffer
has at least your exit or quit in it.

-Chap

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-11 Thread Andres Freund

On 2017-12-11 15:55:42 -0500, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 3:25 PM, Andres Freund  wrote:
> > For me "very short periods of time" and journaled metadatachanging
> > filesystem operations don't quite mesh.  Language lawyering aside, this
> > seems quite likely to bite us down the road.
> >
> > It's imo perfectly fine to say that there's only a limited number of
> > file extension locks, but that there's a far from neglegible chance of
> > conflict even without the array being full doesn't seem nice. Think this
> > needs use some open addressing like conflict handling or something
> > alike.
> 
> I guess we could consider that, but I'm not really convinced that it's
> solving a real problem.  Right now, you start having meaningful chance
> of lock-manager lock contention when the number of concurrent
> processes in the system requesting heavyweight locks is still in the
> single digits, because there are only 16 lock-manager locks.  With
> this, there are effectively 1024 partitions.
> 
> Now I realize you're going to point out, not wrongly, that we're
> contending on the locks themselves rather than the locks protecting
> the locks, and that this makes everything worse because the hold time
> is much longer.

Indeed.


> Fair enough.  On the other hand, what workload would actually be
> harmed?  I think you basically have to imagine a lot of relations
> being extended simultaneously, like a parallel bulk load, and an
> underlying filesystem which performs individual operations slowly but
> scales really well.  I'm slightly skeptical that's how real-world
> filesystems behave.

Or just two independent relations on two different filesystems.


> It might be a good idea, though, to test how parallel bulk loading
> behaves with this patch applied, maybe even after reducing
> N_RELEXTLOCK_ENTS to simulate an unfortunate number of collisions.

Yea, that sounds like a good plan. Measure two COPYs to relations on
different filesystems, reduce N_RELEXTLOCK_ENTS to 1, and measure
performance. Then increase the concurrency of the copies to each
relation.


> This isn't a zero-sum game.  If we add collision resolution, we're
> going to slow down the ordinary uncontended case; the bookkeeping will
> get significantly more complicated.  That is only worth doing if the
> current behavior produces pathological cases on workloads that are
> actually somewhat realistic.

Yea, measuring sounds like a good plan.

Greetings,

Andres Freund

Re: [HACKERS] static assertions in C++

2017-12-11 Thread Peter Eisentraut

On 11/29/17 10:03, Tom Lane wrote:
> Right now, we have the property that every build enforces static
> assertions, albeit with variable quality of the error messages.
> I strongly disagree that it's okay to throw that property away.
> I do think that we could put an #error here instead, and wait to see
> if anyone complains before expending effort on a workaround.

I guess the question is whether we would rather be able to have users
continue to use older C++ compilers, or be super picky about static
assertions.

In the g++ line, the oldest compiler that supports static assertions is
g++-6, and g++-5 doesn't support it.  I think that is recent enough to
be a concern.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Jsonb transform for pl/python

2017-12-11 Thread Peter Eisentraut

On 12/11/17 03:22, Anthony Bykov wrote:
>> Why not make this part of the plpythonu extension?  It doesn't have to
>> be a separate contrib module.
>>
> Hello,
> I thought about that, but the problem is that there will be no
> possibilities to create custom transform if we create this extension by
> default.

OK, could it be a separate extension, but part of the same code directory?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 12/9/17 19:28, Tom Lane wrote:
>> No, my problem with it is that there are lots of contexts where typing
>> "help" won't produce anything.  Robert's proposal downthread would
>> go a long way towards fixing that.

> No opinion on that, but if the problem is that people don't know how to
> quit psql, then we should just put that information back into the
> welcome message and be done with it.

I don't think people read the welcome message, or at least they
immediately forget it.

regards, tom lane

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Peter Eisentraut

On 12/9/17 19:28, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 12/8/17 10:19, Tom Lane wrote:
>>> Meh.  I never thought that the "help" business was particularly well
>>> thought out, and this proposal isn't better.
> 
>> The perhaps the fix is to revert the "help" thing?
> 
> No, my problem with it is that there are lots of contexts where typing
> "help" won't produce anything.  Robert's proposal downthread would
> go a long way towards fixing that.

No opinion on that, but if the problem is that people don't know how to
quit psql, then we should just put that information back into the
welcome message and be done with it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-11 Thread Masahiko Sawada

On Tue, Dec 12, 2017 at 5:15 AM, Robert Haas  wrote:
> On Sun, Dec 10, 2017 at 11:51 PM, Masahiko Sawada  
> wrote:
>> Attached updated version patch. Please review it.
>
> I went over this today; please find attached an updated version which
> I propose to commit.
>
> Changes:
>
> - Various formatting fixes, including running pgindent.
>
> - Various comment updates.
>
> - Make RELEXT_WAIT_COUNT_MASK equal RELEXT_LOCK_BIT - 1 rather than
> some unnecessarily smaller number.
>
> - In InitRelExtLocks, don't bother using mul_size; we already know it
> won't overflow, because we did the same thing in RelExtLockShmemSize.
>
> - When we run into an error trying to release a lock, log it as a
> WARNING and don't mark it as translatable.  Follows lock.c.  An ERROR
> here probably just recurses infinitely.
>
> - Don't bother passing OID to RelExtLockRelease.
>
> - Reorder functions a bit for (IMHO) better clarity.
>
> - Make UnlockRelationForExtension just use a single message for both
> failure modes.  They are closely-enough related that I think that's
> fine.
>
> - Make WaitForRelationExtensionLockToBeFree complain if we already
> hold an extension lock.
>
> - In RelExtLockCleanup, clear held_relextlock.waiting.  This would've
> made for a nasty bug.
>
> - Also in that function, assert that we don't hold both a lock and a wait 
> count.
>

Thank you for updating the patch. Here is two minor comments.

+ * we acquire the same relation extension lock repeatedly.  nLocks is 0 is the
+ * number of times we've acquired that lock;

Should it be "nLocks is the number of times we've acquired that lock:"?

+/* Remember lock held by this backend */
+held_relextlock.relid = relid;
+held_relextlock.lock = relextlock;
+held_relextlock.nLocks = 1;

We set held_relextlock.relid and held_relextlock.lock again. Can we remove them?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 3:25 PM, Andres Freund  wrote:
> For me "very short periods of time" and journaled metadatachanging
> filesystem operations don't quite mesh.  Language lawyering aside, this
> seems quite likely to bite us down the road.
>
> It's imo perfectly fine to say that there's only a limited number of
> file extension locks, but that there's a far from neglegible chance of
> conflict even without the array being full doesn't seem nice. Think this
> needs use some open addressing like conflict handling or something
> alike.

I guess we could consider that, but I'm not really convinced that it's
solving a real problem.  Right now, you start having meaningful chance
of lock-manager lock contention when the number of concurrent
processes in the system requesting heavyweight locks is still in the
single digits, because there are only 16 lock-manager locks.  With
this, there are effectively 1024 partitions.

Now I realize you're going to point out, not wrongly, that we're
contending on the locks themselves rather than the locks protecting
the locks, and that this makes everything worse because the hold time
is much longer.  Fair enough.  On the other hand, what workload would
actually be harmed?  I think you basically have to imagine a lot of
relations being extended simultaneously, like a parallel bulk load,
and an underlying filesystem which performs individual operations
slowly but scales really well.  I'm slightly skeptical that's how
real-world filesystems behave.

It might be a good idea, though, to test how parallel bulk loading
behaves with this patch applied, maybe even after reducing
N_RELEXTLOCK_ENTS to simulate an unfortunate number of collisions.

This isn't a zero-sum game.  If we add collision resolution, we're
going to slow down the ordinary uncontended case; the bookkeeping will
get significantly more complicated.  That is only worth doing if the
current behavior produces pathological cases on workloads that are
actually somewhat realistic.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Inconsistency in plpgsql's error context reports

2017-12-11 Thread Tom Lane

I wrote:
> Robert Haas  writes:
>> On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane  wrote:
>>> There seem to be two ways we could look at this.  One is that the
>>> new test case just needs to be rejiggered to avoid unstable output
>>> ("\set VERBOSITY terse" would be the easiest way).  But there is
>>> also room to argue that it's bad that plpgsql produces error reports
>>> that vary depending on the phase of the moon, which is pretty much
>>> what this would look like in the field --- cache flushes will occur
>>> unpredictably in most application environments.

>> I am inclined toward the latter view.

> Yeah, me too.  I'll see about patching exec_eval_simple_expr() to
> provide a context line that matches SPI's.  Seems like a HEAD-only
> change though, as this will result in visible behavior change in
> the typical case.

Here's a quick hack at that.  I guess the main question that needs to be
asked is whether we're happy with plpgsql getting so much chattier
(as per all the regression test changes).

If we're not, the alternative that could be considered is to make SPI
expose some way to suppress its use of a context callback, and change
plpgsql to invoke that when dealing with an expression.  That would
be rather more invasive code-wise, but would largely eliminate the
behavioral change as seen by users.

Another angle, if we do keep it like this, is that maybe SPI should
export _SPI_error_callback so that plpgsql can use it directly,
rather than having a copy that needs to be kept in sync.

regards, tom lane

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index fa4d573..a7c399b 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*** static int exec_run_select(PLpgSQL_execs
*** 266,271 
--- 266,272 
  PLpgSQL_expr *expr, long maxtuples, Portal *portalP);
  static int exec_for_query(PLpgSQL_execstate *estate, PLpgSQL_stmt_forq *stmt,
  			   Portal portal, bool prefetch_ok);
+ static void simple_expr_error_callback(void *arg);
  static ParamListInfo setup_param_list(PLpgSQL_execstate *estate,
   PLpgSQL_expr *expr);
  static ParamListInfo setup_unshared_param_list(PLpgSQL_execstate *estate,
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5504,5509 
--- 5505,5511 
  	ExprContext *econtext = estate->eval_econtext;
  	LocalTransactionId curlxid = MyProc->lxid;
  	CachedPlan *cplan;
+ 	ErrorContextCallback simpleerrcontext;
  	ParamListInfo paramLI;
  	void	   *save_setup_arg;
  	MemoryContext oldcontext;
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5579,5584 
--- 5581,5594 
  	}
  
  	/*
+ 	 * Also setup error traceback support for ereport()
+ 	 */
+ 	simpleerrcontext.callback = simple_expr_error_callback;
+ 	simpleerrcontext.arg = expr;
+ 	simpleerrcontext.previous = error_context_stack;
+ 	error_context_stack = &simpleerrcontext;
+ 
+ 	/*
  	 * Set up ParamListInfo to pass to executor.  We need an unshared list if
  	 * it's going to include any R/W expanded-object pointer.  For safety,
  	 * save and restore estate->paramLI->parserSetupArg around our use of the
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5612,5617 
--- 5622,5629 
  
  	estate->paramLI->parserSetupArg = save_setup_arg;
  
+ 	error_context_stack = simpleerrcontext.previous;
+ 
  	if (!estate->readonly_func)
  		PopActiveSnapshot();
  
*** exec_eval_simple_expr(PLpgSQL_execstate 
*** 5628,5633 
--- 5640,5674 
  	return true;
  }
  
+ /*
+  * simple_expr_error_callback
+  *
+  * Add context information when a simple expression fails.  This should
+  * match the behavior of spi.c's _SPI_error_callback, else users will see
+  * different behavior for errors in simple and not-simple expressions.
+  */
+ static void
+ simple_expr_error_callback(void *arg)
+ {
+ 	PLpgSQL_expr *expr = (PLpgSQL_expr *) arg;
+ 	const char *query = expr->query;
+ 	int			syntaxerrposition;
+ 
+ 	/*
+ 	 * If there is a syntax error position, convert to internal syntax error;
+ 	 * otherwise treat the query as an item of context stack
+ 	 */
+ 	syntaxerrposition = geterrposition();
+ 	if (syntaxerrposition > 0)
+ 	{
+ 		errposition(0);
+ 		internalerrposition(syntaxerrposition);
+ 		internalerrquery(query);
+ 	}
+ 	else
+ 		errcontext("SQL statement \"%s\"", query);
+ }
+ 
  
  /*
   * Create a ParamListInfo to pass to SPI
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index 26f6e43..4a384b0 100644
*** a/src/test/regress/expected/plpgsql.out
--- b/src/test/regress/expected/plpgsql.out
*** DETAIL:  Key (name)=(PF1_1) already exis
*** 1519,1536 
--- 1519,1540 
  update PSlot set backlink = 'WS.not.there' where slotname = 'PS.base.a1';
  ERROR:  WS.not.there does not exist
  CONTEXT:  PL/pgSQL function tg_backlink_set(character,character) line 30 at RAISE
+ SQL statement "SELECT tg_backlink_set(new.backlink, new.slo

Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2017-12-11 Thread Alexander Korotkov

On Fri, Dec 8, 2017 at 2:50 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Dec 7, 2017 at 8:59 PM, Robert Haas  wrote:
>
>> On Tue, Nov 7, 2017 at 7:51 AM, Jan Przemysław Wójcik
>>  wrote:
>> > I'm afraid that creating a function that implements quite different
>> > algorithms depending on a global parameter seems very hacky and would
>> lead
>> > to misunderstandings. I do understand the need of backward
>> compatibility,
>> > but I'd opt for the lesser evil. Perhaps a good idea would be to change
>> the
>> > name to 'substring_similarity()' and introduce the new function
>> > 'word_similarity()' later, for example in the next major version
>> release.
>>
>> That breaks things for everybody using word_similarity() currently.
>> If the previous discussion of this topic concluded that
>> word_similarity() was an OK name despite being a slight misnomer, I
>> don't think we should change our mind now.  Instead the new function
>> can be called something which makes the difference clear, e.g.
>> strict_word_similarity(), and the old function can remain as it is.
>
>
> +1
> Thank you for pointing this.  Yes, it would be better not to change
> existing names and behavior, but adjust documentation and add alternative
> behavior with another name.
> Therefore, I'm going to provide patchset of two patches:
> 1) Improve word_similarity() documentation.
> 2) Add new function strict_word_similarity() (or whatever better name we
> invent).
>

Please, find patchset attached.

0001-pg-trgm-word-similarity-docs-improvement.patch – contains improvement
to documentation of word_similarity() and related operators.  I decided to
give formal definition first (what exactly it internally does), and then
example and some more human-understandable description.  This patch also
adjusts two comments where lower and upper bounds mess up.

0002-pg-trgm-strict_word-similarity.patch – implementation of
strict_word_similarity() with comments, docs and tests.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-pg-trgm-word-similarity-docs-improvement.patch
Description: Binary data


0002-pg-trgm-strict_word-similarity.patch
Description: Binary data

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-11 Thread Andres Freund

Hi,

On 2017-12-11 15:15:50 -0500, Robert Haas wrote:
> +* Relation extension locks. Only one process can extend a relation at
> +a time; we use a specialized lock manager for this purpose, which is
> +much simpler than the regular lock manager.  It is similar to the
> +lightweight lock mechanism, but is ever simpler because there is only
> +one lock mode and only one lock can be taken at a time. A process holding
> +a relation extension lock is interruptible, unlike a process holding an
> +LWLock.


> +/*-
> + *
> + * extension_lock.c
> + * Relation extension lock manager
> + *
> + * This specialized lock manager is used only for relation extension
> + * locks.  Unlike the heavyweight lock manager, it doesn't provide
> + * deadlock detection or group locking.  Unlike lwlock.c, extension lock
> + * waits are interruptible.  Unlike both systems, there is only one lock
> + * mode.
> + *
> + * False sharing is possible.  We have a fixed-size array of locks, and
> + * every database OID/relation OID combination is mapped to a slot in
> + * the array.  Therefore, if two processes try to extend relations that
> + * map to the same array slot, they will contend even though it would
> + * be OK to let both proceed at once.  Since these locks are typically
> + * taken only for very short periods of time, this doesn't seem likely
> + * to be a big problem in practice.  If it is, we could make the array
> + * bigger.

For me "very short periods of time" and journaled metadatachanging
filesystem operations don't quite mesh.  Language lawyering aside, this
seems quite likely to bite us down the road.

It's imo perfectly fine to say that there's only a limited number of
file extension locks, but that there's a far from neglegible chance of
conflict even without the array being full doesn't seem nice. Think this
needs use some open addressing like conflict handling or something
alike.

Greetings,

Andres Freund

Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-11 Thread Robert Haas

On Sun, Dec 10, 2017 at 11:51 PM, Masahiko Sawada  wrote:
> Attached updated version patch. Please review it.

I went over this today; please find attached an updated version which
I propose to commit.

Changes:

- Various formatting fixes, including running pgindent.

- Various comment updates.

- Make RELEXT_WAIT_COUNT_MASK equal RELEXT_LOCK_BIT - 1 rather than
some unnecessarily smaller number.

- In InitRelExtLocks, don't bother using mul_size; we already know it
won't overflow, because we did the same thing in RelExtLockShmemSize.

- When we run into an error trying to release a lock, log it as a
WARNING and don't mark it as translatable.  Follows lock.c.  An ERROR
here probably just recurses infinitely.

- Don't bother passing OID to RelExtLockRelease.

- Reorder functions a bit for (IMHO) better clarity.

- Make UnlockRelationForExtension just use a single message for both
failure modes.  They are closely-enough related that I think that's
fine.

- Make WaitForRelationExtensionLockToBeFree complain if we already
hold an extension lock.

- In RelExtLockCleanup, clear held_relextlock.waiting.  This would've
made for a nasty bug.

- Also in that function, assert that we don't hold both a lock and a wait count.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


extension-lock-v12.patch
Description: Binary data

ML-based indexing ("The Case for Learned Index Structures", a paper from Google)

2017-12-11 Thread Nikolay Samokhvalov

Very interesting read: https://arxiv.org/abs/1712.01208

HN discussion: https://news.ycombinator.com/item?id=15894896

Some of the comments (from Twitter
https://twitter.com/schrockn/status/940037656494317568): "Jeff Dean and co
at GOOG just released a paper showing how machine-learned indexes can
replace B-Trees, Hash Indexes, and Bloom Filters. Execute 3x faster than
B-Trees, 10-100x less space. Executes on GPU, which are getting faster
unlike CPU. Amazing."

Can those ideas be applied to Postgres in its current state? Or it's not
really down-to-earth?

Re: plpgsql test layout

2017-12-11 Thread Tom Lane

Peter Eisentraut  writes:
> On 12/7/17 15:21, Pavel Stehule wrote:
>> 2017-12-07 20:08 GMT+01:00 Peter Eisentraut
>> > >:
>>> Here is a first attempt.

>> looks ok

> Any other thoughts on this?  If not, I'd like to commit it, give the
> buildfarm a run at it (looking at the client code, it should be fine),
> and then rebase some ongoing work on top of it.

No particular objection.  Does the MSVC infrastructure need to be
taught about it?

regards, tom lane

Re: [HACKERS] Transaction control in procedures

2017-12-11 Thread Peter Eisentraut

On 12/7/17 18:47, Andrew Dunstan wrote:
> Referring to anonymous blocks might be a bit mystifying for some
> readers, unless we note that they are invoked via DO.

Added parenthetical comments.

> I think this sentence should probably be worded a bit differently:
> 
> (And of course, BEGIN and END have different meanings in PL/pgSQL.)
> 
> Perhaps "Note that" instead of "And of course,".

fixed

> Why aren't the SPI functions that error out or return 0 just void
> instead of returning an int?

Tried to align then with existing functions, but I agree it seems weird.
 Changed to return void.

> The docs say SPI_set_non_atomic() returns 0 on success, but the code
> says otherwise.

Fixed the documentation.

> This sentence in the comment in SPI_Commit() is slightly odd:
> 
> This restriction is particular to PLs implemented on top of SPI.
> 
> Perhaps "required by" rather than "particular to" would make it read better.

fixed

> The empty free_commit() and free_rollback() functions in pl_funcs.c look
> slightly odd. I realize that the compiler should optimize the calls
> away, but it seems an odd style.

That's how the existing code for other statements looks as well.

> One other thing I wondered about was what if a PL function (say plperl)
> used SPI to open an explicit cursor and then looped over it? If there
> were a commit or rollback inside that loop we wouldn't have the same
> protections we have in plpgsql, ISTM. I haven't tried this yet, so I'm
> just speculating about what might happen.

Good point.  I added test cases similar to the plpgsql tests to the
other three languages, which not-so-amusingly gave three different
outcomes.  In PL/Perl in particular, the commit clears away the portal,
and the next call to spi_fetchrow() will then not find the cursor and
just return undefined.  So that is not so nice.  I'm thinking about
extending the portal pinning mechanism to the other languages as well,
which seems mildly useful independent of transaction management.  I will
propose a patch for that soon.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: plpgsql test layout

2017-12-11 Thread Peter Eisentraut

On 12/7/17 15:21, Pavel Stehule wrote:
> 2017-12-07 20:08 GMT+01:00 Peter Eisentraut
>  >:
> 
> On 11/14/17 11:51, Pavel Stehule wrote:
> >     One option would be to create a new test setup under 
> src/pl/pgsql(/src)
> >     and move some of the test material from the main test suite there.  
> Some
> >     of the test cases in the main test suite are more about SPI and 
> triggers
> >     and such, so it makes sense to keep these in the main line.  Of 
> course
> >     finding the cut-off might be hard.  Or maybe we'll just start with 
> new
> >     stuff from now on.
> >
> >     Any thoughts?
> >
> > +1
> 
> Here is a first attempt.
> 
> 
> looks ok

Any other thoughts on this?  If not, I'd like to commit it, give the
buildfarm a run at it (looking at the client code, it should be fine),
and then rebase some ongoing work on top of it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Tomas Vondra

Hi,

I see there's an ongoing discussion about the syntax and ALTER TABLE
behavior when changing a compression method for a column. So the patch
seems to be on the way to be ready in the January CF, I guess.

But let me play the devil's advocate for a while and question the
usefulness of this approach to compression. Some of the questions were
mentioned in the thread before, but I don't think they got the attention
they deserve.

FWIW I don't know the answers, but I think it's important to ask them.
Also, apologies if this post looks to be against the patch - that's part
of the "devil's advocate" thing.


The main question I'm asking myself is what use cases the patch
addresses, and whether there is a better way to do that. I see about
three main use-cases:

1) Replacing the algorithm used to compress all varlena types (in a way
that makes it transparent for the data type code).

2) Custom datatype-aware compression (e.g. the tsvector).

3) Custom datatype-aware compression with additional column-specific
metadata (e.g. the jsonb with external dictionary).

Now, let's discuss those use cases one by one, and see if there are
simpler (or better in some way) solutions ...


Replacing the algorithm used to compress all varlena values (in a way
that makes it transparent for the data type code).
--

While pglz served us well over time, it was repeatedly mentioned that in
some cases it becomes the bottleneck. So supporting other state of the
art compression algorithms seems like a good idea, and this patch is one
way to do that.

But perhaps we should simply make it an initdb option (in which case the
whole cluster would simply use e.g. lz4 instead of pglz)?

That seems like a much simpler approach - it would only require some
./configure options to add --with-lz4 (and other compression libraries),
an initdb option to pick compression algorithm, and probably noting the
choice in cluster controldata.

No dependencies tracking, no ALTER TABLE issues, etc.

Of course, it would not allow using different compression algorithms for
different columns (although it might perhaps allow different compression
level, to some extent).

Conclusion: If we want to offer a simple cluster-wide pglz alternative,
perhaps this patch is not the right way to do that.


Custom datatype-aware compression (e.g. the tsvector)
--

Exploiting knowledge of the internal data type structure is a promising
way to improve compression ratio and/or performance.

The obvious question of course is why shouldn't this be done by the data
type code directly, which would also allow additional benefits like
operating directly on the compressed values.

Another thing is that if the datatype representation changes in some
way, the compression method has to change too. So it's tightly coupled
to the datatype anyway.

This does not really require any new infrastructure, all the pieces are
already there.

In some cases that may not be quite possible - the datatype may not be
flexible enough to support alternative (compressed) representation, e.g.
because there are no bits available for "compressed" flag, etc.

Conclusion: IMHO if we want to exploit the knowledge of the data type
internal structure, perhaps doing that in the datatype code directly
would be a better choice.


Custom datatype-aware compression with additional column-specific
metadata (e.g. the jsonb with external dictionary).
--

Exploiting redundancy in multiple values in the same column (instead of
compressing them independently) is another attractive way to help the
compression. It is inherently datatype-aware, but currently can't be
implemented directly in datatype code as there's no concept of
column-specific storage (e.g. to store dictionary shared by all values
in a particular column).

I believe any patch addressing this use case would have to introduce
such column-specific storage, and any solution doing that would probably
need to introduce the same catalogs, etc.

The obvious disadvantage of course is that we need to decompress the
varlena value before doing pretty much anything with it, because the
datatype is not aware of the compression.

So I wonder if the patch should instead provide infrastructure for doing
that in the datatype code directly.

The other question is if the patch should introduce some infrastructure
for handling the column context (e.g. column dictionary). Right now,
whoever implements the compression has to implement this bit too.



regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

Robert Haas  writes:
> On Mon, Dec 11, 2017 at 12:36 PM, Tom Lane  wrote:
>> [ thinks... ]  If we wanted to go that way, one thing we could do to
>> help extension authors (and ourselves) is to define the proposed
>> AllocSetContextCreate macro to include
>> 
>>  StaticAssertExpr(__builtin_constant_p(name))
>> 
>> on compilers that have __builtin_constant_p.  Now, that only helps
>> people using gcc and gcc-alikes, but that's a large fraction of
>> developers I should think.  (I tested this and it does seem to
>> correctly recognize string literals as constants.)

> I like that idea.  I think that would provide good protection not only
> for third-party developers but for core developers.

It turns out this is slightly more painful than I'd anticipated.
I tried to #define AllocSetContextCreate with five parameters,
but all of the call sites that use the size abstraction macros
(ALLOCSET_DEFAULT_SIZES and friends) blew up, because as far as
they were concerned there were only three parameters, since the
abstraction macros hadn't gotten expanded yet.

We can make it work by #defining AllocSetContextCreate with three
parameters

#define AllocSetContextCreate(parent, name, allocparams) ...

This approach means that you *must* use an abstraction macro when going
through AllocSetContextCreate; if you want to write out the parameters
longhand, you have to call AllocSetContextCreateExtended.  I do not
feel that this is a big loss, but there were half a dozen sites in our
code that were doing it the old way.  More significantly, since we
only introduced those macros in 9.6, I suspect that most extensions
are still doing it the old way and will get broken by this change.
It's not hard to fix, but the annoyance factor will probably be real.
I see no good way around it though: we can't use a static inline
function instead, because that will almost certainly break the
__builtin_constant_p test.

I did not bother with compatibility macros for SlabContext or
GenerationContext --- I really doubt any extension code is using
the former yet, and they couldn't be using the latter since it's new.

I've not done any benchmarking on this yet, just confirmed that it
compiles and passes check-world.

regards, tom lane

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 868c14e..adbbc44 100644
*** a/contrib/amcheck/verify_nbtree.c
--- b/contrib/amcheck/verify_nbtree.c
*** bt_check_every_level(Relation rel, bool 
*** 295,303 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_MINSIZE,
!  ALLOCSET_DEFAULT_INITSIZE,
!  ALLOCSET_DEFAULT_MAXSIZE);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);

  	/* Get true root block from meta-page */
--- 295,301 
  	/* Create context for page */
  	state->targetcontext = AllocSetContextCreate(CurrentMemoryContext,
   "amcheck context",
!  ALLOCSET_DEFAULT_SIZES);
  	state->checkstrategy = GetAccessStrategy(BAS_BULKREAD);

  	/* Get true root block from meta-page */
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 046898c..e93d740 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** AtStart_Memory(void)
*** 997,1007 
  	 */
  	if (TransactionAbortContext == NULL)
  		TransactionAbortContext =
! 			AllocSetContextCreate(TopMemoryContext,
!   "TransactionAbortContext",
!   32 * 1024,
!   32 * 1024,
!   32 * 1024);

  	/*
  	 * We shouldn't have a transaction context already.
--- 997,1008 
  	 */
  	if (TransactionAbortContext == NULL)
  		TransactionAbortContext =
! 			AllocSetContextCreateExtended(TopMemoryContext,
! 		  "TransactionAbortContext",
! 		  0,
! 		  32 * 1024,
! 		  32 * 1024,
! 		  32 * 1024);

  	/*
  	 * We shouldn't have a transaction context already.
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dd4a8d3..6e27856 100644
*** a/src/backend/catalog/partition.c
--- b/src/backend/catalog/partition.c
*** RelationBuildPartitionDesc(Relation rel)
*** 513,521 
  	}

  	/* Now build the actual relcache partition descriptor */
! 	rel->rd_pdcxt = AllocSetContextCreate(CacheMemoryContext,
! 		  RelationGetRelationName(rel),
! 		  ALLOCSET_DEFAULT_SIZES);
  	oldcxt = MemoryContextSwitchTo(rel->rd_pdcxt);

  	result = (PartitionDescData *) palloc0(sizeof(PartitionDescData));
--- 513,522 
  	}

  	/* Now build the actual relcache partition descriptor */
! 	rel->rd_pdcxt = AllocSetContextCreateExtended(CacheMemoryContext,
!   RelationGetRelationName(rel),
!   MEMCONTEXT_OPTION_COPY_NAME,
!   ALLOCSET_DEFAULT_SIZES);
  	oldcxt = MemoryContextSwitchTo(rel->rd

Re: Rethinking MemoryContext creation

2017-12-11 Thread Andres Freund

On 2017-12-11 13:09:42 -0500, Tom Lane wrote:
> Tomas Vondra  writes:
> > On 12/10/2017 04:42 PM, Tom Lane wrote:
> >> Peter Geoghegan  writes:
> >>> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
>  Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
>  although that number is a bit shaky since the run-to-run variation
>  is a few percent anyway.
>
> > FWIW I've done some measurements, and while there is a improvement, it's
> > far from 5%. ...
> > So that's about 1.3% and 1.2% improvement. It seems fairly consistent,
> > but it might easily be due to different in layout of the binaries.
>
> Thanks for checking.  With these sorts of small-percentage improvements,
> I would not be surprised for platform-to-platform results to be different.
> At least you do see some improvement too.
>
> Let me code up the change to avoid copying constant name strings,
> and then we can see if that helps any.

FWIW:
+5.37%  postgres  postgres[.] hash_search_with_hash_value
+2.94%  postgres  postgres[.] AllocSetAlloc
+2.68%  postgres  postgres[.] _bt_compare
+2.36%  postgres  postgres[.] LWLockAcquire
+2.13%  postgres  postgres[.] PostgresMain
+1.47%  postgres  postgres[.] LWLockRelease
+1.32%  postgres  libc-2.25.so[.] _int_malloc
+1.23%  postgres  libc-2.25.so[.] vfprintf
+1.22%  postgres  postgres[.] LockAcquire
+1.20%  postgres  postgres[.] _bt_first
-1.11%  postgres  libc-2.25.so[.] strlen
   - strlen
  + 32.24% MemoryContextCreate
  + 16.97% pq_getmsgstring
  + 14.63% set_ps_display

So I'd expect it to help a small amount.

- Andres

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

Tomas Vondra  writes:
> On 12/10/2017 04:42 PM, Tom Lane wrote:
>> Peter Geoghegan  writes:
>>> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
 Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
 although that number is a bit shaky since the run-to-run variation
 is a few percent anyway.

> FWIW I've done some measurements, and while there is a improvement, it's
> far from 5%. ...
> So that's about 1.3% and 1.2% improvement. It seems fairly consistent,
> but it might easily be due to different in layout of the binaries.

Thanks for checking.  With these sorts of small-percentage improvements,
I would not be surprised for platform-to-platform results to be different.
At least you do see some improvement too.

Let me code up the change to avoid copying constant name strings,
and then we can see if that helps any.

regards, tom lane

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Alexander Korotkov

On Mon, Dec 11, 2017 at 8:46 PM, Robert Haas  wrote:

> On Mon, Dec 11, 2017 at 12:41 PM, Alexander Korotkov
>  wrote:
> > Thus, in your example if user would like to further change awesome
> > compression for evenbetter compression, she should write.
> >
> > SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of
> previous
> > compression methods
>
> Right.
>
> > I wonder what should we do if user specifies only part of previous
> > compression methods?  For instance, pglz is specified but awesome is
> > missing.
> >
> > SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing
> >
> > I think we should trigger an error in this case.  Because query is
> specified
> > in form that is assuming to work without table rewrite, but we're unable
> to
> > do this without table rewrite.
>
> I think that should just rewrite the table in that case.  PRESERVE
> should specify the things that are allowed to be preserved -- its mere
> presence should not be read to preclude a rewrite.  And it's
> completely reasonable for someone to want to do this, if they are
> thinking about de-installing awesome.
>

OK, but NOTICE that presumably unexpected table rewrite takes place could
be still useful.

Also we probably should add some view that will expose compression methods
whose are currently preserved for columns.  So that user can correctly
construct SET COMPRESSION query that doesn't rewrites table without digging
into internals (like directly querying pg_depend).

> I also think that we need some way to change compression method for
> multiple
> > columns in a single table rewrite.  Because it would be way more
> efficient
> > than rewriting table for each of columns.  So as an alternative of
> >
> > ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table
> > rewrite
> > ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table
> > rewrite
> >
> > we could also provide
> >
> > ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz;
> -- no
> > rewrite
> > ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz;
> -- no
> > rewrite
> > VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with
> > recompression of c1 and c2 and removing depedencies
> >
> > ?
>
> Hmm.  ALTER TABLE allows multi comma-separated subcommands, so I don't
> think we need to drag VACUUM into this.  The user can just say:
>
> ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome, ALTER COLUMN
> c2 SET COMPRESSION awesome;
>
> If this is properly integrated into tablecmds.c, that should cause a
> single rewrite affecting both columns.


OK.  Sorry, I didn't notice we can use multiple subcommands for ALTER TABLE
in this case...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tomas Vondra


On 12/10/2017 04:42 PM, Tom Lane wrote:
> Peter Geoghegan  writes:
>> On Sat, Dec 9, 2017 at 5:53 PM, Tom Lane  wrote:
>>> Overall I'm seeing about a 5% improvement in a "pgbench -S" scenario,
>>> although that number is a bit shaky since the run-to-run variation
>>> is a few percent anyway.
> 
>> Is that with "-M prepared", too?
> 
> No, I didn't use that.
> 

FWIW I've done some measurements, and while there is a improvement, it's
far from 5%.

pgbench -S -c 1 -T 60

   master   patched
   -
18244 18534
18369 18587
18310 18479
18346 18515
18344 18557

pgbench -S -M prepared -c 1 -T 60

   master   patched
   -
35191 35231
35115 3
35164 35686
35110 35724
35053 35762

So that's about 1.3% and 1.2% improvement. It seems fairly consistent,
but it might easily be due to different in layout of the binaries.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-11 Thread Robert Haas

On Sun, Dec 10, 2017 at 11:07 PM, Amit Kapila  wrote:
> Okay, see the attached and let me know if that suffices the need?

+ * Check for unexpected worker death.  This will ensure that if
+ * the postmaster failed to start the worker, then we don't wait
+ * for it indefinitely.  For workers that are known to be
+ * launched, we can rely on their error queue being freed once
+ * they exit.

Hmm.  Is this really true?  What if the worker starts up but then
crashes before attaching to the error queue?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Testing Extension Upgrade Paths

2017-12-11 Thread Jeremy Finzel

Hello -

I understand how to setup a regression suite for a postgres extension, but
what I'm not clear on from the docs is if there is a pattern that exists
for testing not only the latest version of an extension, but also an
upgraded previous version.

For example, I am developing version 1.1, and the test suite runs fine for
1.1.

But I want to install the extension at 1.0, do a few things, then upgrade
to 1.1 and run N tests over again from this path.

I have in mind something like an environment variable or something where
you could run the suite twice with the variation being a direct install at
the highest version, or an upgrade from a previous version.  Please excuse
my ignorance!

Is there any straightforward way to do this that doesn't involve manually
copying tests?

Thank you!
Jeremy

P.S. is this the right list for extension dev questions ???

Re: Mention ordered datums in PartitionBoundInfoData comment

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 7:28 AM, Ashutosh Bapat
 wrote:
> PFA updated patch.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 12:41 PM, Alexander Korotkov
 wrote:
> Thus, in your example if user would like to further change awesome
> compression for evenbetter compression, she should write.
>
> SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of previous
> compression methods

Right.

> I wonder what should we do if user specifies only part of previous
> compression methods?  For instance, pglz is specified but awesome is
> missing.
>
> SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing
>
> I think we should trigger an error in this case.  Because query is specified
> in form that is assuming to work without table rewrite, but we're unable to
> do this without table rewrite.

I think that should just rewrite the table in that case.  PRESERVE
should specify the things that are allowed to be preserved -- its mere
presence should not be read to preclude a rewrite.  And it's
completely reasonable for someone to want to do this, if they are
thinking about de-installing awesome.

> I also think that we need some way to change compression method for multiple
> columns in a single table rewrite.  Because it would be way more efficient
> than rewriting table for each of columns.  So as an alternative of
>
> ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table
> rewrite
> ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table
> rewrite
>
> we could also provide
>
> ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz; -- no
> rewrite
> ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz; -- no
> rewrite
> VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with
> recompression of c1 and c2 and removing depedencies
>
> ?

Hmm.  ALTER TABLE allows multi comma-separated subcommands, so I don't
think we need to drag VACUUM into this.  The user can just say:

ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome, ALTER COLUMN
c2 SET COMPRESSION awesome;

If this is properly integrated into tablecmds.c, that should cause a
single rewrite affecting both columns.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Inconsistency in plpgsql's error context reports

2017-12-11 Thread Tom Lane

Robert Haas  writes:
> On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane  wrote:
>> There seem to be two ways we could look at this.  One is that the
>> new test case just needs to be rejiggered to avoid unstable output
>> ("\set VERBOSITY terse" would be the easiest way).  But there is
>> also room to argue that it's bad that plpgsql produces error reports
>> that vary depending on the phase of the moon, which is pretty much
>> what this would look like in the field --- cache flushes will occur
>> unpredictably in most application environments.

> I am inclined toward the latter view.

Yeah, me too.  I'll see about patching exec_eval_simple_expr() to
provide a context line that matches SPI's.  Seems like a HEAD-only
change though, as this will result in visible behavior change in
the typical case.

regards, tom lane

Re: Rethinking MemoryContext creation

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 12:36 PM, Tom Lane  wrote:
> [ thinks... ]  If we wanted to go that way, one thing we could do to
> help extension authors (and ourselves) is to define the proposed
> AllocSetContextCreate macro to include
>
> StaticAssertExpr(__builtin_constant_p(name))
>
> on compilers that have __builtin_constant_p.  Now, that only helps
> people using gcc and gcc-alikes, but that's a large fraction of
> developers I should think.  (I tested this and it does seem to
> correctly recognize string literals as constants.)

I like that idea.  I think that would provide good protection not only
for third-party developers but for core developers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Alexander Korotkov

On Mon, Dec 11, 2017 at 8:25 PM, Robert Haas  wrote:

> On Mon, Dec 11, 2017 at 7:55 AM, Ildus Kurbangaliev
>  wrote:
> > On Fri, 8 Dec 2017 15:12:42 -0500
> > Robert Haas  wrote:
> >> Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ...
> >> meaning that x1 is the default for new tuples but x2, x3, etc. are
> >> still allowed if present.  If you issue a command that only adds
> >> things to the list, no table rewrite happens, but if you remove
> >> anything, then it does.
> >
> > I like this idea, but maybe it should be something like ALTER COLUMN
> > SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in
> > syntax and this syntax will show better which one is current and which
> > ones should be kept.
>
> Sure, that works.  And I think pglz should exist in the catalog as a
> predefined, undroppable compression algorithm.  So the default for
> each column initially is:
>
> SET COMPRESSION pglz
>
> And if you want to rewrite the table with your awesome custom thing, you
> can do
>
> SET COMPRESSION awesome
>
> But if you want to just use the awesome custom thing for new rows, you can
> do
>
> SET COMPRESSION awesome PRESERVE pglz
>
> Then we can get all the dependencies right, pg_upgrade works, users
> have total control of rewrite behavior, and everything is great.  :-)
>

Looks good.

Thus, in your example if user would like to further change awesome
compression for evenbetter compression, she should write.

SET COMPRESSION evenbetter PRESERVE pglz, awesome; -- full list of previous
compression methods

I wonder what should we do if user specifies only part of previous
compression methods?  For instance, pglz is specified but awesome is
missing.

SET COMPRESSION evenbetter PRESERVE pglz; -- awesome is missing

I think we should trigger an error in this case.  Because query is
specified in form that is assuming to work without table rewrite, but we're
unable to do this without table rewrite.

I also think that we need some way to change compression method for
multiple columns in a single table rewrite.  Because it would be way more
efficient than rewriting table for each of columns.  So as an alternative of

ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome; -- first table
rewrite
ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome; -- second table
rewrite

we could also provide

ALTER TABLE tbl ALTER COLUMN c1 SET COMPRESSION awesome PRESERVE pglz; --
no rewrite
ALTER TABLE tbl ALTER COLUMN c2 SET COMPRESSION awesome PRESERVE pglz; --
no rewrite
VACUUM FULL tbl RESET COMPRESSION PRESERVE c1, c2; -- rewrite with
recompression of c1 and c2 and removing depedencies

?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Inconsistency in plpgsql's error context reports

2017-12-11 Thread Robert Haas

On Sun, Dec 10, 2017 at 11:55 AM, Tom Lane  wrote:
> There seem to be two ways we could look at this.  One is that the
> new test case just needs to be rejiggered to avoid unstable output
> ("\set VERBOSITY terse" would be the easiest way).  But there is
> also room to argue that it's bad that plpgsql produces error reports
> that vary depending on the phase of the moon, which is pretty much
> what this would look like in the field --- cache flushes will occur
> unpredictably in most application environments.

I am inclined toward the latter view.  I think predictability is very
important and that lack of predictability is both frustrating for
users and a sign of poor programming practices under the hood.  Nobody
likes clicking on the same thing twice and getting a different result
each time; as humans, we want to feel like we understand how things
work, and if they work differently at different times, we tend to
either question whether we understand them or question whether they
are actually robust systems.  For example, lately my car has started
to take a variable number of gallons of gasoline to fill the tank from
almost-empty to completely full.  I find that rather unnerving: how do
I know whether I'm going to run out of gas?  Similarly here -- I'm not
sure whether people want that extra CONTEXT line or not, but I think
they're going to want to either get it or not get it consistently.  If
not, they'll tend to feel like something is flaky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rethinking MemoryContext creation

2017-12-11 Thread Dagfinn Ilmari Mannsåker

Robert Haas  writes:

> On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 12/11/2017 05:27 PM, Tom Lane wrote:
 However, unless we want to run around and touch all the ~ 150 calls
 with constant arguments, we'd have to set things up so that the default
 behavior for AllocSetContextCreate is to not copy.  This risks breaking
 callers in extensions.  Not entirely sure if it's worth that --- any
 thoughts?
>>
>>> I don't think silently breaking extensions is particularly attractive
>>> option, so I guess we'll have to run around and tweak the ~150 calls.
>>
>> Meh.  I suppose that of the ~150 call sites, there are probably only
>> a dozen or two where it would actually make a performance difference,
>> so maybe this needn't be quite as invasive as I first thought.
>
> I think changing only a subset of the call sites is unappealing
> because, even though it may not make a measurable performance
> difference in other cases, it may get cargo-culted into some place
> where it does make a difference.

Would it be acceptable to only get this optimisation on compilers that
support __builtin_constant_p or similar?  If so, the wrapper macro could
use that to automatically pass the no-copy flag when called with a
literal string.

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

Simon Riggs  writes:
> On 11 December 2017 at 16:27, Tom Lane  wrote:
>> For a *very* large majority of the callers of AllocSetContextCreate,
>> the context name is a simple C string constant, so we could just store
>> the pointer to it and save the space and cycles required to copy it.

> Why have the string at all in that case?

Try reading a MemoryContextStats dump without it ...

regards, tom lane

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tomas Vondra

On 12/11/2017 06:22 PM, Robert Haas wrote:
> On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 12/11/2017 05:27 PM, Tom Lane wrote:
 However, unless we want to run around and touch all the ~ 150 calls
 with constant arguments, we'd have to set things up so that the default
 behavior for AllocSetContextCreate is to not copy.  This risks breaking
 callers in extensions.  Not entirely sure if it's worth that --- any
 thoughts?
>>
>>> I don't think silently breaking extensions is particularly attractive
>>> option, so I guess we'll have to run around and tweak the ~150 calls.
>>
>> Meh.  I suppose that of the ~150 call sites, there are probably only
>> a dozen or two where it would actually make a performance difference,
>> so maybe this needn't be quite as invasive as I first thought.
> 
> I think changing only a subset of the call sites is unappealing
> because, even though it may not make a measurable performance
> difference in other cases, it may get cargo-culted into some place
> where it does make a difference.
> 

Not sure. One the one hand, it might certainly be somewhat confusing
when we use two different methods to create memory contexts with C
string constants. On the other hand, I'm sure we have other similar
"local" optimizations.

I'd say "let's just tweak all the calls to use the new function" but I'm
not the person doing the leg work ...

> I also don't think silently breaking extensions is a terribly big deal
> in this case.  It seems likely that most extensions use static names
> just as most of our internal stuff does.  I'm going to guess that the
> number of extensions that will actually break as a result of a change
> in this area is probably very small - conceivably zero, and likely
> less than five.  I don't think we should be willing to uglify the core
> code too much for that level of breakage.
> 

I don't know how many extensions you maintain, but in my experience this
type of silent breakage is extremely annoying. I definitely prefer a
clean compile-time API breakage, for example.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

Robert Haas  writes:
> On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
>> Tomas Vondra  writes:
>>> On 12/11/2017 05:27 PM, Tom Lane wrote:
 However, unless we want to run around and touch all the ~ 150 calls
 with constant arguments, we'd have to set things up so that the default
 behavior for AllocSetContextCreate is to not copy.  This risks breaking
 callers in extensions.  Not entirely sure if it's worth that --- any
 thoughts?

>>> I don't think silently breaking extensions is particularly attractive
>>> option, so I guess we'll have to run around and tweak the ~150 calls.

> I also don't think silently breaking extensions is a terribly big deal
> in this case.  It seems likely that most extensions use static names
> just as most of our internal stuff does.  I'm going to guess that the
> number of extensions that will actually break as a result of a change
> in this area is probably very small - conceivably zero, and likely
> less than five.  I don't think we should be willing to uglify the core
> code too much for that level of breakage.

[ thinks... ]  If we wanted to go that way, one thing we could do to
help extension authors (and ourselves) is to define the proposed
AllocSetContextCreate macro to include

StaticAssertExpr(__builtin_constant_p(name))

on compilers that have __builtin_constant_p.  Now, that only helps
people using gcc and gcc-alikes, but that's a large fraction of
developers I should think.  (I tested this and it does seem to
correctly recognize string literals as constants.)

regards, tom lane

Re: Rethinking MemoryContext creation

2017-12-11 Thread Simon Riggs

On 11 December 2017 at 16:27, Tom Lane  wrote:

> For a *very* large majority of the callers of AllocSetContextCreate,
> the context name is a simple C string constant, so we could just store
> the pointer to it and save the space and cycles required to copy it.

Why have the string at all in that case?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 7:55 AM, Ildus Kurbangaliev
 wrote:
> On Fri, 8 Dec 2017 15:12:42 -0500
> Robert Haas  wrote:
>> Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ...
>> meaning that x1 is the default for new tuples but x2, x3, etc. are
>> still allowed if present.  If you issue a command that only adds
>> things to the list, no table rewrite happens, but if you remove
>> anything, then it does.
>
> I like this idea, but maybe it should be something like ALTER COLUMN
> SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in
> syntax and this syntax will show better which one is current and which
> ones should be kept.

Sure, that works.  And I think pglz should exist in the catalog as a
predefined, undroppable compression algorithm.  So the default for
each column initially is:

SET COMPRESSION pglz

And if you want to rewrite the table with your awesome custom thing, you can do

SET COMPRESSION awesome

But if you want to just use the awesome custom thing for new rows, you can do

SET COMPRESSION awesome PRESERVE pglz

Then we can get all the dependencies right, pg_upgrade works, users
have total control of rewrite behavior, and everything is great.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rethinking MemoryContext creation

2017-12-11 Thread Robert Haas

On Mon, Dec 11, 2017 at 11:59 AM, Tom Lane  wrote:
> Tomas Vondra  writes:
>> On 12/11/2017 05:27 PM, Tom Lane wrote:
>>> However, unless we want to run around and touch all the ~ 150 calls
>>> with constant arguments, we'd have to set things up so that the default
>>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>>> callers in extensions.  Not entirely sure if it's worth that --- any
>>> thoughts?
>
>> I don't think silently breaking extensions is particularly attractive
>> option, so I guess we'll have to run around and tweak the ~150 calls.
>
> Meh.  I suppose that of the ~150 call sites, there are probably only
> a dozen or two where it would actually make a performance difference,
> so maybe this needn't be quite as invasive as I first thought.

I think changing only a subset of the call sites is unappealing
because, even though it may not make a measurable performance
difference in other cases, it may get cargo-culted into some place
where it does make a difference.

I also don't think silently breaking extensions is a terribly big deal
in this case.  It seems likely that most extensions use static names
just as most of our internal stuff does.  I'm going to guess that the
number of extensions that will actually break as a result of a change
in this area is probably very small - conceivably zero, and likely
less than five.  I don't think we should be willing to uglify the core
code too much for that level of breakage.

But those are just my opinions.  I am glad you are working on this; I
noticed this problem before and thought of trying to do something
about it, but ran out of time and ideas.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

Tomas Vondra  writes:
> On 12/11/2017 05:27 PM, Tom Lane wrote:
>> However, unless we want to run around and touch all the ~ 150 calls
>> with constant arguments, we'd have to set things up so that the default
>> behavior for AllocSetContextCreate is to not copy.  This risks breaking
>> callers in extensions.  Not entirely sure if it's worth that --- any
>> thoughts?

> I don't think silently breaking extensions is particularly attractive
> option, so I guess we'll have to run around and tweak the ~150 calls.

Meh.  I suppose that of the ~150 call sites, there are probably only
a dozen or two where it would actually make a performance difference,
so maybe this needn't be quite as invasive as I first thought.

regards, tom lane

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tomas Vondra



On 12/11/2017 05:27 PM, Tom Lane wrote:
> I wrote:
>> While fooling around with a different problem, I got annoyed at how slow
>> MemoryContext creation and deletion is.
> 
> Looking at this some more, there's another micro-optimization we could
> make, which is to try to get rid of the strlen+strcpy operations needed
> for the context name.  (And yes, I'm seeing those show up in profiles,
> to the tune of a couple percent of total runtime in some examples.)
> 
> For a *very* large majority of the callers of AllocSetContextCreate,
> the context name is a simple C string constant, so we could just store
> the pointer to it and save the space and cycles required to copy it.
> This would require providing a separate API for the few callers that are
> actually passing transient strings, but that's easy enough.  I envision
> AllocSetContextCreate becoming a wrapper macro for
> "AllocSetContextCreateExtended", which'd take a flag argument specifying
> whether the context name needs to be copied.
> 
> However, unless we want to run around and touch all the ~ 150 calls
> with constant arguments, we'd have to set things up so that the default
> behavior for AllocSetContextCreate is to not copy.  This risks breaking
> callers in extensions.  Not entirely sure if it's worth that --- any
> thoughts?
> 

I don't think silently breaking extensions is particularly attractive
option, so I guess we'll have to run around and tweak the ~150 calls.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread David Fetter

On Mon, Dec 11, 2017 at 11:14:29AM -0200, Fabrízio de Royes Mello wrote:
> And what about create some way to create aliases for meta-commands in psql?

At first blush, this sounds interesting, but on further reflection, it
seems more like a MASSIVE foot gun on the useability/getting help
side, and likely has non-trivial negative implications for access
control and security.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Rethinking MemoryContext creation

2017-12-11 Thread Tom Lane

I wrote:
> While fooling around with a different problem, I got annoyed at how slow
> MemoryContext creation and deletion is.

Looking at this some more, there's another micro-optimization we could
make, which is to try to get rid of the strlen+strcpy operations needed
for the context name.  (And yes, I'm seeing those show up in profiles,
to the tune of a couple percent of total runtime in some examples.)

For a *very* large majority of the callers of AllocSetContextCreate,
the context name is a simple C string constant, so we could just store
the pointer to it and save the space and cycles required to copy it.
This would require providing a separate API for the few callers that are
actually passing transient strings, but that's easy enough.  I envision
AllocSetContextCreate becoming a wrapper macro for
"AllocSetContextCreateExtended", which'd take a flag argument specifying
whether the context name needs to be copied.

However, unless we want to run around and touch all the ~ 150 calls
with constant arguments, we'd have to set things up so that the default
behavior for AllocSetContextCreate is to not copy.  This risks breaking
callers in extensions.  Not entirely sure if it's worth that --- any
thoughts?

regards, tom lane

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Everaldo Canuto

Hello.

On Mon, Dec 11, 2017 at 11:14 AM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:

>
> And what about create some way to create aliases for meta-commands in
> psql? Something like:
>
> \alias quit \q
> \alias exit \q
>
> We also can create a special config variable to enable/disable use of
> aliases:
>
> \set USE_ALIAS on
>

I don't think it will fix the problem, if you don't know how to use "\q",
"\alias exit \q" and "exit" will be even harder.
Remember, it is basically to help newcomers.

Re: proposal: alternative psql commands quit and exit

2017-12-11 Thread Fabrízio de Royes Mello

On Sat, Dec 9, 2017 at 12:59 PM, David Fetter  wrote:
>
> On Fri, Dec 08, 2017 at 02:12:06PM -0500, Robert Haas wrote:
> > On Fri, Dec 8, 2017 at 10:28 AM, Szymon Lipiński 
wrote:
> > > Well, if I have a new program I usually read some documentation. It
really
> > > helps people to exit vim as well :)
> >
> > Look, I love vim and use it constantly, but no reasonable person is
> > going to hold it up as a good example of user-friendly software.
> > According to Wikipedia, it was written in 1976 by Bill Joy, and
> > computers have come a long way in the last 40 years.  They are,
> > broadly, easier to use now.  For example, current versions of vim let
> > you move around using new-fangled arrow keys, as if computers were
> > supposed to cater to the lowest common denominator!  Real men (like
> > me) use hjkl to get around the screen, and look upon those who resort
> > to the arrow keys as Johnny-come-latelys.  Nevertheless, I can hardly
> > fault vim/vi's concession to modernity in this regard.
> >
> > > Thinking this way for me psql's way is the intuitive one, because I
know it.
> > > Should I kindly ask Oracle to change their programs because I rather
want to
> > > use their software than reading their documentation?
> >
> > If you can convince Oracle to add support for \q to sqlplus, I say go
> > for it.  Actually, what I'd like even better is if you could convince
> > them to add curses support, but I guess they would have done that 30
> > years ago if they were inclined to do it.
> >
> > Really good software -- which sqlplus is not -- doesn't make it
> > necessary to read the documentation.  It helps you figure out how to
> > use it as you go.  When I fire up a new app on my iPhone, it generally
> > gives me a clue how to use it.  Sure, there's probably an app in the
> > Apple Store someplace that is absolutely unusable without reading the
> > documentation, but that's a bug, not a feature.   It's expected that
> > you'll have to read the documentation to figure out how to use the
> > advanced features of any complicated program, but that doesn't mean we
> > should make simple things complicated just to scare off users that
> > aren't sufficiently committed to the Cause.
>
> +1
>

And what about create some way to create aliases for meta-commands in psql?
Something like:

\alias quit \q
\alias exit \q

We also can create a special config variable to enable/disable use of
aliases:

\set USE_ALIAS on


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: [HACKERS] Custom compression methods

2017-12-11 Thread Ildus Kurbangaliev

On Fri, 8 Dec 2017 15:12:42 -0500
Robert Haas  wrote:

> 
> Maybe a better idea is ALTER COLUMN SET COMPRESSION x1, x2, x3 ...
> meaning that x1 is the default for new tuples but x2, x3, etc. are
> still allowed if present.  If you issue a command that only adds
> things to the list, no table rewrite happens, but if you remove
> anything, then it does.
> 

I like this idea, but maybe it should be something like ALTER COLUMN
SET COMPRESSION x1 [ PRESERVE x2, x3 ]? 'PRESERVE' is already used in
syntax and this syntax will show better which one is current and which
ones should be kept.

-- 

Regards,
Ildus Kurbangaliev

Re: Mention ordered datums in PartitionBoundInfoData comment

2017-12-11 Thread Ashutosh Bapat

On Fri, Dec 8, 2017 at 12:25 AM, Robert Haas  wrote:
> On Tue, Dec 5, 2017 at 1:03 AM, Ashutosh Bapat
>  wrote:
>> Sorry. Thanks for pointing it out. fixed in the attached patch.
>
> + * The datums in datums array are arranged in the increasing order defined by
>
> Suggest: in increasing order as defined
>

Done.

> There's a second place where the same change is needed.

Done.

>
> + * resp. For range and list partitions this simply means that the datums in 
> the
>
> I think you should spell out "respectively" instead of abbreviating to "resp".

Done.

>
> + * datums array are arranged in the increasing order defined by the partition
> + * key collation.
>
> It's not just the collation but also, and I think more importantly,
> the operator class.   And there can be multiple columns, and thus
> multiple opclases/collations.  Maybe "defined by the partition key's
> operator classes and collations".

I had forgot about the operator class. Sorry. Done.

>
> + * PartitionBoundInfoData structures for two partitioned tables with exactly
> + * same bounds look exactly same.
>
> This doesn't seem to me to add much.
>

We have a comment in partition_bounds_equal()'s prologue

"PartitionBoundInfo is a canonical representation of partition bounds.".

But we may use that property in other places also, so having it in
prologue of PartitionBoundsInfoData makes sense. For now, I have
removed those lines.

PFA updated patch.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dd4a8d3..ef156e4 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -72,6 +72,13 @@
  * of datum-tuples with 2 datums, modulus and remainder, corresponding to a
  * given partition.
  *
+ * The datums in datums array are arranged in increasing order as defined by
+ * functions qsort_partition_rbound_cmp(), qsort_partition_list_value_cmp() and
+ * qsort_partition_hbound_cmp() for range, list and hash partitioned tables
+ * respectively. For range and list partitions this simply means that the
+ * datums in the datums array are arranged in increasing order as defined by
+ * the partition key's operator classes and collations.
+ *
  * In the case of list partitioning, the indexes array stores one entry for
  * every datum, which is the index of the partition that accepts a given datum.
  * In case of range partitioning, it stores one entry per distinct range

Re: [HACKERS] More stats about skipped vacuums

2017-12-11 Thread Kyotaro HORIGUCHI

At Mon, 27 Nov 2017 13:51:22 -0500, Robert Haas  wrote 
in 
> On Mon, Nov 27, 2017 at 1:49 AM, Kyotaro HORIGUCHI
>  wrote:
> > Hmmm. Okay, we must make stats collector more effeicient if we
> > want to have additional counters with smaller significance in the
> > table stats. Currently sizeof(PgStat_StatTabEntry) is 168
> > bytes. The whole of the patchset increases it to 232 bytes. Thus
> > the size of a stat file for a database with 1 tables
> > increases from about 1.7MB to 2.4MB.  DSM and shared dynahash is
> > not dynamically expandable so placing stats on shared hash
> > doesn't seem effective. Stats as a regular table could work but
> > it seems too-much.
> 
> dshash, which is already committed, is both DSM-based and dynamically
> expandable.

Yes, I forgot about that. We can just copy memory blocks to take
a snapshot of stats.

> > Is it acceptable that adding a new section containing this new
> > counters, which is just loaded as a byte sequence and parsing
> > (and filling the corresponding hash) is postponed until a counter
> > in the section is really requested?  The new counters need to be
> > shown in a separate stats view (maybe named pg_stat_vacuum).
> 
> Still makes the stats file bigger.

I considered dshash for pgstat.c and the attached is a *PoC*
patch, which is not full-fledged and just working under a not so
concurrent situation.

- Made stats collector an auxiliary process. A crash of stats
  collector leads to a whole-server restarting.

- dshash lacks capability of sequential scan so added it.

- Also added snapshot function to dshash. It just copies
  unerlying DSA segments into local memory but currently it
  doesn't aquire dshash-level locks at all. I tried the same
  thing with resize but it leads to very quick exhaustion of
  LWLocks. An LWLock for the whole dshash would be required. (and
  it is also useful to resize() and sequential scan.

- The current dshash doesn't shrink at all. Such a feature also
  will be required. (A server restart causes a shrink of hashes
  in the same way as before but bloat dshash requires copying
  more than necessary size of memory on takeing a snapshot.)

The size of DSA is about 1MB at minimum. Copying entry-by-entry
into (non-ds) hash might be better than copying underlying DSA as
a whole, and DSA/DSHASH snapshot brings a kind of dirty..


Does anyone give me opinions or suggestions?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From f5c9c45384ec43734c0890dd875101defe6590bc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 1 Dec 2017 14:34:47 +0900
Subject: [PATCH 1/4] Simple implement of local shapshot of dshash.

Add snapshot feature to DSHASH. This makes palloc'ed copy of
underlying DSA and returns unmodifiable DSHASH using the copied DSA.
---
 src/backend/lib/dshash.c | 74 +++-
 src/backend/utils/mmgr/dsa.c | 57 +-
 src/include/lib/dshash.h |  1 +
 src/include/utils/dsa.h  |  1 +
 4 files changed, 124 insertions(+), 9 deletions(-)

diff --git a/src/backend/lib/dshash.c b/src/backend/lib/dshash.c
index dd87573..973a826 100644
--- a/src/backend/lib/dshash.c
+++ b/src/backend/lib/dshash.c
@@ -112,6 +112,7 @@ struct dshash_table
 	size_t		size_log2;		/* log2(number of buckets) */
 	bool		find_locked;	/* Is any partition lock held by 'find'? */
 	bool		find_exclusively_locked;	/* ... exclusively? */
+	bool		is_snapshot;	/* Is this hash is a local snapshot?*/
 };
 
 /* Given a pointer to an item, find the entry (user data) it holds. */
@@ -228,6 +229,7 @@ dshash_create(dsa_area *area, const dshash_parameters *params, void *arg)
 
 	hash_table->find_locked = false;
 	hash_table->find_exclusively_locked = false;
+	hash_table->is_snapshot = false;
 
 	/*
 	 * Set up the initial array of buckets.  Our initial size is the same as
@@ -279,6 +281,7 @@ dshash_attach(dsa_area *area, const dshash_parameters *params,
 	hash_table->control = dsa_get_address(area, control);
 	hash_table->find_locked = false;
 	hash_table->find_exclusively_locked = false;
+	hash_table->is_snapshot = false;
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
 
 	/*
@@ -321,6 +324,15 @@ dshash_destroy(dshash_table *hash_table)
 	size_t		i;
 
 	Assert(hash_table->control->magic == DSHASH_MAGIC);
+
+	/* this is a local copy */
+	if (hash_table->is_snapshot)
+	{
+		pfree(hash_table->area);
+		pfree(hash_table);
+		return;
+	}
+
 	ensure_valid_bucket_pointers(hash_table);
 
 	/* Free all the entries. */
@@ -355,6 +367,29 @@ dshash_destroy(dshash_table *hash_table)
 }
 
 /*
+ * take a local snapshot of a dshash table
+ */
+dshash_table *
+dshash_take_snapshot(dshash_table *org_table, dsa_area *new_area)
+{
+	dshash_table *new_table;
+
+	if (org_table->is_snapshot)
+		elog(ERROR, "cannot make local copy of local dshash");
+
+	new_table = palloc(sizeof(dshash_table));
+
+	new_table->area = new_area;
+	new_table->params = org_table->params;
+	new_table->control = d

Re: [HACKERS] asynchronous execution

2017-12-11 Thread Kyotaro HORIGUCHI

Hello,

At Fri, 20 Oct 2017 17:37:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20171020.173707.12913619.horiguchi.kyot...@lab.ntt.co.jp>
> The attached PoC patch theoretically has no impact on the normal
> code paths and just brings gain in async cases.

The parallel append just committed hit this and the attached are
the rebased version to the current HEAD. The result of a concise
performance test follows.

   patched(ms)  unpatched(ms)   gain(%)
A: simple table scan :  3562.32  3444.81-3.4
B: local partitioning:  1451.25  1604.38 9.5
C: single remote table   :  8818.92  9297.76 5.1
D: sharding (single con) :  5966.14  6646.7310.2
E: sharding (multi con)  :  1802.25  6515.4972.3

> A and B are degradation checks, which are expected to show no
> degradation.  C is the gain only by postgres_fdw's command
> presending on a remote table.  D is the gain of sharding on a
> connection. The number of partitions/shards is 4.  E is the gain
> using dedicate connection per shard.

Test A is accelerated by parallel sequential scan. Introducing
parallel append accelerates test B. Comparing A and B, I doubt
that degradation is stably measurable at least my environment but
I believe that there's no degradation theoreticaly. The test C to
E still shows apparent gain.
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From b1aff3362b983975003d8a60f9b3593cb2fa62fc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Mon, 22 May 2017 12:42:58 +0900
Subject: [PATCH 1/3] Allow wait event set to be registered to resource owner

WaitEventSet needs to be released using resource owner for a certain
case. This change adds WaitEventSet reowner and allow the creator of a
WaitEventSet to specify a resource owner.
---
 src/backend/libpq/pqcomm.c|  2 +-
 src/backend/storage/ipc/latch.c   | 18 ++-
 src/backend/storage/lmgr/condition_variable.c |  2 +-
 src/backend/utils/resowner/resowner.c | 68 +++
 src/include/storage/latch.h   |  4 +-
 src/include/utils/resowner_private.h  |  8 
 6 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index fc15181..7c4077a 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -220,7 +220,7 @@ pq_init(void)
 (errmsg("could not set socket to nonblocking mode: %m")));
 #endif
 
-	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, 3);
+	FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
 	AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, MyProcPort->sock,
 	  NULL, NULL);
 	AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index 4eb6e83..e6fc3dd 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -51,6 +51,7 @@
 #include "storage/latch.h"
 #include "storage/pmsignal.h"
 #include "storage/shmem.h"
+#include "utils/resowner_private.h"
 
 /*
  * Select the fd readiness primitive to use. Normally the "most modern"
@@ -77,6 +78,8 @@ struct WaitEventSet
 	int			nevents;		/* number of registered events */
 	int			nevents_space;	/* maximum number of events in this set */
 
+	ResourceOwner	resowner;	/* Resource owner */
+
 	/*
 	 * Array, of nevents_space length, storing the definition of events this
 	 * set is waiting for.
@@ -359,7 +362,7 @@ WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock,
 	int			ret = 0;
 	int			rc;
 	WaitEvent	event;
-	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, 3);
+	WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL, 3);
 
 	if (wakeEvents & WL_TIMEOUT)
 		Assert(timeout >= 0);
@@ -517,12 +520,15 @@ ResetLatch(volatile Latch *latch)
  * WaitEventSetWait().
  */
 WaitEventSet *
-CreateWaitEventSet(MemoryContext context, int nevents)
+CreateWaitEventSet(MemoryContext context, ResourceOwner res, int nevents)
 {
 	WaitEventSet *set;
 	char	   *data;
 	Size		sz = 0;
 
+	if (res)
+		ResourceOwnerEnlargeWESs(res);
+
 	/*
 	 * Use MAXALIGN size/alignment to guarantee that later uses of memory are
 	 * aligned correctly. E.g. epoll_event might need 8 byte alignment on some
@@ -591,6 +597,11 @@ CreateWaitEventSet(MemoryContext context, int nevents)
 	StaticAssertStmt(WSA_INVALID_EVENT == NULL, "");
 #endif
 
+	/* Register this wait event set if requested */
+	set->resowner = res;
+	if (res)
+		ResourceOwnerRememberWES(set->resowner, set);
+
 	return set;
 }
 
@@ -632,6 +643,9 @@ FreeWaitEventSet(WaitEventSet *set)
 	}
 #endif
 
+	if (set->resowner != NULL)
+		ResourceOwnerForgetWES(set->resowner, set);
+
 	pfree(set);
 }
 
diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index b4b7d28..182f759 100644
--- a/src/backend/storage/lmgr/conditio

Re: After dropping the rule - Not able to insert / server crash (one time ONLY)

2017-12-11 Thread Dilip Kumar

On Mon, Dec 11, 2017 at 3:54 PM, tushar 
wrote:

> Hi,
>
> While testing something , I found that even after rule has dropped  not
> able to insert data  and in an another scenario , there is a Crash/
>
> Please refer this scenario's -
>
> 1) Rows not inserted after dropping the RULE
>
> postgres=# create table e(n int);
> CREATE TABLE
> postgres=# create rule e1 as on insert to e do instead nothing;
> CREATE RULE
> postgres=# create function e11(n int) returns int as $$ begin insert into
> e values(10); return 1; end; $$ language 'plpgsql';
> CREATE FUNCTION
> postgres=# select e11(1);
>  e11
> -
>1
> (1 row)
>
> postgres=# select e11(1);
>  e11
> -
>1
> (1 row)
>
> postgres=# select * from e;  => Expected , due to  the RULE enforced
>  n
> ---
> (0 rows)
>
>
> postgres=# drop rule e1 on e;  ==>Drop the rule
> DROP RULE
>
> postgres=# select e11(1);  =>again try to insert into table
>  e11
> -
>1
> (1 row)
>
> postgres=# select * from e;  =>This time , value should be inserted but
> still showing 0 row.
>  n
> ---
> (0 rows)
>
>
>
> 2)Server crash (one time)
>
> postgres=# create table en(n int);
> CREATE TABLE
> postgres=# create function en1(n int) returns int as $$ begin insert into
> en values(10); return 1; end; $$ language 'plpgsql';
> CREATE FUNCTION
> postgres=#
> postgres=# select en1(1);
>  en1
> -
>1
> (1 row)
>
> postgres=# select * from en;
>  n
> 
>  10
> (1 row)
>
> postgres=# create rule en11 as on insert to en do instead nothing;
> CREATE RULE
> postgres=# select en1(1);
> ostgres=# select en1(1);
> TRAP: FailedAssertion("!(!stmt->mod_stmt)", File: "pl_exec.c", Line: 3721)
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
>

I have looked into this crash, seems like below assert in exec_stmt_execsql
is not correct

*case* SPI_OK_REWRITTEN:
Assert(!stmt->mod_stmt);

In this issue first time execution of select en1(1); statement, stmt->mod_stmt
is set to true for the insert query inside the function. Then before next
execution we have created the rule "create rule en11 as on insert to en do
instead nothing;".  Because of this nothing will be executed and output
will be SPI_OK_REWRITTEN.  But we are asserting that stmt->mod_stmt should
be false (which were set to true in first execution).

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

After dropping the rule - Not able to insert / server crash (one time ONLY)

2017-12-11 Thread tushar


Hi,

While testing something , I found that even after rule has dropped  not 
able to insert data  and in an another scenario , there is a Crash/


Please refer this scenario's -

1) Rows not inserted after dropping the RULE

postgres=# create table e(n int);
CREATE TABLE
postgres=# create rule e1 as on insert to e do instead nothing;
CREATE RULE
postgres=# create function e11(n int) returns int as $$ begin insert 
into e values(10); return 1; end; $$ language 'plpgsql';

CREATE FUNCTION
postgres=# select e11(1);
 e11
-
   1
(1 row)

postgres=# select e11(1);
 e11
-
   1
(1 row)

postgres=# select * from e;  => Expected , due to  the RULE enforced
 n
---
(0 rows)


postgres=# drop rule e1 on e;  ==>Drop the rule
DROP RULE

postgres=# select e11(1);  =>again try to insert into table
 e11
-
   1
(1 row)

postgres=# select * from e;  =>This time , value should be inserted but 
still showing 0 row.

 n
---
(0 rows)



2)Server crash (one time)

postgres=# create table en(n int);
CREATE TABLE
postgres=# create function en1(n int) returns int as $$ begin insert 
into en values(10); return 1; end; $$ language 'plpgsql';

CREATE FUNCTION
postgres=#
postgres=# select en1(1);
 en1
-
   1
(1 row)

postgres=# select * from en;
 n

 10
(1 row)

postgres=# create rule en11 as on insert to en do instead nothing;
CREATE RULE
postgres=# select en1(1);
ostgres=# select en1(1);
TRAP: FailedAssertion("!(!stmt->mod_stmt)", File: "pl_exec.c", Line: 3721)
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: 2017-12-11 
10:05:57.847 GMT [18493] LOG:  server process (PID 18604) was terminated 
by signal 6: Aborted
2017-12-11 10:05:57.847 GMT [18493] DETAIL:  Failed process was running: 
select en1(1);
2017-12-11 10:05:57.847 GMT [18493] LOG:  terminating any other active 
server processes
2017-12-11 10:05:57.847 GMT [18498] WARNING:  terminating connection 
because of crash of another server process
2017-12-11 10:05:57.847 GMT [18498] DETAIL:  The postmaster has 
commanded this server process to roll back the current transaction and 
exit, because another server process exited abnormally and possibly 
corrupted shared memory.
2017-12-11 10:05:57.847 GMT [18498] HINT:  In a moment you should be 
able to reconnect to the database and repeat your command.

Failed.
!> 2017-12-11 10:05:57.849 GMT [18493] LOG:  all server processes 
terminated; reinitializing



again try to connect and fire this same query -

postgres=# select en1(1);  =>This time no crash but again rows not inserted
 en1
-
   1
(1 row)

This is not a regression, i am able to see such behavior in v9.6 as well.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-12-11 Thread Masahiko Sawada

On Tue, Nov 28, 2017 at 12:31 PM, Ashutosh Bapat
 wrote:
> On Tue, Nov 28, 2017 at 3:04 AM, Masahiko Sawada  
> wrote:
>> On Fri, Nov 24, 2017 at 10:28 PM, Antonin Houska  wrote:
>>> Masahiko Sawada  wrote:
>>>
 On Mon, Oct 30, 2017 at 5:48 PM, Ashutosh Bapat
  wrote:
 > On Thu, Oct 26, 2017 at 7:41 PM, Masahiko Sawada  
 > wrote:
 >>
 >> Because I don't want to break the current user semantics. that is,
 >> currently it's guaranteed that the subsequent reads can see the
 >> committed result of previous writes even if the previous transactions
 >> were distributed transactions. And it's ensured by writer side. If we
 >> can make the reader side ensure it, the backend process don't need to
 >> wait for the resolver process.
 >>
 >> The waiting backend process are released by resolver process after the
 >> resolver process tried to resolve foreign transactions. Even if
 >> resolver process failed to either connect to foreign server or to
 >> resolve foreign transaction the backend process will be released and
 >> the foreign transactions are leaved as dangling transaction in that
 >> case, which are processed later. Also if resolver process takes a long
 >> time to resolve foreign transactions for whatever reason the user can
 >> cancel it by Ctl-c anytime.
 >>
 >
 > So, there's no guarantee that the next command issued from the
 > connection *will* see the committed data, since the foreign
 > transaction might not have committed because of a network glitch
 > (say). If we go this route of making backends wait for resolver to
 > resolve the foreign transaction, we will have add complexity to make
 > sure that the waiting backends are woken up in problematic events like
 > crash of the resolver process OR if the resolver process hangs in a
 > connection to a foreign server etc. I am not sure that the complexity
 > is worth the half-guarantee.
 >

 Hmm, maybe I was wrong. I now think that the waiting backends can be
 woken up only in following cases;
 - The resolver process succeeded to resolve all foreign transactions.
 - The user did the cancel (e.g. ctl-c).
 - The resolver process failed to resolve foreign transaction for a
 reason of there is no such prepared transaction on foreign server.

 In other cases the resolver process should not release the waiters.
>>>
>>> I'm not sure I see consensus here. What Ashutosh says seems to be: "Special
>>> effort is needed to ensure that backend does not keep waiting if the 
>>> resolver
>>> can't finish it's work in forseable future. But this effort is not worth
>>> because by waking the backend up you might prevent the next transaction from
>>> seeing the changes the previous one tried to make."
>>>
>>> On the other hand, your last comments indicate that you try to be even more
>>> stringent in letting the backend wait. However even this stringent approach
>>> does not guarantee that the next transaction will see the data changes made 
>>> by
>>> the previous one.
>>>
>>
>> What I'd like to guarantee is that the subsequent read can see the
>> committed result of previous writes if the transaction involving
>> multiple foreign servers is committed without cancellation by user. In
>> other words, the backend should not be waken up and the resolver
>> should continue to resolve at certain intervals even if the resolver
>> fails to connect to the foreign server or fails to resolve it. This is
>> similar to what synchronous replication guaranteed today. Keeping this
>> semantics is very important for users. Note that the reading a
>> consistent result by concurrent reads is a separated problem.
>
> The question I have is how would we deal with a foreign server that is
> not available for longer duration due to crash, longer network outage
> etc. Example is the foreign server crashed/got disconnected after
> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
> blocked for much longer duration without user having an idea of what's
> going on. May be we should add some timeout.

After more thought, I agree with adding some timeout. I can image
there are users who want the timeout, for example, who cannot accept
even a few seconds latency. If the timeout occurs backend unlocks the
foreign transactions and breaks the loop. The resolver process will
keep to continue to resolve foreign transactions at certain interval.

>>
>> The read result including foreign servers can be inconsistent if the
>> such transaction is cancelled or the coordinator server crashes during
>> two-phase commit processing. That is, if there is in-doubt transaction
>> the read result can be inconsistent, even for subsequent reads. But I
>> think this behaviour can be accepted by users. For the resolution of
>> in-doubt transactions, the resolver process will try to resolve such
>> transactions after the coordinator server r

Re: Out of date comment in cached_plan_cost

2017-12-11 Thread David Rowley

On 11 December 2017 at 21:39, Ashutosh Bapat
 wrote:
> I don't see much difference in the old and new wording. The word
> "generally" confuses more than clarifying the cases when the planning
> cost curves do not change with the number of relations i.e.
> partitions.

I added that to remove the false claim that inheritance children don't
make the join problem more complex. This was only true before we had
partition-wise joins.

I've re-read my original patch and I don't really see the problem with
it. The comment is talking about inheritance child relations, which
you could either interpret to mean INHERITS (sometable), or some table
listed in pg_inherits. The statement that I added forces me into
thinking of the former rather than the latter, so I don't really see
any issue.

I'm going to leave it here. I don't want to spend too much effort
rewording a comment. My vote is for the original patch I sent. I only
changed it because Robert complained that technically an inheritance
child could actually be a partition.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread Ashutosh Bapat

On Mon, Dec 11, 2017 at 3:16 PM, David Rowley
 wrote:
>
> Sometime in the future, I'd like to see some sort of UniqueKeys List
> in RelOptInfo which is initially populated by using looking at the
> unique indexes during build_simple_rel(). The idea is that these
> UniqueKeys would get propagated into RELOPT_JOINRELs when the join
> condition matches a set of unique keys on the other side of the join.
> e.g. If the outer side of the join had a UniqueKey matching the join
> condition then we could take all of the UniqueKeys from the RelOptInfo
> for the inner side of the join (and vice versa). This infrastructure
> would allow us to no-op a DISTINCT when the RelOptInfo has targetlist
> items which completely cover at least one UniqueKey set, or not bother
> performing a sort for a GROUP BY when the GROUP BY clause is covered
> by a UniqueKey.
>

That looks neat.

> To fix the case we're discussing, we can also propagate those
> UniqueKeys to an AppendPath with a single subpath similar to how I'm
> propagating the PathKeys for the same case in this patch, that way the
> unique join code would find the UniqueKeys instead of what it does
> today and not find any unique indexes on the partitioned table.
>

Another possibility is to support partition-wise join between an
unpartitioned table and a partitioned table by joining the
unpartitioned table with each partition separately and appending the
results. That's a lot of work and quite some new infrastructure. Each
of those join will pick unique key if appropriate index exists on the
partitions.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread David Rowley

On 11 December 2017 at 22:23, Ashutosh Bapat
 wrote:
> I think this didn't work with inheritance before partition-wise join
> as well for the same reason.

Yeah, I was just demonstrating a reason that the plans are not
identical from querying a partitioned table when partition pruning
eliminates all but one partition vs directly querying that single
partition. I've been attaching some regression test changes which
demonstrate some other subtle differences along with each version of
the patch. I just wanted to be open with these subtle differences that
I've encountered while working on this.

> Probably, we could do it if unique paths (sorted paths which can be
> unique-ified) bubble up the Append hierarchy. We already have code in
> add_paths_to_append_rel() to create sorted paths when even a single
> child has an index on it.

Sometime in the future, I'd like to see some sort of UniqueKeys List
in RelOptInfo which is initially populated by using looking at the
unique indexes during build_simple_rel(). The idea is that these
UniqueKeys would get propagated into RELOPT_JOINRELs when the join
condition matches a set of unique keys on the other side of the join.
e.g. If the outer side of the join had a UniqueKey matching the join
condition then we could take all of the UniqueKeys from the RelOptInfo
for the inner side of the join (and vice versa). This infrastructure
would allow us to no-op a DISTINCT when the RelOptInfo has targetlist
items which completely cover at least one UniqueKey set, or not bother
performing a sort for a GROUP BY when the GROUP BY clause is covered
by a UniqueKey.

To fix the case we're discussing, we can also propagate those
UniqueKeys to an AppendPath with a single subpath similar to how I'm
propagating the PathKeys for the same case in this patch, that way the
unique join code would find the UniqueKeys instead of what it does
today and not find any unique indexes on the partitioned table.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread Ashutosh Bapat

On Mon, Dec 11, 2017 at 2:42 PM, David Rowley
 wrote:
> On 11 December 2017 at 21:18, Ashutosh Bapat
>  wrote:
>> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
>>  wrote:
>>> While rebasing this today I also noticed that we won't properly detect
>>> unique joins in add_paths_to_joinrel() as we're still testing for
>>> uniqueness against the partitioned parent rather than the only child.
>>> This is likely not a huge problem since we'll always get a false
>>> negative and never a false positive, but it is a missing optimisation.
>>> I've not thought of how to solve it yet, it's perhaps not worth going
>>> to too much trouble over.
>>>
>>
> [...]
>
>> Do you have a testcase, which shows the problem?
>
> I do indeed:
>
> create table p (a int not null) partition by range (a);
> create table p1 partition of p for values from (minvalue) to (maxvalue);
> create unique index on p1 (a);
> create table t (a int not null);
>
> insert into p values(1),(2);
> insert into t values(1);
>
> analyze p;
> analyze t;
>
> explain (verbose, costs off) select * from p inner join t on p.a = t.a;
>  QUERY PLAN
> -
>  Nested Loop
>Output: p1.a, t.a
>Join Filter: (p1.a = t.a)
>->  Seq Scan on public.t
>  Output: t.a
>->  Seq Scan on public.p1
>  Output: p1.a
> (7 rows)
>
> explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
>  QUERY PLAN
> -
>  Nested Loop
>Output: p1.a, t.a
>Inner Unique: true
>Join Filter: (p1.a = t.a)
>->  Seq Scan on public.t
>  Output: t.a
>->  Seq Scan on public.p1
>  Output: p1.a
> (8 rows)
>
> Notice that when we join to the partitioned table directly and the
> Append is removed, we don't get the "Inner Unique: true"

Ok. I thought we don't use unique joins in partition-wise joins. Sorry
for the misunderstanding.

I think this didn't work with inheritance before partition-wise join
as well for the same reason.

Probably, we could do it if unique paths (sorted paths which can be
unique-ified) bubble up the Append hierarchy. We already have code in
add_paths_to_append_rel() to create sorted paths when even a single
child has an index on it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2017-12-11 Thread David Rowley

On 11 December 2017 at 21:18, Ashutosh Bapat
 wrote:
> On Thu, Dec 7, 2017 at 5:11 AM, David Rowley
>  wrote:
>> While rebasing this today I also noticed that we won't properly detect
>> unique joins in add_paths_to_joinrel() as we're still testing for
>> uniqueness against the partitioned parent rather than the only child.
>> This is likely not a huge problem since we'll always get a false
>> negative and never a false positive, but it is a missing optimisation.
>> I've not thought of how to solve it yet, it's perhaps not worth going
>> to too much trouble over.
>>
>
[...]

> Do you have a testcase, which shows the problem?

I do indeed:

create table p (a int not null) partition by range (a);
create table p1 partition of p for values from (minvalue) to (maxvalue);
create unique index on p1 (a);
create table t (a int not null);

insert into p values(1),(2);
insert into t values(1);

analyze p;
analyze t;

explain (verbose, costs off) select * from p inner join t on p.a = t.a;
 QUERY PLAN
-
 Nested Loop
   Output: p1.a, t.a
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
 Output: t.a
   ->  Seq Scan on public.p1
 Output: p1.a
(7 rows)

explain (verbose, costs off) select * from p1 inner join t on p1.a = t.a;
 QUERY PLAN
-
 Nested Loop
   Output: p1.a, t.a
   Inner Unique: true
   Join Filter: (p1.a = t.a)
   ->  Seq Scan on public.t
 Output: t.a
   ->  Seq Scan on public.p1
 Output: p1.a
(8 rows)

Notice that when we join to the partitioned table directly and the
Append is removed, we don't get the "Inner Unique: true"

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Learned Index

2017-12-11 Thread Deepak Balasubramanyam

I came across this paper making a case for indices that use machine
learning to optimise search.

https://arxiv.org/pdf/1712.01208.pdf

The gist seems to be to use a linear regression model or feed a tensor flow
model when a more complicated distribution is needed for the data and allow
SIMD instructions working on top of GPUs / TPUs to speed up lookups. The
speedup observed is anywhere from 40-60%.

That result looks impressive but I don't have enough context on say
rebuilding a neural net on every DML operation. The equivalent operation
that I can relate to on PG would be to rebalance the B-tree for DML
operations.

In your opinion, would a ML model work for a table whose operations are
both write and read heavy? I'd love to hear your thoughts on the paper.

Thanks for reading
- Deepak

Re: Parallel Index Scan vs BTP_DELETED and BTP_HALF_DEAD

2017-12-11 Thread Thomas Munro

On Mon, Dec 11, 2017 at 8:14 PM, Amit Kapila  wrote:
> On Mon, Dec 11, 2017 at 8:21 AM, Thomas Munro
>  wrote:
>> ... and then it called _bt_parallel_seize() itself, in violation of
>> the rule (by my reading of the code) that you must call
>> _bt_parallel_release() (via _bt_readpage()) or _bt_parallel_done()
>> after seizing the scan.  If you call _bt_parallel_seize() again
>> without doing that first, you'll finish up waiting for yourself
>> forever.  Does this theory make sense?
>>
>
> Yes, I think if the current page is half-dead or deleted, we need to
> set the next page to be scanned and release the parallel scan.  This
> has to be done for both forward and backward scans.

Your patch seems to match what _bt_readpage() would do in the regular
live page path, namely _bt_parallel_release(scan, opaque->btpo_next)
to advance to the right and _bt_parallel_release(scan,
BufferGetBlockNumber(so->currPos.buf)) to advance to the left.  I
haven't tested it, but it certainly looks correct on that basis (I
admit that the asymmetry threw me for a moment but I get it now).

> Thanks for looking into it.  I will see if we can write some test.  In
> the meantime if possible, can you please request Patrick Hemmer to
> verify the attached patch?

Our discussion was on the #postgresql Freenode channel.  I pointed him
at this thread, but I'm not sure if he'll see my message or be able to
test.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] WAL logging problem in 9.4.3?

2017-12-11 Thread Kyotaro HORIGUCHI

At Tue, 28 Nov 2017 10:36:39 +0900, Michael Paquier  
wrote in 
> On Thu, Sep 14, 2017 at 3:34 PM, Kyotaro HORIGUCHI
>  wrote:
> > At Wed, 13 Sep 2017 17:42:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
> >  wrote in 
> > <20170913.174239.25978735.horiguchi.kyot...@lab.ntt.co.jp>
> >> filterdiff seems to did something wrong..
> >
> > # to did...

It's horrid to see that:p

> > The patch is broken by filterdiff so I send a new patch made
> > directly by git format-patch. I confirmed that a build completes
> > with applying this.
> 
> To my surprise this patch still applies but fails recovery tests. I am
> bumping it to next CF, for what will be its 8th registration as it is
> for a bug fix, switching the status to "waiting on author".

Thank you for checking that. I saw maybe the same failure. It
occurred when visibilitymap_set() is called with heapBuf =
InvalidBuffer during recovery. Checking pendingSyncs and
no_pending_sync before the elog fixes it. Anyway the DEBUG2 elogs
are to removed before committing. They are just to look how it
works.

The attached patch applies on the current HEAD and passes all
recovery tests.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From af24850bf8ec5ea082d3affce9d0754daf1862ea Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 12 Sep 2017 13:01:33 +0900
Subject: [PATCH] Fix WAL logging problem

---
 src/backend/access/heap/heapam.c| 113 ---
 src/backend/access/heap/pruneheap.c |   3 +-
 src/backend/access/heap/rewriteheap.c   |   4 +-
 src/backend/access/heap/visibilitymap.c |   3 +-
 src/backend/access/transam/xact.c   |   7 +
 src/backend/catalog/storage.c   | 324 +---
 src/backend/commands/copy.c |  13 +-
 src/backend/commands/createas.c |   9 +-
 src/backend/commands/matview.c  |   6 +-
 src/backend/commands/tablecmds.c|   8 +-
 src/backend/commands/vacuumlazy.c   |   6 +-
 src/backend/storage/buffer/bufmgr.c |  40 +++-
 src/backend/utils/cache/relcache.c  |  13 ++
 src/include/access/heapam.h |   8 +-
 src/include/catalog/storage.h   |   5 +-
 src/include/storage/bufmgr.h|   2 +
 src/include/utils/rel.h |   8 +
 17 files changed, 482 insertions(+), 90 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3acef27..ecb9ad8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *	  the POSTGRES heap access method used for all POSTGRES
  *	  relations.
  *
+ * WAL CONSIDERATIONS
+ *	  All heap operations are normally WAL-logged. but there are a few
+ *	  exceptions. Temporary and unlogged relations never need to be
+ *	  WAL-logged, but we can also skip WAL-logging for a table that was
+ *	  created in the same transaction, if we don't need WAL for PITR or
+ *	  WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *	  the file to disk at COMMIT instead.
+ *
+ *	  The same-relation optimization is not employed automatically on all
+ *	  updates to a table that was created in the same transacton, because
+ *	  for a small number of changes, it's cheaper to just create the WAL
+ *	  records than fsyncing() the whole relation at COMMIT. It is only
+ *	  worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *	  or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *	  operation; it will cause any subsequent updates to the table to skip
+ *	  WAL-logging, if possible, and cause the heap to be synced to disk at
+ *	  COMMIT.
+ *
+ *	  To make that work, all modifications to heap must use
+ *	  HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *	  for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -56,6 +78,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "port/atomics.h"
@@ -2373,12 +2396,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2409,6 +2426,7 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * TID where the tuple was stored.  But note that any toasting of fields
  * within the tuple data is NOT reflected into *tup.
  */
+extern HTAB *

Re: Added PostgreSQL internals learning materials in Developer FAQ

2017-12-11 Thread Craig Ringer

On 11 December 2017 at 14:50, Amit Kapila  wrote:

> On Mon, Dec 11, 2017 at 8:43 AM, Tsunakawa, Takayuki
>  wrote:
> > Hello,
> >
> > FYI
> > I collected Web resources to learn PostgreSQL internals in the Developer
> FAQ page.  I did this because I need to provide self-education materials
> for new developers.  Specifically, I modified "How is the source code
> organized?", and added "What information is available to learn PostgreSQL
> internals?".
> >
> > Developer FAQ
> > https://wiki.postgresql.org/wiki/Developer_FAQ#
> Development_Tools_and_Help
> >
> >
> > I borrowed most of the information from the following discussion.
> >
>
> Thanks, looks like quite useful information for the new people who
> want to learn about PostgreSQL especially internals.
>
>
While we're talking wiki changes, some folks may be interested in the
recent stuff I added to the profiling with perf page
https://wiki.postgresql.org/wiki/Profiling_with_perf  on using postgres's
predefined tracepoints, injecting userspace tracepoints, capturing variable
values at tracepoints, etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Out of date comment in cached_plan_cost

2017-12-11 Thread Ashutosh Bapat

On Mon, Dec 11, 2017 at 3:02 AM, David Rowley
 wrote:
> On 9 December 2017 at 06:05, Robert Haas  wrote:
>> On Thu, Dec 7, 2017 at 3:14 AM, David Rowley
>>  wrote:
>>> The attached is my attempt at putting this right.
>>
>> I don't feel entirely right about the way this seems to treat
>> inheritance and partitioning as two entirely separate features; that's
>> true from a user perspective, more or less, but not internally to the
>> code.
>
> Originally I had it typed out in a way that mentioned something about
> "unless using partition-wise join with partitioned tables", but I felt
> that the partition planning code is in such a state of flux at the
> moment that I feared that comment might get outdated again someday in
> the near future.
>
> I've attached another patch which just loosens the claim that join
> planning situation is never made worse by inheritance children to now
> say it does not "generally" make any worse.
>
>> Of course, this also begs the question of whether we ought to be
>> changing the formula somehow.
>
> Perhaps, but not for this patch. The comment already mentions that the
> code is far from perfect.

I don't see much difference in the old and new wording. The word
"generally" confuses more than clarifying the cases when the planning
cost curves do not change with the number of relations i.e.
partitions. I think if we add following sentence after "situation
worse." "However in case of declarative partitioning, we may plan each
child separately e.g partition-wise join aims at planning join between
matching partitions. In that case, plan cost curve significantly
changes with the number of partition relations.". After writing this I
think I understand what you meant by

> Originally I had it typed out in a way that mentioned something about
> "unless using partition-wise join with partitioned tables", but I felt
> that the partition planning code is in such a state of flux at the
> moment that I feared that comment might get outdated again someday in
> the near future.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

1 2 >

1 - 100 of 104 matches

Mail list logo